Re: Mahout-232-0.8.patch using
Hi, thanks for your suggestion, Sebastian. The fact is that I am working on SVMs as part of my curriculum, and I want to compare different classification techniques on Hadoop/Mahout in terms of timing and accuracy. That is the only reason I want an SVM solution for Mahout, i.e. the Mahout-232 patch. -- Amol Kakade.

On Tue, Mar 4, 2014 at 1:24 PM, Sebastian Schelter s...@apache.org wrote: Hi Amol, SVMs are not integrated in Mahout. I'd suggest you try our logistic regression classifier instead. Best, Sebastian

On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am a new user of Mahout and want to run a sample SVM algorithm with Mahout. Can you please list the steps to use Mahout-232-0.8.patch for SVM in Mahout? I have been trying for the last 2 days but keep getting errors. -- Amol Kakade.
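For anyone following Sebastian's suggestion: Mahout's SGD-based logistic regression can be tried in a few lines. A minimal sketch, assuming the org.apache.mahout.classifier.sgd API as of Mahout 0.8/0.9 (toy data, learning parameters left at their defaults):
---
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class LogisticRegressionSketch {
  public static void main(String[] args) {
    // 2 categories, 3 features, L1 prior on the weights.
    OnlineLogisticRegression learner = new OnlineLogisticRegression(2, 3, new L1());

    // Toy training data: category 1 when the first feature dominates.
    Vector positive = new DenseVector(new double[] {1.0, 0.1, 0.1});
    Vector negative = new DenseVector(new double[] {0.1, 1.0, 0.9});
    for (int i = 0; i < 100; i++) {
      learner.train(1, positive);
      learner.train(0, negative);
    }

    // For two categories, classifyScalar returns the probability of category 1.
    System.out.println("P(category 1) = " + learner.classifyScalar(positive));
  }
}
---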
Re: Issue updating a FileDataModel
Thanks, Sebastian. Although I got the FileDataModel updating correctly after following your advice, everything seems to indicate that I will need a database to back my DataModel.

On Mon, Mar 3, 2014 at 3:47 PM, Sebastian Schelter s...@apache.org wrote: I think it depends on the difference between the time of the call to refresh() and the last modified time of the file. --sebastian

On 03/03/2014 04:45 PM, Juan José Ramos wrote: Thanks for the reply, Sebastian. I do not have concurrent updates, but they actually may happen very, very close in time. Would adding the new preferences to new files rather than appending to the existing one make any difference, or does everything depend on the time elapsed between two calls to recommender.refresh(null)? Many thanks.

On Mon, Mar 3, 2014 at 1:18 PM, Sebastian Schelter s...@apache.org wrote: Hi Juan, IIRC FileDataModel has a parameter that determines how much time must have elapsed since the last modification of the underlying file. You can also directly append new data to the original file. If you want to have a DataModel that can be concurrently updated, I suggest you move your data to a database. --sebastian

On 03/02/2014 11:11 PM, Juan José Ramos wrote: I am having issues refreshing my recommender, in particular the DataModel. I am using a FileDataModel and a GenericItemBasedRecommender that also has a CachingItemSimilarity wrapping a FileItemSimilarity, but for the test I am running I am making things even simpler. By the time I instantiate the recommender, these two files are in the file system:

data/datamodel.txt: 0,1,0.0
data/datamodel.0.txt: 0,2,1.0

And then I run the code below:
---
FileDataModel dataModel = new FileDataModel(new File("data/dataModel.txt"));
FileItemSimilarity itemSimilarity = new FileItemSimilarity(new File("data/similarities"));
GenericItemBasedRecommender itemRecommender =
    new GenericItemBasedRecommender(dataModel, itemSimilarity);
System.out.println("Number of users in the system: "
    + itemRecommender.getDataModel().getNumUsers() + " and "
    + itemRecommender.getDataModel().getNumItems() + " items");

FileWriter writer = new FileWriter(new File("data/dataModel.1.txt"));
writer.write("1,2,1.0\r");
writer.close();
writer = new FileWriter(new File("data/dataModel.2.txt"));
writer.write("2,2,1.0\r");
writer.close();
writer = new FileWriter(new File("data/dataModel.3.txt"));
writer.write("3,2,1.0\r");
writer.close();
writer = new FileWriter(new File("data/dataModel.4.txt"));
writer.write("4,2,1.0\r");
writer.close();
writer = new FileWriter(new File("data/dataModel.5.txt"));
writer.write("5,2,1.0\r");
writer.close();
writer = new FileWriter(new File("data/dataModel.6.txt"));
writer.write("6,2,1.0\r");
writer.close();

itemRecommender.refresh(null);
System.out.println("Number of users in the system: "
    + itemRecommender.getDataModel().getNumUsers() + " and "
    + itemRecommender.getDataModel().getNumItems() + " items");
---
The output is the same in both println calls: "Number of users in the system: 2 and 2 items". So only the information from the files that were on the system by the time I ran this test seems to get loaded into the DataModel. What can be causing that? Is there a maximum number of updates a FileDataModel can take in each refresh? Could it be that by the time I call itemRecommender.refresh(null) the files have not yet been written to the file system? Should I be calling refresh in a different manner? Thank you for your help.
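For reference, the reload-interval parameter Sebastian mentions is a constructor argument on FileDataModel. A minimal sketch, assuming the Mahout 0.9 constructor FileDataModel(File dataFile, boolean transpose, long minReloadIntervalMS); verify the parameter order against your version:
---
import java.io.File;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;

public class ReloadExample {
  public static void main(String[] args) throws Exception {
    // minReloadIntervalMS = 0: refresh() re-reads the file whenever its
    // last-modified timestamp has changed, even for updates that arrive
    // very close together in time.
    FileDataModel dataModel =
        new FileDataModel(new File("data/dataModel.txt"), false, 0L);
    dataModel.refresh(null);
    System.out.println(dataModel.getNumUsers() + " users after refresh");
  }
}
---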
Re: Mahout-232-0.8.patch using
I think you should rather choose a different library that already offers an SVM than try to revive a four-year-old patch. --sebastian

On 03/04/2014 08:51 AM, Amol Kakade wrote: Hi, I am a new user of Mahout and want to run a sample SVM algorithm with Mahout. Can you please list the steps to use Mahout-232-0.8.patch for SVM in Mahout? I have been trying for the last 2 days but keep getting errors. -- Amol Kakade.
Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Hi, I am running the following command:

/usr/lib/hadoop-yarn/bin/yarn jar mahout-distribution-0.9/mahout-examples-0.9.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest

With Hadoop 1.x it worked. Now, with hadoop-2.2.0, it gives me:

14/03/04 15:25:58 INFO mapreduce.BuildForest: Partial Mapred implementation
14/03/04 15:25:58 INFO mapreduce.BuildForest: Building the forest...
14/03/04 15:26:01 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/03/04 15:26:05 INFO input.FileInputFormat: Total input paths to process : 1
14/03/04 15:26:05 INFO mapreduce.JobSubmitter: number of splits:1
14/03/04 15:26:05 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files.filesizes is deprecated. Instead, use mapreduce.job.cache.files.filesizes
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files is deprecated. Instead, use mapreduce.job.cache.files
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files.timestamps is deprecated. Instead, use mapreduce.job.cache.files.timestamps
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/03/04 15:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1393936067845_0011
14/03/04 15:26:07 INFO impl.YarnClientImpl: Submitted application application_1393936067845_0011 to ResourceManager at /0.0.0.0:8032
14/03/04 15:26:07 INFO mapreduce.Job: The url to track the job: http://vm38.dbweb.ee:8088/proxy/application_1393936067845_0011/
14/03/04 15:26:07 INFO mapreduce.Job: Running job: job_1393936067845_0011
14/03/04 15:26:36 INFO mapreduce.Job: Job job_1393936067845_0011 running in uber mode : false
14/03/04 15:26:36 INFO mapreduce.Job: map 0% reduce 0%
14/03/04 15:27:00 INFO mapreduce.Job: map 100% reduce 0%
14/03/04 15:27:26 INFO mapreduce.Job: Job job_1393936067845_0011 completed successfully
14/03/04 15:27:26 INFO mapreduce.Job: Counters: 27
 File System Counters
  FILE: Number of bytes read=2994
  FILE: Number of bytes written=80677
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=880103
  HDFS: Number of bytes written=2483042
  HDFS: Number of read operations=5
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
 Job Counters
  Launched map tasks=1
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=46056
  Total time spent by all reduces in occupied slots (ms)=0
 Map-Reduce Framework
  Map input records=9994
  Map output records=100
  Input split bytes=123
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=425
  CPU time spent (ms)=32890
  Physical memory (bytes) snapshot=189755392
  Virtual memory (bytes) snapshot=992145408
  Total committed heap usage (bytes)=111673344
 File Input Format Counters
  Bytes Read=879980
 File Output Format Counters
Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Mahout 0.9 does not support the Hadoop 2 dependencies. You can use mahout-1.0-SNAPSHOT, or patch your Mahout with https://issues.apache.org/jira/browse/MAHOUT-1329, which adds Hadoop 2 support.

On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote: Hi, I am running the following command: /usr/lib/hadoop-yarn/bin/yarn jar mahout-distribution-0.9/mahout-examples-0.9.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest. With Hadoop 1.x it worked. Now, with hadoop-2.2.0, it gives me: ... [quoted job log snipped; see the original message above]
PCA with ssvd leads to StackOverFlowError
Hi, I'm trying to apply PCA to reduce the dimension of a matrix with 1,603 columns and between 100,000 and 30,000,000 rows, using ssvd with the pca option, and I always get a StackOverflowError. Here is my command line:

mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100 -pca true -U false -V false -t 3 -ow

I also tried to pass -us true as mentioned in https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2 but the option is not available anymore. The output of the previous command is:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments: {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true], --computeU=[false], --computeV=[false], --endPhase=[2147483647], --input=[/user/myUser/Echant100k], --minSplitSize=[-1], --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100], --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0], --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp], --uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread "main" java.lang.StackOverflowError
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...

I searched online and didn't find a solution to my problem. Can you help me? Thanks in advance, -- Kévin Moulart
Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Sorry, I didn't see that you are trying to use mahout-1.0-SNAPSHOT. You used /usr/lib/hadoop-yarn/bin/yarn, but you need to use /usr/lib/hadoop/bin/hadoop; then your example will succeed.

On Tue, Mar 4, 2014 at 3:45 PM, Sergey Svinarchuk ssvinarc...@hortonworks.com wrote: Mahout 0.9 does not support the Hadoop 2 dependencies. You can use mahout-1.0-SNAPSHOT, or patch your Mahout with https://issues.apache.org/jira/browse/MAHOUT-1329, which adds Hadoop 2 support.

On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote: Hi, I am running the following command: /usr/lib/hadoop-yarn/bin/yarn jar mahout-distribution-0.9/mahout-examples-0.9.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest ... [quoted job log snipped]
Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Hi, thanks for the reply. Here is my output:

[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion g...@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using /usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar

[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -d input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p -t 100 -o nsl-forest
...
14/03/04 16:22:51 INFO mapreduce.Job: map 0% reduce 0%
14/03/04 16:23:12 INFO mapreduce.Job: map 100% reduce 0%
14/03/04 16:23:43 INFO mapreduce.Job: Job job_1393936067845_0013 completed successfully
14/03/04 16:23:44 INFO mapreduce.Job: Counters: 27
 File System Counters
  FILE: Number of bytes read=2994
  FILE: Number of bytes written=80677
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=880103
  HDFS: Number of bytes written=2436546
  HDFS: Number of read operations=5
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=2
 Job Counters
  Launched map tasks=1
  Data-local map tasks=1
  Total time spent by all maps in occupied slots (ms)=45253
  Total time spent by all reduces in occupied slots (ms)=0
 Map-Reduce Framework
  Map input records=9994
  Map output records=100
  Input split bytes=123
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=456
  CPU time spent (ms)=36010
  Physical memory (bytes) snapshot=180752384
  Virtual memory (bytes) snapshot=994275328
  Total committed heap usage (bytes)=101187584
 File Input Format Counters
  Bytes Read=879980
 File Output Format Counters
  Bytes Written=2436546

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:113)
 at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
 at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:294)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:228)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:188)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:252)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Tervitades, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja

On 04/03/14 16:11, Sergey Svinarchuk wrote: Sorry, I didn't see that you are trying to use mahout-1.0-SNAPSHOT. You used /usr/lib/hadoop-yarn/bin/yarn, but you need to use /usr/lib/hadoop/bin/hadoop; then your example will succeed. ... [earlier quoted messages snipped]
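For background on the error itself: in Hadoop 1.x, org.apache.hadoop.mapreduce.JobContext is a concrete class, while in Hadoop 2.x it became an interface. Bytecode compiled against one version cannot link against the other, which is exactly what java.lang.IncompatibleClassChangeError signals, and why rebuilding Mahout against the Hadoop 2 libraries fixes it. A hypothetical illustration of the mechanism (the types here stand in for the real Hadoop classes; reproducing it requires compiling against one library version and running against the other):
---
// Version A of a library, on the COMPILE classpath:
//   public class JobContext { public int getNumReduceTasks() { return 1; } }
//
// Version B of the same library, on the RUNTIME classpath:
//   public interface JobContext { int getNumReduceTasks(); }
//
// A call site compiled against version A is emitted as 'invokevirtual'
// (a class-method call). At runtime the JVM resolves JobContext, finds an
// interface where a class was expected, and throws
// java.lang.IncompatibleClassChangeError before the method body ever runs.
public class Caller {
  public static void main(String[] args) {
    // With consistent versions this prints normally; with mismatched
    // versions the JVM rejects the call site at link time instead.
    System.out.println("Rebuild Mahout against your Hadoop version.");
  }
}
---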
Re: how to recommend users already consumed items
I’d suggest a command line option if you want to submit a patch. Most people will want that line executed, so the default should be the current behavior, but a large minority will want it your way. And please do submit a patch with the Jira; it will make your life easier when new releases come out, since you won’t have to manage a fork.

On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com wrote: Juan, I don't understand your solution; if there are no ratings, how can you blend the recommendations from the system and the user's already-read news? Anyway, I think, as Pat does, that the best way is to remove the mentioned line. It should be the responsibility of the business logic to remove the user's items if needed. I will also create a Jira issue as you suggested. Thanks.

On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com wrote: On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com wrote: You are not the only one to see this, so I'd recommend creating an option for the Job, which will be checked before executing that line of code, then submitting it as a patch to the Jira you need to create in any case. That way it might get into the mainline and you won't have to maintain a fork. Avoiding the cost of a fork over a trivial issue like this is a grand idea.
Re: PCA with ssvd leads to StackOverFlowError
Kevin, thanks for reporting this. A stack overflow error has not been known to happen here to date, but I will take a look. It looks like a bug in the mean-computation code, given your stack trace, although it may have been induced by circumstances specific to your deployment. What version is it? 0.9?

As for -us, I am not aware of it having been removed; if it was, it happened without my knowledge. I will take a look at the trunk. -d

On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Hi, I'm trying to apply PCA to reduce the dimension of a matrix with 1,603 columns and between 100,000 and 30,000,000 rows, using ssvd with the pca option, and I always get a StackOverflowError ... [quoted message snipped; see the original post above]
Re: PCA with ssvd leads to StackOverFlowError
It doesn't look like -us has been removed. At least I see it at the head of the trunk, SSVDCli.java, line 62:

addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));

i.e. the short version (single dash) is -us true, and the long version (double dash) is --uSigma true. Can you check again with 0.9? Thanks.

On Tue, Mar 4, 2014 at 9:37 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Kevin, thanks for reporting this. A stack overflow error has not been known to happen here to date, but I will take a look. ... [quoted messages snipped]
Re: PCA with ssvd leads to StackOverFlowError
As for the stack trace, it looks like it doesn't agree with the current trunk. Again, I need to know which version you are running, but from looking at the current trunk I don't really see how that could be happening at the moment.

On Tue, Mar 4, 2014 at 9:40 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: It doesn't look like -us has been removed. At least I see it at the head of the trunk, SSVDCli.java, line 62 ... [quoted messages snipped]
Re: PCA with ssvd leads to StackOverFlowError
I have not seen the StackOverflowError, but this code has been fixed since 0.8. Sent from my iPhone

On Mar 4, 2014, at 12:40 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: It doesn't look like -us has been removed. At least I see it at the head of the trunk, SSVDCli.java, line 62 ... [quoted messages snipped]
Re: PCA with ssvd leads to StackOverFlowError
The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7, which had this issue (from your stack trace it is apparent you are using Mahout 0.7). Please upgrade to the latest Mahout version.

On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Hi, I'm trying to apply PCA to reduce the dimension of a matrix with 1,603 columns and between 100,000 and 30,000,000 rows, using ssvd with the pca option, and I always get a StackOverflowError ... [quoted message snipped; see the original post above]
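As a side note on reading the trace: a StackOverflowError whose frames all repeat the same method and line (MatrixColumnMeansJob.run at line 55) is the classic signature of a method accidentally calling itself instead of delegating to an overload. A hypothetical sketch of that pattern, not the actual Mahout 0.7 source:
---
// Hypothetical sketch: the convenience overload is meant to delegate to the
// full overload, but calls itself, recursing until the stack is exhausted.
public class ColumnMeansJobSketch {

  // Buggy: every call re-enters this same method.
  static double[] run(String inputPath) {
    return run(inputPath); // intended: run(inputPath, null)
  }

  // Intended delegation target.
  static double[] run(String inputPath, String vectorClassName) {
    // ... the real column-mean computation would happen here ...
    return new double[0];
  }

  public static void main(String[] args) {
    run("/user/myUser/Echant100k"); // throws java.lang.StackOverflowError
  }
}
---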
Re: how to recommend users already consumed items
I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already-known items should be recommended or not. What do you think? Best, Sebastian

On 03/04/2014 05:32 PM, Pat Ferrel wrote: I'd suggest a command line option if you want to submit a patch. Most people will want that line executed, so the default should be the current behavior, but a large minority will want it your way. ... [rest of the quoted thread snipped; see the earlier message in this thread]
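A sketch of what that could look like (an assumed signature for discussion; the real change would have to go through the Jira and keep the existing overload's behavior as the default):
---
import java.util.List;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

// Hypothetical extension of the Recommender API: an extra flag controls
// whether items the user has already interacted with may be returned.
public interface RecommenderWithKnownItems {

  // Existing behavior: items the user already knows are filtered out.
  List<RecommendedItem> recommend(long userID, int howMany) throws TasteException;

  // Proposed overload: includeKnownItems == true skips that filtering,
  // leaving it to the caller's business logic to drop known items if needed.
  List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems)
      throws TasteException;
}
---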
Re: how to recommend users already consumed items
Sent from my iPhone

On Mar 4, 2014, at 22:13, Sebastian Schelter s...@apache.org wrote: I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not.

+1 for that

... [rest of the quoted thread snipped]
Re: how to recommend users already consumed items
I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not.

I agree (if the parameter is missing, it defaults to the current behavior, as Pat suggested).

On 03/04/2014 05:32 PM, Pat Ferrel wrote: I'd suggest a command line option if you want to submit a patch. Most people will want that line executed, so the default should be the current behavior, but a large minority will want it your way. ... [rest of the quoted thread snipped]
Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
Margusja, from trunk, can you build Mahout using the following command and try again:

mvn clean package -DskipTests=true -Dhadoop2.version=2.2.0

Best, Gokhan

On Tue, Mar 4, 2014 at 4:25 PM, Margusja mar...@roo.ee wrote: Hi, thanks for the reply. Here is my output: [hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop version Hadoop 2.2.0.2.0.6.0-101 ... Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected ... [quoted output and earlier messages snipped; see the full message above]
Re: how to recommend users already consumed items
I have already created a Jira issue. I only use the non-Hadoop part of the Mahout recommender algorithms, so maybe I can create a patch for that part. However, I have not done it before and don't know how to proceed.

On Wed, Mar 5, 2014 at 1:01 AM, Sebastian Schelter s...@apache.org wrote: Would you be willing to set up a jira issue and create a patch for this? --sebastian

On 03/04/2014 11:58 PM, Mario Levitin wrote: I think we should introduce a new parameter for the recommend() method in the Recommender interface that tells whether already known items should be recommended or not. I agree (if the parameter is missing, it defaults to the current behavior, as Pat suggested). ... [rest of the quoted thread snipped]
Re: how to recommend users already consumed items
That's fine, I was talking about the non-distributed part only. This page has instructions on how to create patches: https://mahout.apache.org/developers/how-to-contribute.html Let me know if you need more info! Best, Sebastian

On 03/05/2014 12:27 AM, Mario Levitin wrote: I have already created a Jira issue. I only use the non-Hadoop part of the Mahout recommender algorithms, so maybe I can create a patch for that part. However, I have not done it before and don't know how to proceed. ... [rest of the quoted thread snipped]