Re: Mahout-232-0.8.patch using

2014-03-04 Thread Amol Kakade
Hi,
Thanks for your suggestion Sebastian.
But the fact is that I am working on SVM as part of my curriculum, and I want to
compare the timing and accuracy of different classification
techniques on Hadoop/Mahout. That is the only reason I want a solution for
SVM with Mahout, i.e. the Mahout-232 patch.

--
Amol  Kakade.


On Tue, Mar 4, 2014 at 1:24 PM, Sebastian Schelter s...@apache.org wrote:

 Hi Amol,

 SVMs are not integrated in Mahout. I'd suggest you try our logistic
 regression classifier instead.

 Best,
 Sebastian
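
For anyone following this thread, a minimal sketch of the logistic regression
route Sebastian suggests, using Mahout's SGD classifier; the feature count,
prior, and encoding below are illustrative assumptions, not a recipe:

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class LogisticRegressionSketch {
  public static void main(String[] args) {
    int numFeatures = 1000;                          // assumed feature-space size
    OnlineLogisticRegression learner =
        new OnlineLogisticRegression(2, numFeatures, new L1());

    // One toy training example; real features come from your own encoding step.
    Vector features = new RandomAccessSparseVector(numFeatures);
    features.setQuick(42, 1.0);
    learner.train(1, features);                      // 1 = positive class

    // classifyScalar returns the probability of the positive class.
    System.out.println("P(positive) = " + learner.classifyScalar(features));
  }
}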


 On 03/04/2014 08:51 AM, Amol Kakade wrote:

 Hi,
 I am a new user of Mahout and want to run a sample SVM algorithm with Mahout.
 Can you please list the steps to use Mahout-232-0.8.patch for SVM in Mahout?
 I have been trying for the last 2 days but keep getting errors.
 --
 Amol  Kakade.





Re: Issue updating a FileDataModel

2014-03-04 Thread Juan José Ramos
Thanks Sebastian.

Although I got the FileDataModel updating correctly after following your
advice, everything seems to point to my needing a database to back my
DataModel.
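
A minimal sketch of a database-backed DataModel along those lines; Mahout's
JDBC models live in the integration module, and the DataSource, table and
column names here are assumptions for illustration only:

import javax.sql.DataSource;
import org.apache.mahout.cf.taste.impl.model.jdbc.MySQLJDBCDataModel;
import org.apache.mahout.cf.taste.impl.model.jdbc.ReloadFromJDBCDataModel;
import org.apache.mahout.cf.taste.model.DataModel;

public class JdbcDataModelSketch {
  public static DataModel build(DataSource dataSource) throws Exception {
    // Table and column names below are illustrative only.
    MySQLJDBCDataModel jdbcModel = new MySQLJDBCDataModel(dataSource,
        "taste_preferences", "user_id", "item_id", "preference", "timestamp");

    // Optionally cache everything in memory and call refresh() to re-pull from the DB.
    return new ReloadFromJDBCDataModel(jdbcModel);
  }
}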


On Mon, Mar 3, 2014 at 3:47 PM, Sebastian Schelter s...@apache.org wrote:

 I think it depends on the difference between the time of the call to
 refresh() and the last modified time of the file.

 --sebastian


 On 03/03/2014 04:45 PM, Juan José Ramos wrote:

 Thanks for the reply, Sebastian.

 I do not have concurrent updates, but they may actually happen very, very
 close in time.

 Would adding the new preferences to new files, rather than appending to
 the existing one, make any difference, or does everything depend on the
 time elapsed between two calls to recommender.refresh(null)?

 Many thanks.


 On Mon, Mar 3, 2014 at 1:18 PM, Sebastian Schelter s...@apache.org
 wrote:

  Hi Juan,

 IIRC, FileDataModel has a parameter that determines how much time must
 have passed since the last modification of the underlying file. You can
 also directly append new data to the original file.

 If you want to have a DataModel that can be concurrently updated, I
 suggest moving your data to a database.

 --sebastian
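
 A minimal sketch of the parameter Sebastian is referring to, assuming the
 FileDataModel constructor that takes a minimum reload interval; the value
 of 0 ms is only an illustration:

 import java.io.File;
 import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;

 public class FileDataModelReloadSketch {
   public static void main(String[] args) throws Exception {
     // transpose = false, minReloadIntervalMS = 0: refresh() re-reads the file
     // whenever its last-modified timestamp has changed, with no extra waiting period.
     FileDataModel dataModel =
         new FileDataModel(new File("data/dataModel.txt"), false, 0L);

     dataModel.refresh(null);
     System.out.println("users: " + dataModel.getNumUsers());
   }
 }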


 On 03/02/2014 11:11 PM, Juan José Ramos wrote:

  I am having issues refreshing my recommender, in particular with the
 DataModel.

 I am using a FileDataModel and a GenericItemBasedRecommender that also
 has
 a CachingItemSimilarity wrapping a FileItemSimilarity. But for the test
 I
 am running I am making things even simpler.

 By the time I instantiate the recommender, these two files are in the
 FileSystem:
 data/datamodel.txt
 0,1,0.0

 data/datamodel.0.txt
 0,2,1.0

 And then I run the code you can find below:

 
 ----------------------------------------------------------------------------

 FileDataModel dataModel = new FileDataModel(new File("data/dataModel.txt"));

 FileItemSimilarity itemSimilarity =
     new FileItemSimilarity(new File("data/similarities"));

 GenericItemBasedRecommender itemRecommender =
     new GenericItemBasedRecommender(dataModel, itemSimilarity);

 System.out.println("Number of users in the system: " +
     itemRecommender.getDataModel().getNumUsers() + " and " +
     itemRecommender.getDataModel().getNumItems() + " items");

 FileWriter writer = new FileWriter(new File("data/dataModel.1.txt"));
 writer.write("1,2,1.0\r");
 writer.close();

 writer = new FileWriter(new File("data/dataModel.2.txt"));
 writer.write("2,2,1.0\r");
 writer.close();

 writer = new FileWriter(new File("data/dataModel.3.txt"));
 writer.write("3,2,1.0\r");
 writer.close();

 writer = new FileWriter(new File("data/dataModel.4.txt"));
 writer.write("4,2,1.0\r");
 writer.close();

 writer = new FileWriter(new File("data/dataModel.5.txt"));
 writer.write("5,2,1.0\r");
 writer.close();

 writer = new FileWriter(new File("data/dataModel.6.txt"));
 writer.write("6,2,1.0\r");
 writer.close();

 itemRecommender.refresh(null);

 System.out.println("Number of users in the system: " +
     itemRecommender.getDataModel().getNumUsers() + " and " +
     itemRecommender.getDataModel().getNumItems() + " items");

 ----------------------------------------------------------------------------

 The output is the same in both println calls: "Number of users in the
 system: 2 and 2 items". So, only the information from the files that were
 on the system by the time I ran this test seems to get loaded into the
 DataModel.

 What can be causing that? Is there a maximum number of updates a
 FileDataModel can take up in every refresh?

 Could it be that actually by the time I call
 itemRecommender.refresh(null)
 the files have not been written to the FileSystem?

 Should I be calling refresh in a different manner?

 Thank you for your help.








Re: Mahout-232-0.8.patch using

2014-03-04 Thread Sebastian Schelter
I think you should rather choose a different library that already offers 
an SVM than trying to revive a 4 year old patch.


--sebastian

On 03/04/2014 08:51 AM, Amol Kakade wrote:

Hi,
I am a new user of Mahout and want to run a sample SVM algorithm with Mahout.
Can you please list the steps to use Mahout-232-0.8.patch for SVM in Mahout?
I have been trying for the last 2 days but keep getting errors.
--
Amol  Kakade.





Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-04 Thread Margusja

Hi

following command:
/usr/lib/hadoop-yarn/bin/yarn jar 
mahout-distribution-0.9/mahout-examples-0.9.jar 
org.apache.mahout.classifier.df.mapreduce.BuildForest -d 
input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 
-p -t 100 -o nsl-forest


When I used hadoop 1.x it worked.
Now that I use hadoop-2.2.0 it gives me:
14/03/04 15:25:58 INFO mapreduce.BuildForest: Partial Mapred implementation
14/03/04 15:25:58 INFO mapreduce.BuildForest: Building the forest...
14/03/04 15:26:01 INFO client.RMProxy: Connecting to ResourceManager at 
/0.0.0.0:8032
14/03/04 15:26:05 INFO input.FileInputFormat: Total input paths to 
process : 1

14/03/04 15:26:05 INFO mapreduce.JobSubmitter: number of splits:1
14/03/04 15:26:05 INFO Configuration.deprecation: user.name is 
deprecated. Instead, use mapreduce.job.user.name
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.jar is 
deprecated. Instead, use mapreduce.job.jar
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapred.cache.files.filesizes is deprecated. Instead, use 
mapreduce.job.cache.files.filesizes
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files is 
deprecated. Instead, use mapreduce.job.cache.files
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.reduce.tasks is 
deprecated. Instead, use mapreduce.job.reduces
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapred.output.value.class is deprecated. Instead, use 
mapreduce.job.output.value.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.map.class is 
deprecated. Instead, use mapreduce.job.map.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.job.name is 
deprecated. Instead, use mapreduce.job.name
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapreduce.inputformat.class is deprecated. Instead, use 
mapreduce.job.inputformat.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.input.dir is 
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.dir is 
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapreduce.outputformat.class is deprecated. Instead, use 
mapreduce.job.outputformat.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.map.tasks is 
deprecated. Instead, use mapreduce.job.maps
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapred.cache.files.timestamps is deprecated. Instead, use 
mapreduce.job.cache.files.timestamps
14/03/04 15:26:05 INFO Configuration.deprecation: 
mapred.output.key.class is deprecated. Instead, use 
mapreduce.job.output.key.class
14/03/04 15:26:05 INFO Configuration.deprecation: mapred.working.dir is 
deprecated. Instead, use mapreduce.job.working.dir
14/03/04 15:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for 
job: job_1393936067845_0011
14/03/04 15:26:07 INFO impl.YarnClientImpl: Submitted application 
application_1393936067845_0011 to ResourceManager at /0.0.0.0:8032
14/03/04 15:26:07 INFO mapreduce.Job: The url to track the job: 
http://vm38.dbweb.ee:8088/proxy/application_1393936067845_0011/

14/03/04 15:26:07 INFO mapreduce.Job: Running job: job_1393936067845_0011
14/03/04 15:26:36 INFO mapreduce.Job: Job job_1393936067845_0011 running 
in uber mode : false

14/03/04 15:26:36 INFO mapreduce.Job:  map 0% reduce 0%
14/03/04 15:27:00 INFO mapreduce.Job:  map 100% reduce 0%
14/03/04 15:27:26 INFO mapreduce.Job: Job job_1393936067845_0011 
completed successfully

14/03/04 15:27:26 INFO mapreduce.Job: Counters: 27
File System Counters
FILE: Number of bytes read=2994
FILE: Number of bytes written=80677
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=880103
HDFS: Number of bytes written=2483042
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=46056
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=9994
Map output records=100
Input split bytes=123
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=425
CPU time spent (ms)=32890
Physical memory (bytes) snapshot=189755392
Virtual memory (bytes) snapshot=992145408
Total committed heap usage (bytes)=111673344
File Input Format Counters
Bytes Read=879980
File Output Format Counters

Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-04 Thread Sergey Svinarchuk
Mahout 0.9 does not support hadoop 2 dependencies.
You can use mahout-1.0-SNAPSHOT, or apply to your mahout the patch from
https://issues.apache.org/jira/browse/MAHOUT-1329 for hadoop 2
support.


On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote:

 Hi

 following command:
 /usr/lib/hadoop-yarn/bin/yarn jar 
 mahout-distribution-0.9/mahout-examples-0.9.jar
 org.apache.mahout.classifier.df.mapreduce.BuildForest -d
 input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p
 -t 100 -o nsl-forest

 When I used hadoop 1.x then it worked.
 Now I use hadoop-2.2.0 it gives me:
 14/03/04 15:25:58 INFO mapreduce.BuildForest: Partial Mapred implementation
 14/03/04 15:25:58 INFO mapreduce.BuildForest: Building the forest...
 14/03/04 15:26:01 INFO client.RMProxy: Connecting to ResourceManager at /
 0.0.0.0:8032
 14/03/04 15:26:05 INFO input.FileInputFormat: Total input paths to process
 : 1
 14/03/04 15:26:05 INFO mapreduce.JobSubmitter: number of splits:1
 14/03/04 15:26:05 INFO Configuration.deprecation: user.name is
 deprecated. Instead, use mapreduce.job.user.name
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.jar is
 deprecated. Instead, use mapreduce.job.jar
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.cache.files.filesizes is deprecated. Instead, use
 mapreduce.job.cache.files.filesizes
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files is
 deprecated. Instead, use mapreduce.job.cache.files
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.reduce.tasks is
 deprecated. Instead, use mapreduce.job.reduces
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.output.value.class is deprecated. Instead, use
 mapreduce.job.output.value.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.map.class is
 deprecated. Instead, use mapreduce.job.map.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.job.name is
 deprecated. Instead, use mapreduce.job.name
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapreduce.inputformat.class is deprecated. Instead, use
 mapreduce.job.inputformat.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.input.dir is
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.dir is
 deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapreduce.outputformat.class is deprecated. Instead, use
 mapreduce.job.outputformat.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.map.tasks is
 deprecated. Instead, use mapreduce.job.maps
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.cache.files.timestamps is deprecated. Instead, use
 mapreduce.job.cache.files.timestamps
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.key.class
 is deprecated. Instead, use mapreduce.job.output.key.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.working.dir is
 deprecated. Instead, use mapreduce.job.working.dir
 14/03/04 15:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job:
 job_1393936067845_0011
 14/03/04 15:26:07 INFO impl.YarnClientImpl: Submitted application
 application_1393936067845_0011 to ResourceManager at /0.0.0.0:8032
 14/03/04 15:26:07 INFO mapreduce.Job: The url to track the job:
 http://vm38.dbweb.ee:8088/proxy/application_1393936067845_0011/
 14/03/04 15:26:07 INFO mapreduce.Job: Running job: job_1393936067845_0011
 14/03/04 15:26:36 INFO mapreduce.Job: Job job_1393936067845_0011 running
 in uber mode : false
 14/03/04 15:26:36 INFO mapreduce.Job:  map 0% reduce 0%
 14/03/04 15:27:00 INFO mapreduce.Job:  map 100% reduce 0%
 14/03/04 15:27:26 INFO mapreduce.Job: Job job_1393936067845_0011 completed
 successfully
 14/03/04 15:27:26 INFO mapreduce.Job: Counters: 27
 File System Counters
 FILE: Number of bytes read=2994
 FILE: Number of bytes written=80677
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=880103
 HDFS: Number of bytes written=2483042
 HDFS: Number of read operations=5
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=2
 Job Counters
 Launched map tasks=1
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=46056
 Total time spent by all reduces in occupied slots (ms)=0
 Map-Reduce Framework
 Map input records=9994
 Map output records=100
 Input split bytes=123
 Spilled Records=0
 Failed Shuffles=0
 Merged Map outputs=0
 GC time elapsed (ms)=425
 CPU time spent 

PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Kevin Moulart
Hi,

I'm trying to apply a PCA to reduce the dimension of a matrix of 1,603
columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
I always get a StackOverflowError:

Here is my command line :
mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
-pca true -U false -V false -t 3 -ow

I also tried to put -us true as mentioned in
https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2
but the option is not available anymore.

The output of the previous command is :
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
{--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
--computeU=[false], --computeV=[false], --endPhase=[2147483647],
--input=[/user/myUser/Echant100k], --minSplitSize=[-1],
--outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
--oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
--rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
--uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread main java.lang.StackOverflowError
at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
...

I searched online and didn't find a solution to my problem.

Can you help me?

Thanks in advance,

-- 
Kévin Moulart


Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-04 Thread Sergey Svinarchuk
Sorry, I didn't see that you are trying to use mahout-1.0-snapshot.
You used /usr/lib/hadoop-yarn/bin/yarn but you need to use
/usr/lib/hadoop/bin/hadoop, and then your example will succeed.


On Tue, Mar 4, 2014 at 3:45 PM, Sergey Svinarchuk 
ssvinarc...@hortonworks.com wrote:

 Mahout 0.9 not supported hadoop 2 dependencies.
 You can use mahout-1.0-SNAPSHOT or add to your mahout patch from
 https://issues.apache.org/jira/browse/MAHOUT-1329 for added hadoop 2
 support.


 On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote:

 Hi

 following command:
 /usr/lib/hadoop-yarn/bin/yarn jar 
 mahout-distribution-0.9/mahout-examples-0.9.jar
 org.apache.mahout.classifier.df.mapreduce.BuildForest -d
 input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5
 -p -t 100 -o nsl-forest

 When I used hadoop 1.x then it worked.
 Now I use hadoop-2.2.0 it gives me:
 14/03/04 15:25:58 INFO mapreduce.BuildForest: Partial Mapred
 implementation
 14/03/04 15:25:58 INFO mapreduce.BuildForest: Building the forest...
 14/03/04 15:26:01 INFO client.RMProxy: Connecting to ResourceManager at /
 0.0.0.0:8032
 14/03/04 15:26:05 INFO input.FileInputFormat: Total input paths to
 process : 1
 14/03/04 15:26:05 INFO mapreduce.JobSubmitter: number of splits:1
 14/03/04 15:26:05 INFO Configuration.deprecation: user.name is
 deprecated. Instead, use mapreduce.job.user.name
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.jar is
 deprecated. Instead, use mapreduce.job.jar
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.cache.files.filesizes is deprecated. Instead, use
 mapreduce.job.cache.files.filesizes
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.cache.files is
 deprecated. Instead, use mapreduce.job.cache.files
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.reduce.tasks is
 deprecated. Instead, use mapreduce.job.reduces
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.output.value.class is deprecated. Instead, use
 mapreduce.job.output.value.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapreduce.map.class is
 deprecated. Instead, use mapreduce.job.map.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.job.name is
 deprecated. Instead, use mapreduce.job.name
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapreduce.inputformat.class is deprecated. Instead, use
 mapreduce.job.inputformat.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.input.dir is
 deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.dir is
 deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapreduce.outputformat.class is deprecated. Instead, use
 mapreduce.job.outputformat.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.map.tasks is
 deprecated. Instead, use mapreduce.job.maps
 14/03/04 15:26:05 INFO Configuration.deprecation:
 mapred.cache.files.timestamps is deprecated. Instead, use
 mapreduce.job.cache.files.timestamps
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.output.key.class
 is deprecated. Instead, use mapreduce.job.output.key.class
 14/03/04 15:26:05 INFO Configuration.deprecation: mapred.working.dir is
 deprecated. Instead, use mapreduce.job.working.dir
 14/03/04 15:26:06 INFO mapreduce.JobSubmitter: Submitting tokens for job:
 job_1393936067845_0011
 14/03/04 15:26:07 INFO impl.YarnClientImpl: Submitted application
 application_1393936067845_0011 to ResourceManager at /0.0.0.0:8032
 14/03/04 15:26:07 INFO mapreduce.Job: The url to track the job:
 http://vm38.dbweb.ee:8088/proxy/application_1393936067845_0011/
 14/03/04 15:26:07 INFO mapreduce.Job: Running job: job_1393936067845_0011
 14/03/04 15:26:36 INFO mapreduce.Job: Job job_1393936067845_0011 running
 in uber mode : false
 14/03/04 15:26:36 INFO mapreduce.Job:  map 0% reduce 0%
 14/03/04 15:27:00 INFO mapreduce.Job:  map 100% reduce 0%
 14/03/04 15:27:26 INFO mapreduce.Job: Job job_1393936067845_0011
 completed successfully
 14/03/04 15:27:26 INFO mapreduce.Job: Counters: 27
 File System Counters
 FILE: Number of bytes read=2994
 FILE: Number of bytes written=80677
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=880103
 HDFS: Number of bytes written=2483042
 HDFS: Number of read operations=5
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=2
 Job Counters
 Launched map tasks=1
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=46056
 Total time spent by all reduces in occupied slots (ms)=0
 Map-Reduce Framework
 Map input 

Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-04 Thread Margusja

Hi, thanks for the reply.

Here is my output:

[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop version
Hadoop 2.2.0.2.0.6.0-101
Subversion g...@github.com:hortonworks/hadoop.git -r 
b07b2906c36defd389c8b5bd22bebc1bead8115b

Compiled by jenkins on 2014-01-09T05:18Z
Compiled with protoc 2.5.0
From source with checksum 704f1e463ebc4fb89353011407e965
This command was run using 
/usr/lib/hadoop/hadoop-common-2.2.0.2.0.6.0-101.jar


[hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar 
mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar 
org.apache.mahout.classifier.df.mapreduce.BuildForest -d 
input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 
-p -t 100 -o nsl-forest


...
14/03/04 16:22:51 INFO mapreduce.Job:  map 0% reduce 0%
14/03/04 16:23:12 INFO mapreduce.Job:  map 100% reduce 0%
14/03/04 16:23:43 INFO mapreduce.Job: Job job_1393936067845_0013 
completed successfully

14/03/04 16:23:44 INFO mapreduce.Job: Counters: 27
File System Counters
FILE: Number of bytes read=2994
FILE: Number of bytes written=80677
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=880103
HDFS: Number of bytes written=2436546
HDFS: Number of read operations=5
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=45253
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
Map input records=9994
Map output records=100
Input split bytes=123
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=456
CPU time spent (ms)=36010
Physical memory (bytes) snapshot=180752384
Virtual memory (bytes) snapshot=994275328
Total committed heap usage (bytes)=101187584
File Input Format Counters
Bytes Read=879980
File Output Format Counters
Bytes Written=2436546
Exception in thread main java.lang.IncompatibleClassChangeError: Found 
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at 
org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:113)
at 
org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
at 
org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:294)
at 
org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:228)
at 
org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:188)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:252)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



Tervitades, Margus (Margusja) Roo
+372 51 48 780
http://margus.roo.ee
http://ee.linkedin.com/in/margusroo
skype: margusja
ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
-BEGIN PUBLIC KEY-
MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
BjM8j36yJvoBVsfOHQIDAQAB
-END PUBLIC KEY-

On 04/03/14 16:11, Sergey Svinarchuk wrote:

Sory, I didn't see that you try use mahout-1.0-snapshot.
You used /usr/lib/hadoop-yarn/bin/yarn but need use
/usr/lib/hadoop/bin/hadoop and then your example will be success.


On Tue, Mar 4, 2014 at 3:45 PM, Sergey Svinarchuk 
ssvinarc...@hortonworks.com wrote:


Mahout 0.9 not supported hadoop 2 dependencies.
You can use mahout-1.0-SNAPSHOT or add to your mahout patch from
https://issues.apache.org/jira/browse/MAHOUT-1329 for added hadoop 2
support.


On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote:


Hi

following command:
/usr/lib/hadoop-yarn/bin/yarn jar 
mahout-distribution-0.9/mahout-examples-0.9.jar
org.apache.mahout.classifier.df.mapreduce.BuildForest -d
input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5
-p -t 100 -o nsl-forest

When I used hadoop 1.x then it worked.
Now I use hadoop-2.2.0 it gives me:
14/03/04 15:25:58 INFO 

Re: how to recommend users already consumed items

2014-03-04 Thread Pat Ferrel
I’d suggest a command line option if you want to submit a patch. Most people 
will want that line executed so the default should be the current behavior. But 
a large minority will want it your way. 

And please do submit a patch with the Jira; it will make your life easier when 
new releases come out, since you won't have to manage a fork.
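
A rough sketch of the kind of option Pat means, using AbstractJob's option
helpers; the flag name is an assumption, and keeping it off by default
preserves the current behavior of filtering out known items:

// Inside the relevant job's run() method, where the other options are declared:
addFlag("includeKnownItems", "ik",
    "also recommend items the user has already interacted with (default: off)");

// ... after parseArguments(args):
boolean includeKnownItems = hasOption("includeKnownItems");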

On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com wrote:

Juan, I don't understand your solution, if there are no ratings how can you
blend the recommendations from the system and the user's already read news.

Anyway, I think, as Pat does, the best way is to remove the mentioned line.
It should be the responsibility of the business logic to remove user's
items if needed.

I will also create a Jira issue as you suggested.

thanks
On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com wrote:
 
 You are not the only one to see this so I'd recommend creating an option
 for the Job, which will be checked before executing that line of code
 then
 submit it as a patch to the Jira you need to create in any case.
 
 That way it might get into the mainline and you won't have to maintain a
 fork.
 
 
 Avoiding the cost of a fork over a trivial issue like this is a grand idea.
 



Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
Kevin, thanks for reporting this.

Stack overflow error has not been known to happen to date. But i will take
a look. It looks like a bug in the mean computation code, given your stack
trace, although it may have been induced by some circumstances specific to
your deployment.

 What version is it? 0.9?

As for -us, I am not aware of it having been removed. If it was, it
happened without my knowledge. I will take a look at the trunk.

-d


On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart kevinmoul...@gmail.comwrote:

 Hi,

 I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
 columns and 100.000 to 30.000.000 lines using ssvd with the pca option, and
 I always get a StackOverflowError :

 Here is my command line :
 mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
 -pca true -U false -V false -t 3 -ow

 I also tried to put -us true as mentionned in

 https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2but
 the option is not available anymore.

 The output of the previous command is :
 MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
 Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
 and HADOOP_CONF_DIR=/etc/hadoop/conf
 MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
 {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
 --computeU=[false], --computeV=[false], --endPhase=[2147483647],
 --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
 --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
 --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
 --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
 --uHalfSigma=[false], --vHalfSigma=[false]}
 Exception in thread main java.lang.StackOverflowError
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
  at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...

 I search online and didn't find a solution to my problem.

 Can you help me ?

 Thanks in advance,

 --
 Kévin Moulart



Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
It doesn't look like -us has been removed. At least i see it on the head of
the trunk, SSVDCli.java, line 62:

addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));

i.e. the short version (single dash) is -us true, and the long version
(double dash) is --uSigma true. Can you check again with 0.9? Thanks.


On Tue, Mar 4, 2014 at 9:37 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:

 Kevin, thanks for reporting this.

 Stack overflow error has not been known to happen to date. But i will take
 a look. It looks like a bug in the mean computation code, given your stack
 trace, although it may have been induced by some circumstances specific to
 your deployment.

  What version is it? 0.9?

 As for -us, it is not known to have been removed to me. If it were, it
 happened without my knowledge. I will take a look at the trunk.

 -d


 On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart kevinmoul...@gmail.comwrote:

 Hi,

 I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
 columns and 100.000 to 30.000.000 lines using ssvd with the pca option,
 and
 I always get a StackOverflowError :

 Here is my command line :
 mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k
 100
 -pca true -U false -V false -t 3 -ow

 I also tried to put -us true as mentionned in

 https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2but
 the option is not available anymore.

 The output of the previous command is :
 MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
 Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
 and HADOOP_CONF_DIR=/etc/hadoop/conf
 MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
 {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
 --computeU=[false], --computeV=[false], --endPhase=[2147483647],
 --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
 --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
 --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
 --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
 --uHalfSigma=[false], --vHalfSigma=[false]}
 Exception in thread main java.lang.StackOverflowError
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
  at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...

 I search online and didn't find a solution to my problem.

 Can you help me ?

 Thanks in advance,

 --
 Kévin Moulart





Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Dmitriy Lyubimov
as for the stack trace, it looks like it doesn't agree with current trunk.
Again, i need to know which version you are running.

But from looking at current trunk, i don't really see how that may be
happening at the moment.


On Tue, Mar 4, 2014 at 9:40 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:

 It doesn't look like -us has been removed. At least i see it on the head
 of the trunk, SSVDCli.java, line 62:

  addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));

 i.e. short version(single dash) -us true, or long version(double-dash)
 --uSigma true. Can you check again with 0.9? thanks.


 On Tue, Mar 4, 2014 at 9:37 AM, Dmitriy Lyubimov dlie...@gmail.comwrote:

 Kevin, thanks for reporting this.

 Stack overflow error has not been known to happen to date. But i will
 take a look. It looks like a bug in the mean computation code, given your
 stack trace, although it may have been induced by some circumstances
 specific to your deployment.

  What version is it? 0.9?

 As for -us, it is not known to have been removed to me. If it were, it
 happened without my knowledge. I will take a look at the trunk.

 -d


 On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart kevinmoul...@gmail.comwrote:

 Hi,

 I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
 columns and 100.000 to 30.000.000 lines using ssvd with the pca option,
 and
 I always get a StackOverflowError :

 Here is my command line :
 mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k
 100
 -pca true -U false -V false -t 3 -ow

 I also tried to put -us true as mentionned in

 https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2but
 the option is not available anymore.

 The output of the previous command is :
 MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
 Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
 and HADOOP_CONF_DIR=/etc/hadoop/conf
 MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
 {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
 --computeU=[false], --computeV=[false], --endPhase=[2147483647],
 --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
 --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
 --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
 --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
 --uHalfSigma=[false], --vHalfSigma=[false]}
 Exception in thread main java.lang.StackOverflowError
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
  at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at

 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...

 I search online and didn't find a solution to my problem.

 Can you help me ?

 Thanks in advance,

 --
 Kévin Moulart






Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
I have not seen the StackOverflowError, but this code has been fixed since 0.8.

Sent from my iPhone

 On Mar 4, 2014, at 12:40 PM, Dmitriy Lyubimov dlie...@gmail.com wrote:
 
 It doesn't look like -us has been removed. At least i see it on the head of
 the trunk, SSVDCli.java, line 62:
 
 addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));
 
 i.e. short version(single dash) -us true, or long version(double-dash)
 --uSigma true. Can you check again with 0.9? thanks.
 
 
 On Tue, Mar 4, 2014 at 9:37 AM, Dmitriy Lyubimov dlie...@gmail.com wrote:
 
 Kevin, thanks for reporting this.
 
 Stack overflow error has not been known to happen to date. But i will take
 a look. It looks like a bug in the mean computation code, given your stack
 trace, although it may have been induced by some circumstances specific to
 your deployment.
 
 What version is it? 0.9?
 
 As for -us, it is not known to have been removed to me. If it were, it
 happened without my knowledge. I will take a look at the trunk.
 
 -d
 
 
 On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart kevinmoul...@gmail.comwrote:
 
 Hi,
 
 I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
 columns and 100.000 to 30.000.000 lines using ssvd with the pca option,
 and
 I always get a StackOverflowError :
 
 Here is my command line :
 mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k
 100
 -pca true -U false -V false -t 3 -ow
 
 I also tried to put -us true as mentionned in
 
 https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2but
 the option is not available anymore.
 
 The output of the previous command is :
 MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
 Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
 and HADOOP_CONF_DIR=/etc/hadoop/conf
 MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
 {--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
 --computeU=[false], --computeV=[false], --endPhase=[2147483647],
 --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
 --outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
 --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
 --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
 --uHalfSigma=[false], --vHalfSigma=[false]}
 Exception in thread main java.lang.StackOverflowError
 at
 
 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at
 
 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 at
 
 org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
 ...
 
 I search online and didn't find a solution to my problem.
 
 Can you help me ?
 
 Thanks in advance,
 
 --
 Kévin Moulart
 
 


Re: PCA with ssvd leads to StackOverFlowError

2014-03-04 Thread Suneel Marthi
The -us option was fixed for Mahout 0.8; it seems like you are using Mahout 0.7, 
which had this issue (from your stack trace, it's apparent you are using Mahout 0.7). 
Please upgrade to the latest Mahout version.





On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart kevinmoul...@gmail.com wrote:
 
Hi,

I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
columns and 100.000 to 30.000.000 lines using ssvd with the pca option, and
I always get a StackOverflowError :

Here is my command line :
mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
-pca true -U false -V false -t 3 -ow

I also tried to put -us true as mentionned in
https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18modificationDate=1381347063000api=v2but
the option is not available anymore.

The output of the previous command is :
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
{--abtBlockHeight=[20], --blockHeight=[1], --broadcast=[true],
--computeU=[false], --computeV=[false], --endPhase=[2147483647],
--input=[/user/myUser/Echant100k], --minSplitSize=[-1],
--outerProdBlockHeight=[3], --output=[/user/myUser/Echant/SVD100],
--oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
--rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
--uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread main java.lang.StackOverflowError
at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at
org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
...

I search online and didn't find a solution to my problem.

Can you help me ?

Thanks in advance,

-- 
Kévin Moulart

Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter
I think we should introduce a new parameter for the recommend() method 
in the Recommender interface that tells whether already known items 
should be recommended or not.


What do you think?

Best,
Sebastian
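
A sketch of what that could look like on the interface; the overload below is
a proposal for discussion, not an existing method in the current API:

import java.util.List;
import org.apache.mahout.cf.taste.common.Refreshable;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public interface Recommender extends Refreshable {

  // Existing behavior: items the user already knows are filtered out.
  List<RecommendedItem> recommend(long userID, int howMany) throws TasteException;

  // Proposed overload: the caller decides whether known items may be returned.
  List<RecommendedItem> recommend(long userID, int howMany, boolean includeKnownItems)
      throws TasteException;

  // ... other existing methods (estimatePreference, setPreference, ...) unchanged.
}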

On 03/04/2014 05:32 PM, Pat Ferrel wrote:

I’d suggest a command line option if you want to submit a patch. Most people 
will want that line executed so the default should be the current behavior. But 
a large minority will want it your way.

And please do submit a patch with the Jira, it will make your life easier when 
new releases come out you won’t have to manage a fork.

On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com wrote:

Juan, I don't understand your solution, if there are no ratings how can you
blend the recommendations from the system and the user's already read news.

Anyway, I think, as Pat does, the best way is to remove the mentioned line.
It should be the responsibility of the business logic to remove user's
items if needed.

I will also create a Jira issue as you suggested.

thanks
On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:


On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com wrote:


You are not the only one to see this so I'd recommend creating an option
for the Job, which will be checked before executing that line of code

then

submit it as a patch to the Jira you need to create in any case.

That way it might get into the mainline and you won't have to maintain a
fork.



Avoiding the cost of a fork over a trivial issue like this is a grand idea.







Re: how to recommend users already consumed items

2014-03-04 Thread Gokhan Capan
Sent from my iPhone

 On Mar 4, 2014, at 22:13, Sebastian Schelter s...@apache.org wrote:

 I think we should introduce a new parameter for the recommend() method
 in the Recommender interface that tells whether already known items
 should be recommended or not.
+1 for that

 What do you think?

 Best,
 Sebastian

 On 03/04/2014 05:32 PM, Pat Ferrel wrote:
 I’d suggest a command line option if you want to submit a patch. Most people 
 will want that line executed so the default should be the current behavior. 
 But a large minority will want it your way.

 And please do submit a patch with the Jira, it will make your life easier 
 when new releases come out you won’t have to manage a fork.

 On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com wrote:

 Juan, I don't understand your solution, if there are no ratings how can you
 blend the recommendations from the system and the user's already read news.

 Anyway, I think, as Pat does, the best way is to remove the mentioned line.
 It should be the responsibility of the business logic to remove user's
 items if needed.

 I will also create a Jira issue as you suggested.

 thanks
 On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com wrote:

 On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com wrote:

 You are not the only one to see this so I'd recommend creating an option
 for the Job, which will be checked before executing that line of code
 then
 submit it as a patch to the Jira you need to create in any case.

 That way it might get into the mainline and you won't have to maintain a
 fork.

 Avoiding the cost of a fork over a trivial issue like this is a grand idea.



Re: how to recommend users already consumed items

2014-03-04 Thread Mario Levitin

 I think we should introduce a new parameter for the recommend() method in
 the Recommender interface that tells whether already known items should be
 recommended or not.


I agree (if the parameter is missing, then it defaults to the current
behavior, as Pat suggested).





On 03/04/2014 05:32 PM, Pat Ferrel wrote:

 I'd suggest a command line option if you want to submit a patch. Most
 people will want that line executed so the default should be the current
 behavior. But a large minority will want it your way.

 And please do submit a patch with the Jira, it will make your life easier
 when new releases come out you won't have to manage a fork.

 On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com
 wrote:

 Juan, I don't understand your solution, if there are no ratings how can
 you
 blend the recommendations from the system and the user's already read
 news.

 Anyway, I think, as Pat does, the best way is to remove the mentioned
 line.
 It should be the responsibility of the business logic to remove user's
 items if needed.

 I will also create a Jira issue as you suggested.

 thanks
 On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

  On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com
 wrote:

  You are not the only one to see this so I'd recommend creating an option
 for the Job, which will be checked before executing that line of code

 then

 submit it as a patch to the Jira you need to create in any case.

 That way it might get into the mainline and you won't have to maintain a
 fork.


 Avoiding the cost of a fork over a trivial issue like this is a grand
 idea.






Re: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected

2014-03-04 Thread Gokhan Capan
Margusja,

From trunk, can you build mahout using the following command and try again:
mvn clean package -DskipTests=true -Dhadoop2.version=2.2.0

Best

Gokhan


On Tue, Mar 4, 2014 at 4:25 PM, Margusja mar...@roo.ee wrote:

 Hi thanks for reply.

 Here is my output:

 [hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop version Hadoop
 2.2.0.2.0.6.0-101
 Subversion g...@github.com:hortonworks/hadoop.git -r
 b07b2906c36defd389c8b5bd22bebc1bead8115b
 Compiled by jenkins on 2014-01-09T05:18Z
 Compiled with protoc 2.5.0
 From source with checksum 704f1e463ebc4fb89353011407e965
 This command was run using /usr/lib/hadoop/hadoop-common-
 2.2.0.2.0.6.0-101.jar

 [hduser@vm38 ~]$ /usr/lib/hadoop/bin/hadoop jar
 mahout/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
 org.apache.mahout.classifier.df.mapreduce.BuildForest -d
 input/data666.noheader.data -ds input/data666.noheader.data.info -sl 5 -p
 -t 100 -o nsl-forest

 ...
 14/03/04 16:22:51 INFO mapreduce.Job:  map 0% reduce 0%
 14/03/04 16:23:12 INFO mapreduce.Job:  map 100% reduce 0%
 14/03/04 16:23:43 INFO mapreduce.Job: Job job_1393936067845_0013 completed
 successfully
 14/03/04 16:23:44 INFO mapreduce.Job: Counters: 27

 File System Counters
 FILE: Number of bytes read=2994
 FILE: Number of bytes written=80677
 FILE: Number of read operations=0
 FILE: Number of large read operations=0
 FILE: Number of write operations=0
 HDFS: Number of bytes read=880103
 HDFS: Number of bytes written=2436546

 HDFS: Number of read operations=5
 HDFS: Number of large read operations=0
 HDFS: Number of write operations=2
 Job Counters
 Launched map tasks=1
 Data-local map tasks=1
 Total time spent by all maps in occupied slots (ms)=45253

 Total time spent by all reduces in occupied slots (ms)=0
 Map-Reduce Framework
 Map input records=9994
 Map output records=100
 Input split bytes=123
 Spilled Records=0
 Failed Shuffles=0
 Merged Map outputs=0
 GC time elapsed (ms)=456
 CPU time spent (ms)=36010
 Physical memory (bytes) snapshot=180752384
 Virtual memory (bytes) snapshot=994275328
 Total committed heap usage (bytes)=101187584

 File Input Format Counters
 Bytes Read=879980
 File Output Format Counters
 Bytes Written=2436546

 Exception in thread main java.lang.IncompatibleClassChangeError: Found
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
 at org.apache.mahout.classifier.df.mapreduce.partial.
 PartialBuilder.processOutput(PartialBuilder.java:113)
 at org.apache.mahout.classifier.df.mapreduce.partial.
 PartialBuilder.parseOutput(PartialBuilder.java:89)
 at org.apache.mahout.classifier.df.mapreduce.Builder.build(
 Builder.java:294)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.
 buildForest(BuildForest.java:228)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(
 BuildForest.java:188)
 at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
 at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(
 BuildForest.java:252)

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(
 NativeMethodAccessorImpl.java:57)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
 DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:212)



 Tervitades, Margus (Margusja) Roo
 +372 51 48 780
 http://margus.roo.ee
 http://ee.linkedin.com/in/margusroo
 skype: margusja
 ldapsearch -x -h ldap.sk.ee -b c=EE (serialNumber=37303140314)
 -BEGIN PUBLIC KEY-
 MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCvbeg7LwEC2SCpAEewwpC3ajxE
 5ZsRMCB77L8bae9G7TslgLkoIzo9yOjPdx2NN6DllKbV65UjTay43uUDyql9g3tl
 RhiJIcoAExkSTykWqAIPR88LfilLy1JlQ+0RD8OXiWOVVQfhOHpQ0R/jcAkM2lZa
 BjM8j36yJvoBVsfOHQIDAQAB
 -END PUBLIC KEY-

 On 04/03/14 16:11, Sergey Svinarchuk wrote:

 Sory, I didn't see that you try use mahout-1.0-snapshot.
 You used /usr/lib/hadoop-yarn/bin/yarn but need use
 /usr/lib/hadoop/bin/hadoop and then your example will be success.


 On Tue, Mar 4, 2014 at 3:45 PM, Sergey Svinarchuk 
 ssvinarc...@hortonworks.com wrote:

  Mahout 0.9 not supported hadoop 2 dependencies.
 You can use mahout-1.0-SNAPSHOT or add to your mahout patch from
 https://issues.apache.org/jira/browse/MAHOUT-1329 for added hadoop 2
 support.


 On Tue, Mar 4, 2014 at 3:38 PM, Margusja mar...@roo.ee wrote:

  Hi

 following command:
 /usr/lib/hadoop-yarn/bin/yarn jar 

Re: how to recommend users already consumed items

2014-03-04 Thread Mario Levitin
I have created a Jira issue already.
I only use the non-hadoop part of Mahout recommender algorithms.
Maybe I can create a patch for that part. However, I have not done it
before, and don't know how to proceed.


On Wed, Mar 5, 2014 at 1:01 AM, Sebastian Schelter s...@apache.org wrote:

 Would you be willing to set up a jira issue and create a patch for this?

 --sebastian


 On 03/04/2014 11:58 PM, Mario Levitin wrote:


 I think we should introduce a new parameter for the recommend() method in
 the Recommender interface that tells whether already known items should
 be
 recommended or not.



 I agree (if the parameter is missing then defaults to current behavior as
 Pat suggested)





 On 03/04/2014 05:32 PM, Pat Ferrel wrote:


  I'd suggest a command line option if you want to submit a patch. Most
 people will want that line executed so the default should be the current
 behavior. But a large minority will want it your way.

 And please do submit a patch with the Jira, it will make your life
 easier
 when new releases come out you won't have to manage a fork.

 On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com
 wrote:

 Juan, I don't understand your solution, if there are no ratings how can
 you
 blend the recommendations from the system and the user's already read
 news.

 Anyway, I think, as Pat does, the best way is to remove the mentioned
 line.
 It should be the responsibility of the business logic to remove user's
 items if needed.

 I will also create a Jira issue as you suggested.

 thanks
 On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:

   On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com

 wrote:

   You are not the only one to see this so I'd recommend creating an
 option

 for the Job, which will be checked before executing that line of code

  then

  submit it as a patch to the Jira you need to create in any case.

 That way it might get into the mainline and you won't have to
 maintain a
 fork.


  Avoiding the cost of a fork over a trivial issue like this is a grand
 idea.









Re: how to recommend users already consumed items

2014-03-04 Thread Sebastian Schelter

That's fine, I was talking about the non-distributed part only.

This page has instructions on how to create patches:

https://mahout.apache.org/developers/how-to-contribute.html

Let me know if you need more infos!

Best,
Sebastian


On 03/05/2014 12:27 AM, Mario Levitin wrote:

I have created a Jira issue already.
I only use the non-hadoop part of Mahout recommender algorithms.
May be I can create a patch for that part. However, I have not done it
before, and don't know how to proceed.


On Wed, Mar 5, 2014 at 1:01 AM, Sebastian Schelter s...@apache.org wrote:


Would you be willing to set up a jira issue and create a patch for this?

--sebastian


On 03/04/2014 11:58 PM, Mario Levitin wrote:




I think we should introduce a new parameter for the recommend() method in
the Recommender interface that tells whether already known items should
be
recommended or not.




I agree (if the parameter is missing then defaults to current behavior as
Pat suggested)







On 03/04/2014 05:32 PM, Pat Ferrel wrote:



  I'd suggest a command line option if you want to submit a patch. Most

people will want that line executed so the default should be the current
behavior. But a large minority will want it your way.

And please do submit a patch with the Jira, it will make your life
easier
when new releases come out you won't have to manage a fork.

On Mar 2, 2014, at 12:38 PM, Mario Levitin mariolevi...@gmail.com
wrote:

Juan, I don't understand your solution, if there are no ratings how can
you
blend the recommendations from the system and the user's already read
news.

Anyway, I think, as Pat does, the best way is to remove the mentioned
line.
It should be the responsibility of the business logic to remove user's
items if needed.

I will also create a Jira issue as you suggested.

thanks
On Sun, Mar 2, 2014 at 7:12 PM, Ted Dunning ted.dunn...@gmail.com
wrote:

   On Sun, Mar 2, 2014 at 8:52 AM, Pat Ferrel p...@occamsmachete.com


wrote:

   You are not the only one to see this so I'd recommend creating an
option


for the Job, which will be checked before executing that line of code

  then


  submit it as a patch to the Jira you need to create in any case.


That way it might get into the mainline and you won't have to
maintain a
fork.


  Avoiding the cost of a fork over a trivial issue like this is a grand

idea.