Re: Need help in executing SSVD for dimensionality reduction on Mahout

2014-03-17 Thread Dmitriy Lyubimov
If the rows in the input for SSVD are data points you are trying to create reduced space for, then rows of USigma represent the same points in the PCA (reduced) space. The mapping between the input rows and output rows is by same keys in the sequence files. However, it doesn't look like your input
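
As a sketch of why the rows of USigma are the input points in the reduced space: the notation below is the standard thin-SVD notation and is an assumption, not taken from the thread. Here A is the input matrix with one data point per row, k is the chosen rank, and SSVD computes the factors only approximately. For PCA proper, A would additionally be mean-centered first.

```latex
% Rank-k decomposition of the (m x n) input matrix A, one data point per row:
A \approx U_k \Sigma_k V_k^{\top}, \qquad V_k^{\top} V_k = I_k
% Projecting the rows of A onto the k right singular vectors:
A V_k \approx U_k \Sigma_k V_k^{\top} V_k = U_k \Sigma_k
% Hence row i of U_k \Sigma_k is the k-dimensional image of row i of A,
% which is why input and output rows share the same sequence-file key.
```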

Fwd: Need help in executing SSVD for dimensionality reduction on Mahout

2014-03-17 Thread Vijaya Pratap
Hi, I am trying to use SSVD for dimensionality reduction on Mahout. The input is sample data in CSV format. Below is a snippet of the input:
22,2,44,36,5,9,2824,2,4,733,285,169
25,1,150,175,3,9,4037,2,18,1822,254,171
I have executed the below steps. 1. Loaded the csv file and Vectorized the da

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
As Mahout streamingkmeans has no problems in sequential mode, I would like to try sequential mode. However, "java.lang.OutOfMemoryError" occurs. Where do I set the JVM heap size for sequential mode? Is it the same as in MapReduce mode? -Original Message- From: fx MA XIAOJUN [mailto:x

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
-rskm option works only in sequential mode and fails in MR. That's still an issue in present trunk that needs to be fixed. That should explain why Streaming KMeans with -rskm works only in sequential mode for you. Mahout 0.9 has been built with Hadoop 1.2.1 profile, not sure if that's gonna wor

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
Thank you for your extremely quick reply. >> What do u mean by this? kmeans hasn't changed between 0.8 and 0.9. Did u >> mean Streaming KMeans here? I want to try using -rskm in streaming kmeans. But in mahout 0.8, if setting -rskm as true, errors occur. I heard that the bug has been fixed in 0.

Re: Mahout parallel K-Means - algorithms analysis

2014-03-17 Thread Weishung Chung
You could take a look at org.apache.mahout.clustering.classify.ClusterClassificationMapper. Enjoy, Wei Shung On Sat, Mar 15, 2014 at 2:51 PM, Suneel Marthi wrote: > The clustering code is cimapper and cireducer. Following the clustering, > there is cluster classification which is mapper only. >

Re: Normalization in Mahout

2014-03-17 Thread Suneel Marthi
On Monday, March 17, 2014 8:10 AM, Bikash Gupta wrote: > Want to achieve few things > 1. Normalize input data of clustering and classification algorithm
Not sure what you consider as normalization, but if you are trying to normalize text, Lucene's analyzers do it while generating term vectors

Re: Normalization in Mahout

2014-03-17 Thread Bikash Gupta
I want to achieve a few things: 1. Normalize the input data of clustering and classification algorithms. 2. Normalize the output data to plot in a graph. On Mon, Mar 17, 2014 at 5:32 PM, Suneel Marthi wrote: > What are you trying to do? > > On Monday, March 17, 2014 7:45 AM, Bikash Gupta wrote: > > Hi, >

Re: Normalization in Mahout

2014-03-17 Thread Suneel Marthi
What are you trying to do? On Monday, March 17, 2014 7:45 AM, Bikash Gupta wrote: Hi, Do we have any utility for Column and Row normalization in Mahout? -- Thanks & Regards, Bikash Gupta

Normalization in Mahout

2014-03-17 Thread Bikash Gupta
Hi, Do we have any utility for Column and Row normalization in Mahout? -- Thanks & Regards Bikash Gupta
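
For readers unsure what "column and row normalization" might mean here, the following is a minimal, library-independent sketch. It is not a Mahout utility; the function names `normalize_rows` and `normalize_columns` are hypothetical, and the choice of L2 scaling for rows and min-max scaling for columns is an assumption, since the thread does not say which normalization is wanted.

```python
import math


def normalize_rows(matrix):
    """Scale each row to unit L2 norm; all-zero rows are left unchanged."""
    result = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        result.append([x / norm for x in row] if norm else list(row))
    return result


def normalize_columns(matrix):
    """Min-max scale each column to [0, 1]; constant columns map to 0.0."""
    scaled_columns = []
    for col in zip(*matrix):  # iterate over columns
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(x - lo) / span if span else 0.0 for x in col])
    # Transpose back so the result has one list per original row.
    return [list(row) for row in zip(*scaled_columns)]
```

Row normalization makes points comparable by direction (useful with cosine-style distances), while column normalization puts features with different units on a common scale before distance-based clustering.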

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Bikash Gupta
I have a 3-node CDH 4.6 cluster; however, I have built Mahout 0.9 with the Hadoop 2.x profile. I have also created a mount point for these nodes, and the path URI is the same as in HDFS. I have manually configured the filesystem parameter conf.set("fs.hdfs.impl",org.apache.hadoop.hdfs.DistributedFileSystem.class

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Suneel Marthi
I have not seen that behavior with KMeans; what were your settings again? Sorry for joining this thread late; I have not looked at the entire history. On Monday, March 17, 2014 6:52 AM, Bikash Gupta wrote: Suneel, just for information, I haven't found this issue in Canopy. Canopy cluster

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Bikash Gupta
Suneel, just for information, I haven't found this issue in Canopy. Canopy cluster-0 was created in HDFS only. However, KMeans cluster-0 was created in the local file system and cluster-1 in HDFS, and after that it threw an error because it was unable to locate cluster-0. On Mon, Mar 17, 2014 at 3:10 PM, Sun

Re: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-03-17 Thread Suneel Marthi
Are you running on Hadoop 2.x? That seems to be the case here. Compile with the hadoop 2 profile: mvn -DskipTests clean install -Dhadoop2.profile= On Monday, March 17, 2014 5:57 AM, Margusja wrote: Hi Here is my output: [speech@h14 ~]$ mahout/bin/mahout seqdirectory -c UTF-8 -i /user/speech/dem

java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but class was expected

2014-03-17 Thread Margusja
Hi Here is my output: [speech@h14 ~]$ mahout/bin/mahout seqdirectory -c UTF-8 -i /user/speech/demo -o demo-seqfiles MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath. Running on hadoop, using /usr/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf MAHOUT-JOB: /home/speech/mahout/examp

Re: Problem with FileSystem in Kmeans

2014-03-17 Thread Suneel Marthi
This problem is specific to Canopy clustering and is not an issue with KMeans. I have seen this behavior with Canopy, and looking at the code, it's indeed an issue wherein cluster-0 is created on the local file system and the remaining clusters land on HDFS. Please file a JIRA for this

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN wrote: Thank you for your quick reply. As to -km, I thought it was log10, instead of ln. I was wrong... This time I set -km 14 and ran mahout streamingkmeans again (CDH 5.0 MRv1, Mahout 0.8). The maps ran faster than before, but the red

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
Thank you for your quick reply. As to -km, I thought it was log10, instead of ln. I was wrong... This time I set -km 14 and ran mahout streamingkmeans again (CDH 5.0 MRv1, Mahout 0.8). The maps ran faster than before, but the reduce was still stuck at 76% forever. So, I uninstalled mahout 0.
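
The log10 versus ln point above can be made concrete with a small sketch. It assumes the commonly cited rule of thumb that -km should be around k * ln(n), where k is the target number of clusters and n is the number of input points. The helper name `estimated_map_clusters` and the sample values of k and n are hypothetical; the actual values used in this thread are not given.

```python
import math


def estimated_map_clusters(k, n):
    """Rule-of-thumb estimate for -km: k * ln(n), rounded up.

    k: desired final number of clusters (assumed >= 1)
    n: total number of data points (assumed >= 1)
    """
    if k < 1 or n < 1:
        raise ValueError("k and n must both be at least 1")
    # math.log with a single argument is the natural logarithm (ln).
    return math.ceil(k * math.log(n))


# Hypothetical example: 2 final clusters over 1000 points.
km = estimated_map_clusters(2, 1000)
```

Using log10 instead of ln would shrink the estimate by a factor of ln(10), roughly 2.3, which is why confusing the two bases leads to a noticeably smaller -km value.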