Hi,

Admittedly my system was extremely slow, so I can see why such an error would have come up with my Reuters example.
I apologize for posting the code without having run it properly, but I have been checking and rectifying my mistakes. Yet I am still getting this error:

*java.lang.IllegalStateException: No clusters found. Check your -c path.
Exception in thread "main" java.lang.InterruptedException: K-Means Iteration failed processing /tmp/mahout-work-hduser/reuters-kmeans-clusters/part-randomSeed*

I was running the training example under examples/bin/cluster-reuters.sh. I appreciate your help.

Regards

On 27 September 2013 11:48, Pavan K Narayanan <[email protected]> wrote:

> Hi Daniele
>
> I installed Mahout 0.8 with Hadoop 1.2.1 on a different Ubuntu 12.04 LTS
> machine (Hadoop is configured properly and Mahout is running), and I tried
> to run almost all of the examples -- 20 newsgroups, Reuters, synthetic
> control data -- getting the following errors.
>
> *For Reuters*: it got stuck on the reduce task for a long time, so I had
> to break the operation using Ctrl+C.
>
> bigdata@bigdata-OptiPlex-390:~/mahout-distribution-0.8/examples/bin$ ./cluster-reuters.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. kmeans clustering
> 2. fuzzykmeans clustering
> 3. dirichlet clustering
> 4. lda clustering
> 5. minhash clustering
> Enter your choice : 1
> ok. You chose 1 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-bigdata
> Converting to Sequence Files from Directory
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> Running on hadoop, using /home/bigdata/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/bigdata/hadoop-1.2.1/conf
> MAHOUT-JOB: /home/bigdata/mahout-distribution-0.8/mahout-examples-0.8-job.jar
> Warning: $HADOOP_HOME is deprecated.
>
> 13/09/27 11:26:54 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[5], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/tmp/mahout-work-bigdata/reuters-out], --keyPrefix=[], --method=[mapreduce], --output=[/tmp/mahout-work-bigdata/reuters-out-seqdir], --startPhase=[0], --tempDir=[temp]}
> 13/09/27 11:26:55 INFO input.FileInputFormat: Total input paths to process : 3743
> 13/09/27 11:26:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 13/09/27 11:26:56 WARN snappy.LoadSnappy: Snappy native library not loaded
> 13/09/27 11:26:59 INFO mapred.JobClient: Running job: job_201309271028_0001
> 13/09/27 11:27:00 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/27 11:27:16 INFO mapred.JobClient:  map 46% reduce 0%
> 13/09/27 11:27:19 INFO mapred.JobClient:  map 78% reduce 0%
> 13/09/27 11:27:22 INFO mapred.JobClient:  map 100% reduce 0%
> 13/09/27 11:27:22 INFO mapred.JobClient: Job complete: job_201309271028_0001
> 13/09/27 11:27:22 INFO mapred.JobClient: Counters: 18
> 13/09/27 11:27:22 INFO mapred.JobClient:   Job Counters
> 13/09/27 11:27:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18361
> 13/09/27 11:27:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 13/09/27 11:27:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/09/27 11:27:22 INFO mapred.JobClient:     Launched map tasks=1
> 13/09/27 11:27:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 13/09/27 11:27:22 INFO mapred.JobClient:   File Output Format Counters
> 13/09/27 11:27:22 INFO mapred.JobClient:     Bytes Written=1889543
> 13/09/27 11:27:22 INFO mapred.JobClient:   FileSystemCounters
> 13/09/27 11:27:22 INFO mapred.JobClient:     HDFS_BYTES_READ=3439773
> 13/09/27 11:27:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=57671
> 13/09/27 11:27:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1889543
> 13/09/27 11:27:22 INFO mapred.JobClient:   File Input Format Counters
> 13/09/27 11:27:22 INFO mapred.JobClient:     Bytes Read=0
> 13/09/27 11:27:22 INFO mapred.JobClient:   Map-Reduce Framework
> 13/09/27 11:27:22 INFO mapred.JobClient:     Map input records=3742
> 13/09/27 11:27:22 INFO mapred.JobClient:     Physical memory (bytes) snapshot=131174400
> 13/09/27 11:27:22 INFO mapred.JobClient:     Spilled Records=0
> 13/09/27 11:27:22 INFO mapred.JobClient:     CPU time spent (ms)=9920
> 13/09/27 11:27:22 INFO mapred.JobClient:     Total committed heap usage (bytes)=116916224
> 13/09/27 11:27:22 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1074016256
> 13/09/27 11:27:22 INFO mapred.JobClient:     Map output records=3742
> 13/09/27 11:27:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=362622
> 13/09/27 11:27:22 INFO driver.MahoutDriver: Program took 28377 ms (Minutes: 0.47295)
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Warning: $HADOOP_HOME is deprecated.
>
> Running on hadoop, using /home/bigdata/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/bigdata/hadoop-1.2.1/conf
> MAHOUT-JOB: /home/bigdata/mahout-distribution-0.8/mahout-examples-0.8-job.jar
> Warning: $HADOOP_HOME is deprecated.
>
> 13/09/27 11:27:25 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
> 13/09/27 11:27:25 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
> 13/09/27 11:27:25 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> 13/09/27 11:27:25 INFO vectorizer.SparseVectorsFromSequenceFiles: Tokenizing documents in /tmp/mahout-work-bigdata/reuters-out-seqdir
> 13/09/27 11:27:26 INFO input.FileInputFormat: Total input paths to process : 1
> 13/09/27 11:27:26 INFO mapred.JobClient: Running job: job_201309271028_0002
> 13/09/27 11:27:27 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/27 11:27:36 INFO mapred.JobClient:  map 100% reduce 0%
> 13/09/27 11:27:36 INFO mapred.JobClient: Job complete: job_201309271028_0002
> 13/09/27 11:27:36 INFO mapred.JobClient: Counters: 19
> 13/09/27 11:27:36 INFO mapred.JobClient:   Job Counters
> 13/09/27 11:27:36 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5689
> 13/09/27 11:27:36 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> 13/09/27 11:27:36 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> 13/09/27 11:27:36 INFO mapred.JobClient:     Launched map tasks=1
> 13/09/27 11:27:36 INFO mapred.JobClient:     Data-local map tasks=1
> 13/09/27 11:27:36 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> 13/09/27 11:27:36 INFO mapred.JobClient:   File Output Format Counters
> 13/09/27 11:27:36 INFO mapred.JobClient:     Bytes Written=2640631
> 13/09/27 11:27:36 INFO mapred.JobClient:   FileSystemCounters
> 13/09/27 11:27:36 INFO mapred.JobClient:     HDFS_BYTES_READ=1889686
> 13/09/27 11:27:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=57246
> 13/09/27 11:27:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2640631
> 13/09/27 11:27:36 INFO mapred.JobClient:   File Input Format Counters
> 13/09/27 11:27:36 INFO mapred.JobClient:     Bytes Read=1889543
> 13/09/27 11:27:36 INFO mapred.JobClient:   Map-Reduce Framework
> 13/09/27 11:27:36 INFO mapred.JobClient:     Map input records=3742
> 13/09/27 11:27:36 INFO mapred.JobClient:     Physical memory (bytes) snapshot=125198336
> 13/09/27 11:27:36 INFO mapred.JobClient:     Spilled Records=0
> 13/09/27 11:27:36 INFO mapred.JobClient:     CPU time spent (ms)=1580
> 13/09/27 11:27:36 INFO mapred.JobClient:     Total committed heap usage (bytes)=123797504
> 13/09/27 11:27:36 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1074016256
> 13/09/27 11:27:36 INFO mapred.JobClient:     Map output records=3742
> 13/09/27 11:27:36 INFO mapred.JobClient:     SPLIT_RAW_BYTES=143
> 13/09/27 11:27:36 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors
> 13/09/27 11:27:36 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /tmp/mahout-work-bigdata/reuters-out-seqdir-sparse-kmeans/tokenized-documents and saving at /tmp/mahout-work-bigdata/reuters-out-seqdir-sparse-kmeans/wordcount
> 13/09/27 11:27:36 INFO input.FileInputFormat: Total input paths to process : 1
> 13/09/27 11:27:38 INFO mapred.JobClient: Running job: job_201309271028_0003
> 13/09/27 11:27:39 INFO mapred.JobClient:  map 0% reduce 0%
> 13/09/27 11:27:46 INFO mapred.JobClient:  map 100% reduce 0%
>
> ^Cbigdata@bigdata-OptiPlex-390:~/mahout-distribution-0.8/examples/bin$
>
> *Synthetic control data*: the script reports that Hadoop is not running,
> although Hadoop was in fact running (I ran the jps command to confirm, and
> also set the classpath once again).
>
> bigdata@bigdata-OptiPlex-390:~/mahout-distribution-0.8/examples/bin$ ./cluster-syntheticcontrol.sh
> Please select a number to choose the corresponding clustering algorithm
> 1. canopy clustering
> 2. kmeans clustering
> 3. fuzzykmeans clustering
> 4. dirichlet clustering
> 5. meanshift clustering
> Enter your choice : 2
> ok. You chose 2 and we'll use kmeans Clustering
> creating work directory at /tmp/mahout-work-bigdata
> Downloading Synthetic control data
>   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                  Dload  Upload   Total   Spent    Left  Speed
> 100  281k  100  281k    0     0  64314      0  0:00:04  0:00:04 --:--:-- 82742
> Checking the health of DFS...
> Warning: $HADOOP_HOME is deprecated.
>
> ls: Cannot access .: No such file or directory.
> HADOOP is not running. Please make sure you hadoop is running.
>
> I appreciate your help.
>
> Regards
> Pavan
>
> On 26 September 2013 17:13, Darius Miliauskas <[email protected]> wrote:
>
>> Dear Pavan,
>>
>> There is a newer release of Mahout (0.8). Why do you use 0.6? Have you
>> tried ./build-cluster-syntheticcontrol.sh or ./cluster-syntheticcontrol.sh
>> from ../mahout-distribution-0.8/examples/bin?
>>
>> Ciao,
>>
>> Darius
>>
>> 2013/9/26 Pavan K Narayanan <[email protected]>
>>
>> > Folks,
>> >
>> > I am currently attempting to run the synthetic_control data example on
>> > Mahout. I have installed Hadoop 1.2.1 and Mahout 0.6 on my Ubuntu machine.
>> >
>> > I prepared the following Hadoop runtime configuration file to set all
>> > the paths required.
>> > The following are the contents of hadooprc.sh:
>> >
>> > *export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
>> > export HADOOP_HOME=/home/hduser/hadoop-1.2.1
>> > export MAHOUT_HOME=/home/hduser/mahout-distribution-0.6
>> > export PATH=$MAHOUT_HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
>> > export CLASSPATH=$JAVA_HOME:/home/hduser/hadoop-1.2.1/hadoop-core-1.2.1.jar:$MAHOUT_HOME/mahout-core-0.6.jar*
>> >
>> > I also tried the following runtime configuration file, as suggested by
>> > Saeed Iqbal's blog:
>> >
>> > *export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-i386
>> > export HADOOP_HOME=/home/hduser/hadoop-1.2.1
>> > export HADOOP_CONF_DIR=/home/hduser/hadoop-1.2.1/conf
>> > export MAHOUT_HOME=/home/hduser/mahout-distribution-0.6/bin
>> > export PATH=$PATH:$MAHOUT_HOME*
>> >
>> > The above file initializes Mahout. I then followed the commands below,
>> > from
>> > https://cwiki.apache.org/confluence/display/MAHOUT/Clustering+of+synthetic+control+data,
>> > to write the synthetic control data into HDFS:
>> >
>> > $HADOOP_HOME/bin/hadoop fs -mkdir testdata
>> > $HADOOP_HOME/bin/hadoop fs -put <PATH TO synthetic_control.data> testdata
>> >
>> > mvn clean install gave me a build failure error, but when I typed
>> > maven -version I got the latest Maven installed.
>> >
>> > I tried to enter this command from $MAHOUT_HOME/bin:
>> >
>> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
>> >
>> > and got the following error:
>> >
>> > org.apache.mahout.clustering.syntheticcontrol.kmeans.Job command not found
>> >
>> > Can anyone tell me where I am going wrong and how to fix it? I really
>> > appreciate your help.
>> >
>> > Regards,
>> > Pavan
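A note on the "command not found" error at the bottom of the thread: `org.apache.mahout.clustering.syntheticcontrol.kmeans.Job` is a Java class name, not an executable, so the shell cannot find it. The examples are normally launched through the `mahout` driver script, or by submitting the examples job jar to Hadoop. A sketch, assuming the Mahout 0.6 layout and environment variables from the thread (the jar name and paths may differ in your distribution):

```shell
# Option 1: let the mahout driver script resolve and run the example class
$MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

# Option 2: submit the examples job jar to Hadoop directly
$HADOOP_HOME/bin/hadoop jar \
  $MAHOUT_HOME/mahout-examples-0.6-job.jar \
  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
```

Both variants expect the input to already be in HDFS under testdata, as in the fs -put step quoted above.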

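A closing note on the *No clusters found. Check your -c path* error at the top of the thread: k-means refuses to iterate when the `-c` directory contains no initial cluster seeds (the `part-randomSeed` file the script normally writes there before iterating). Before re-running, it can help to confirm that directory is actually populated. A minimal sketch for a local run; `check_clusters` is my own hypothetical helper, and the path is just the work directory from the error message:

```shell
# check_clusters: hypothetical helper that reports whether a k-means -c
# directory contains any part files (i.e. initial cluster seeds).
check_clusters() {
  dir="$1"
  if ls "$dir"/part-* >/dev/null 2>&1; then
    echo "clusters present in $dir"
  else
    echo "no clusters in $dir"
  fi
}

# For a local run; if the work dir lives on HDFS, inspect it instead with:
#   hadoop fs -ls /tmp/mahout-work-hduser/reuters-kmeans-clusters
check_clusters /tmp/mahout-work-hduser/reuters-kmeans-clusters
```

If the directory turns out to be empty, the earlier vectorization step most likely failed or wrote to a different work directory, so the seeding step had nothing to sample from.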