Suneel,

Hi. Yes, you were right: it had nothing to do with version matching between Hadoop and Mahout. Even a simple Hadoop example failed (the Pi calculation from the examples jar), and a machine reboot resolved the Hadoop issue. The machine had been running without a reboot for over two months, and I believe it had run short of some system resource.

Regards,
Y.Mandai
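[Editorial note: the "simple Hadoop example" mentioned above is the Pi estimator shipped with Hadoop, which makes a handy Mahout-independent sanity check for a cluster. A minimal sketch, assuming Hadoop 0.20.203 (the examples-jar name varies by release):]

    # Run the bundled Pi estimator: 10 map tasks, 100 samples per map.
    # If job submission fails here too, the problem is in the Hadoop setup
    # itself rather than in Mahout.
    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-examples-0.20.203.0.jar pi 10 100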
2013/5/12 Suneel Marthi <[email protected]>

> It's definitely not a Mahout-Hadoop compatibility issue; it has more to do
> with your Hadoop setup.
>
> Check this link:
> http://stackoverflow.com/questions/15585630/file-jobtracker-info-could-only-be-replicated-to-0-nodes-instead-of-1
>
> ________________________________
> From: 万代豊 <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Saturday, May 11, 2013 1:14 PM
> Subject: Re: Class Not Found from 0.8-SNAPSHOT for
> org.apache.lucene.analysis.WhitespaceAnalyzer
>
> Well, my Mahout-0.8-SNAPSHOT is now fine with the analyzer option
> "org.apache.lucene.analysis.core.WhitespaceAnalyzer", but there are still
> some steps to get over.
> Could this be a Hadoop version incompatibility issue? If so, what should
> the right/minimum Hadoop version be? (At least "clusterdump" with
> Mahout-0.8-SNAPSHOT worked fine against an existing k-means result
> previously produced under 0.7.)
> I had been on Hadoop-0.20.203 (pseudo-distributed) with Mahout-0.7 for
> some time and only recently upgraded the Mahout side to 0.8-SNAPSHOT.
>
> $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i NHTSA-seqfile01/ -o
> NHTSA-namedVector -ow -a org.apache.lucene.analysis.core.WhitespaceAnalyzer
> -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ng 2 -ml 50 -seq -n 2
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /usr/local/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> 13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 2
> 13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
> 13/05/12 01:45:48 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> 13/05/12 01:45:48 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar
> could only be replicated to 0 nodes, instead of 1
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
>
>         at org.apache.hadoop.ipc.Client.call(Client.java:1030)
>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>         at $Proxy1.addBlock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>         at $Proxy1.addBlock(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
>         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
>
> 13/05/12 01:45:48 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
> 13/05/12 01:45:48 WARN hdfs.DFSClient: Could not get block locations. Source file
> "/home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar" - Aborting...
> 13/05/12 01:45:48 INFO mapred.JobClient: Cleaning up the staging area
> hdfs://localhost:9000/home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001
> Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar
> could only be replicated to 0 nodes, instead of 1
>         [same stack trace as above]
> 13/05/12 01:45:48 ERROR hdfs.DFSClient: Exception closing file
> /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar :
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /home/hadoop/mapred/staging/hadoop/.staging/job_201305120144_0001/job.jar
> could only be replicated to 0 nodes, instead of 1
>         [same stack trace repeated twice more]
>
> Sorry for the long error log.
> I believe my Hadoop-0.20.203 is up and running correctly:
>
> $JAVA_HOME/bin/jps
> 13322 TaskTracker
> 12985 DataNode
> 12890 NameNode
> 13937 Jps
> 13080 SecondaryNameNode
> 13219 JobTracker
>
> Hope someone can help sort this out.
> Regards,
> Y.Mandai
>
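[Editorial note: "could only be replicated to 0 nodes, instead of 1" means the NameNode could not find a live DataNode to accept the block, even though a DataNode process may be running, as the jps output above shows. Typical causes are a DataNode that is out of disk space, wedged, or registered with a stale namespace ID. A quick diagnostic sketch using standard Hadoop 0.20.x commands:]

    # Ask the NameNode how many live DataNodes it sees and what capacity
    # each one reports; "Datanodes available: 0" would confirm the problem.
    $HADOOP_HOME/bin/hadoop dfsadmin -report

    # Check overall HDFS health, including missing or under-replicated blocks.
    $HADOOP_HOME/bin/hadoop fsck /

    # Check free space on the local disks backing dfs.data.dir; a full disk
    # makes the DataNode refuse new blocks.
    df -h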
> 2013/5/9 Yutaka Mandai <[email protected]>
>
> > Suneel,
> > Great to know.
> > Thanks!
> > Y.Mandai
> >
> > Sent from my iPhone
> >
> > On 2013/05/07, at 22:24, Suneel Marthi <[email protected]> wrote:
> >
> > > It should be
> > > org.apache.lucene.analysis.core.WhitespaceAnalyzer (you were missing the 'core').
> > >
> > > Mahout trunk is presently at Lucene 4.2.1. Lucene has gone through a
> > > major refactoring in 4.x.
> > > Check the Lucene 4.2.1 docs for the correct package name.
> > >
> > > ________________________________
> > > From: 万代豊 <[email protected]>
> > > To: "[email protected]" <[email protected]>
> > > Sent: Tuesday, May 7, 2013 3:20 AM
> > > Subject: Class Not Found from 0.8-SNAPSHOT for
> > > org.apache.lucene.analysis.WhitespaceAnalyzer
> > >
> > > Hi all,
> > > I guess I've seen very similar topics somewhere about class-name changes
> > > in Mahout-0.8-SNAPSHOT for some of the Lucene analyzers, and here is
> > > another one that needs solving.
> > > Mahout gave me an error for seq2sparse with the Lucene analyzer option
> > > as follows, which of course had been working in at least Mahout 0.7.
> > >
> > > $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i NHTSA-seqfile01/ -o
> > > NHTSA-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer
> > > -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ng 2 -ml 50 -seq -n 2
> > > Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> > > MAHOUT-JOB: /usr/local/trunk/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar
> > > 13/05/07 15:41:12 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 2
> > > 13/05/07 15:41:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
> > > 13/05/07 15:41:18 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> > > Exception in thread "main" java.lang.ClassNotFoundException:
> > > org.apache.lucene.analysis.WhitespaceAnalyzer
> > >
> > > I have confirmed which classpath Mahout is referring to with
> > > $ $MAHOUT_HOME/bin/mahout classpath
> > > and obtained the Lucene-related classpath entries below.
> > >
> > > /usr/local/trunk/examples/target/dependency/lucene-analyzers-common-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-benchmark-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-core-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-facet-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-highlighter-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-memory-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-queries-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-queryparser-4.2.1.jar
> > > /usr/local/trunk/examples/target/dependency/lucene-sandbox-4.2.1.jar
> > >
> > > I want to believe this is a simple class-name-change issue.
> > > Please advise.
> > > Regards,
> > > Y.Mandai
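[Editorial note: in Lucene 4.x, WhitespaceAnalyzer moved from the org.apache.lucene.analysis package in lucene-core to org.apache.lucene.analysis.core in the lucene-analyzers-common module, which is why the old name stopped resolving. A sketch of one way to confirm the fully qualified name against the jar already on Mahout's classpath (path taken from the listing above):]

    # List the jar contents and look for the analyzer class; the output
    # should include org/apache/lucene/analysis/core/WhitespaceAnalyzer.class.
    jar tf /usr/local/trunk/examples/target/dependency/lucene-analyzers-common-4.2.1.jar \
        | grep WhitespaceAnalyzer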
