You can ignore the warnings.
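The fix that settles the thread below is a case mismatch: the command line used `wikipediaXMLSplitter` while `driver.classes.default.props` declares `wikipediaXmlSplitter`, and the alias lookup is case-sensitive. A minimal self-contained sketch of the check, using an illustrative copy of the stock 0.9 props line rather than a real Mahout checkout:

```shell
# Recreate the relevant line of driver.classes.default.props (illustrative
# copy, not a real Mahout install) and show the lookup is case-sensitive.
printf '%s\n' \
  'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' \
  > driver.classes.default.props

# The alias as typed on the command line (upper-case "XML") matches nothing:
grep -c 'wikipediaXMLSplitter' driver.classes.default.props || true   # prints 0

# The alias as actually declared (lower-case "Xml") matches:
grep -c 'wikipediaXmlSplitter' driver.classes.default.props           # prints 1
```

With the command typed as `wikipediaXmlSplitter` (or the props entry changed to match), `bin/mahout` resolves the alias and runs the splitter.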
On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

Oh yes... Thanks Andrew, you are right. Meanwhile I see two warnings:

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?

Regards,
Mahmood

On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

Thanks Andrew, that seems to have been the issue all the while. Nevertheless, it is better to run from Head if running on Hadoop 2.3.0.

On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

You have upper-case in your command but lower-case in your declaration in the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

> No success Suneel...
> Please see the attachment, which is the output of
> mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:186)
>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
>   arff.vector: : Generate Vectors from an ARFF file or directory
>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
>   canopy: : Canopy clustering
>   cat: : Print a file or resource as the logistic regression models would see it
>   cleansvd: : Cleanup and verification of SVD output
>   clusterdump: : Dump cluster output to text
>   clusterpp: : Groups Clustering Output In Clusters
>   cmdump: : Dump confusion matrix in HTML or text formats
>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
>   fkmeans: : Fuzzy K-means clustering
>   hmmpredict: : Generate random sequence of observations by given HMM
>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
>   kmeans: : K-means clustering
>   lucene.vector: : Generate Vectors from a Lucene index
>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
>   matrixdump: : Dump matrix in CSV format
>   matrixmult: : Take the product of two matrices
>   parallelALS: : ALS-WR factorization of a rating matrix
>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
>   recommenditembased: : Compute recommendations using item-based collaborative filtering
>   regexconverter: : Convert text files on a per line basis based on regular expressions
>   resplit: : Splits a set of SequenceFiles into a number of equal splits
>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
>   runlogistic: : Run a logistic regression model against CSV data
>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
>   seq2sparse: : Sparse Vector generation from Text sequence files
>   seqdirectory: : Generate sequence files (of Text) from a directory
>   seqdumper: : Generic Sequence File dumper
>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
>   seqwiki: : Wikipedia xml dump to sequence file
>   spectralkmeans: : Spectral k-means clustering
>   split: : Split Input data into test and train sets
>   splitDataset: : split a rating dataset into training and probe parts
>   ssvd: : Stochastic SVD
>   streamingkmeans: : Streaming k-means clustering
>   svd: : Lanczos Singular Value Decomposition
>   testnb: : Test the Vector-based Bayes classifier
>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
>   trainlogistic: : Train a logistic regression using stochastic gradient descent
>   trainnb: : Train the Vector-based Bayes classifier
>   transpose: : Take the transpose of a matrix
>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
>   vectordump: : Dump vectors from a sequence file to text
>   viterbi: : Viterbi decoding of hidden states from given output states sequence
>   wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
> Regards,
> Mahmood
>
> On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> Please give that a try.
>
> On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> > mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0, which command is correct:
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
>
> On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> Not sure what's so disappointing here; it was never officially announced that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build Mahout for Hadoop 2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> That is rather disappointing....
>
> > b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 8:09 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> The example as documented on the Wiki should work. The issue is that you seem to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's going to work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
> On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> FYI, I am trying to complete the wikipedia example from Apache's document:
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> In fact, see this file:
> src/conf/driver.classes.default.props
>
> which is not exactly what you said. Still I have the same problem.
> Please see the complete log:
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> [snip: same stack trace as above]
> 14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> [snip: same program list as above]
> hadoop@solaris:~/mahout-distribution-0.9$
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> Mehmood,
>
> wikipediaXMLSplitter is not present in
driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter:
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke it via:
>
> mahout wikipediaXmlSplitter -d <path> -o <path> -c 64
>
> Please give that a try.
>
> On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> Hi,
> When I run
>
> mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> I get this error:
>
> 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> [snip: same stack trace as above]
> 14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> However, the wikipediaXMLSplitter exists in
>
> mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
>
> I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like
>
> export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
>
> Where should I add that?
>
> Regards,
> Mahmood
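On the variable question above: the path under integration/src is the Java *source* file; at run time the class is loaded from mahout-examples-0.9-job.jar, so a variable pointing at the source tree would not help. Judging from the stack trace earlier in the thread, MahoutDriver falls back to Class.forName on the literal program name, so passing the fully qualified class name should resolve without any props entry. A sketch, not verified against a live install; the variable name WIKI_SPLITTER is illustrative:

```shell
# Added to ~/.profile (or ~/.bashrc) so it is set for every login shell.
# The value is the fully qualified class name, not the .java source path --
# at run time the class comes from the job jar, not from integration/src.
export WIKI_SPLITTER=org.apache.mahout.text.wikipedia.WikipediaXmlSplitter

# Invocation would then look like (not run here; requires a Mahout install):
#   bin/mahout "$WIKI_SPLITTER" -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
echo "$WIKI_SPLITTER"
```

The simpler alternative, per the resolution at the top of the thread, is to use the lower-case alias `wikipediaXmlSplitter` that the props file already declares.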