FYI, I am trying to complete the Wikipedia example from the Apache Mahout wiki: https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
Regards,
Mahmood

On Friday, March 7, 2014 5:23 PM, Mahmood Naderan wrote:

In fact, src/conf/driver.classes.default.props already contains that entry, so the file is not exactly as you described. I still have the same problem. Please see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper

hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$

Regards,
Mahmood

On Friday, March 7, 2014 5:02 PM, Suneel Marthi wrote:

Mahmood,

wikipediaXMLSplitter is not present in driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter:

org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

You should then be able to invoke it via:

mahout wikipediaXmlSplitter -d <path> -o <path> -c 64

Please give that a try.
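For reference, a minimal sketch of how an entry of the form "class = shortName : description" could be parsed and looked up. It assumes that the short name must match exactly, including case. That is an assumption suggested by the log in this thread (wikipediaXmlSplitter is listed as valid while wikipediaXMLSplitter is rejected), not something confirmed from the MahoutDriver source. The helper names parse_driver_props and resolve are hypothetical and are not part of Mahout.

```python
def parse_driver_props(text):
    """Parse lines of the form 'class = shortName : description'.

    Returns a dict mapping short name -> (class name, description).
    Blank lines and '#' comment lines are skipped.
    """
    programs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        class_name, _, rest = line.partition("=")
        short_name, _, description = rest.partition(":")
        programs[short_name.strip()] = (class_name.strip(), description.strip())
    return programs


def resolve(programs, requested):
    """Try an exact (case-sensitive) match first.

    Returns (matched_name, []) on success, otherwise
    (None, names_that_differ_only_by_case).
    """
    if requested in programs:
        return requested, []
    near = [name for name in programs if name.lower() == requested.lower()]
    return None, near


# Entries copied from the 'head' output quoted in this thread.
PROPS = """\
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
"""

programs = parse_driver_props(PROPS)
print(resolve(programs, "wikipediaXMLSplitter"))  # no exact match, one near match
print(resolve(programs, "wikipediaXmlSplitter"))  # exact match
```

Under that assumption, the name given on the command line has to be spelled exactly as it appears in the props file; the sketch only reports names that differ by case and does not silently substitute them.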

On Friday, March 7, 2014 8:11 AM, Mahmood Naderan wrote:

Hi,

When I run

mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

I get this error:

14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.

However, wikipediaXMLSplitter exists in

mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java

I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like:

export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/

Where should I add that?

Regards,
Mahmood
