You can ignore the warnings.
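The fix that settles the thread below is a case mismatch: the command line used `wikipediaXMLSplitter` while `driver.classes.default.props` declares `wikipediaXmlSplitter`, and the alias lookup is case-sensitive. A minimal self-contained sketch of the check, using an illustrative copy of the stock 0.9 props line rather than a real Mahout checkout:

```shell
# Recreate the relevant line of driver.classes.default.props (illustrative
# copy, not a real Mahout install) and show the lookup is case-sensitive.
printf '%s\n' \
  'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' \
  > driver.classes.default.props

# The alias as typed on the command line (upper-case "XML") matches nothing:
grep -c 'wikipediaXMLSplitter' driver.classes.default.props || true   # prints 0

# The alias as actually declared (lower-case "Xml") matches:
grep -c 'wikipediaXmlSplitter' driver.classes.default.props           # prints 1
```

With the command typed as `wikipediaXmlSplitter` (or the props entry changed to match), `bin/mahout` resolves the alias and runs the splitter.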
On Saturday, March 8, 2014 2:58 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

Oh yes... Thanks Andrew, you are right. Meanwhile I see two warnings:

WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Is there any concern about them?

Regards,
Mahmood

On Saturday, March 8, 2014 11:19 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:

Thanks Andrew, that seems to have been the issue all the while. Nevertheless, it is better to run from Head if running on Hadoop 2.3.0.

On Saturday, March 8, 2014 2:42 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

You have upper-case in your command but lower-case in your declaration in the properties file; correct that and it should work.

Note:
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

On Sat, Mar 8, 2014 at 11:11 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:

> No success Suneel...
> Please see the attachment, which is the output of
> mvn clean package -Dhadoop2.version=2.3.0
>
> Additionally:
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
>
> hadoop@solaris:~/mahout-distribution-0.9$ bin/mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/08 22:37:03 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:186)
>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> 14/03/08 22:37:03 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
>   arff.vector: : Generate Vectors from an ARFF file or directory
>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
>   canopy: : Canopy clustering
>   cat: : Print a file or resource as the logistic regression models would see it
>   cleansvd: : Cleanup and verification of SVD output
>   clusterdump: : Dump cluster output to text
>   clusterpp: : Groups Clustering Output In Clusters
>   cmdump: : Dump confusion matrix in HTML or text formats
>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
>   fkmeans: : Fuzzy K-means clustering
>   hmmpredict: : Generate random sequence of observations by given HMM
>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
>   kmeans: : K-means clustering
>   lucene.vector: : Generate Vectors from a Lucene index
>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
>   matrixdump: : Dump matrix in CSV format
>   matrixmult: : Take the product of two matrices
>   parallelALS: : ALS-WR factorization of a rating matrix
>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
>   recommenditembased: : Compute recommendations using item-based collaborative filtering
>   regexconverter: : Convert text files on a per line basis based on regular expressions
>   resplit: : Splits a set of SequenceFiles into a number of equal splits
>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
>   runlogistic: : Run a logistic regression model against CSV data
>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
>   seq2sparse: : Sparse Vector generation from Text sequence files
>   seqdirectory: : Generate sequence files (of Text) from a directory
>   seqdumper: : Generic Sequence File dumper
>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
>   seqwiki: : Wikipedia xml dump to sequence file
>   spectralkmeans: : Spectral k-means clustering
>   split: : Split Input data into test and train sets
>   splitDataset: : split a rating dataset into training and probe parts
>   ssvd: : Stochastic SVD
>   streamingkmeans: : Streaming k-means clustering
>   svd: : Lanczos Singular Value Decomposition
>   testnb: : Test the Vector-based Bayes classifier
>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
>   trainlogistic: : Train a logistic regression using stochastic gradient descent
>   trainnb: : Train the Vector-based Bayes classifier
>   transpose: : Take the transpose of a matrix
>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
>   vectordump: : Dump vectors from a sequence file to text
>   viterbi: : Viterbi decoding of hidden states from given output states sequence
>   wikipediaXmlSplitter: : wikipedia splitter
> hadoop@solaris:~/mahout-distribution-0.9$
>
> Regards,
> Mahmood
>
> On Saturday, March 8, 2014 7:28 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> mvn clean package -Dhadoop2.version=2.3.0
>
> Please give that a try.
>
> On Saturday, March 8, 2014 9:56 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> > mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> Excuse me, if I have 2.3.0, which command is correct:
> mvn clean package -Dhadoop2.3.0.=2.3.0
> mvn clean package -Dhadoop2.version=2.3.0
>
> Regards,
> Mahmood
>
> On Saturday, March 8, 2014 3:50 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> Not sure what's so disappointing here; it was never officially announced that Mahout 0.9 had Hadoop 2.x support.
>
> From trunk, can you build Mahout for Hadoop 2 using this command:
>
> mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
> On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> That is rather disappointing....
>
> > b) Work off of present Head and build with Hadoop 2.x profile.
> Can you explain more?
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 8:09 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> The example as documented on the Wiki should work. The issue is that you seem to be running a Mahout 0.9 distro that was built with the hadoop 1.2.1 profile on a Hadoop 2.3 environment. I don't think that's going to work.
>
> Suggest that you either:
>
> a) Switch to a Hadoop 1.2.1 environment
> b) Work off of present Head and build with Hadoop 2.x profile.
>
> Mahout 0.9 is not certified for Hadoop 2.x.
>
> On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> FYI, I am trying to complete the wikipedia example from Apache's document:
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> In fact, see this file:
> src/conf/driver.classes.default.props
>
> which is not exactly what you said. Still I have the same problem.
> Please see the complete log:
>
> hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
> #Utils
> org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
> org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
> org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper
>
> hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
> Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> 14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> [snip: same stack trace as above]
> 14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> Valid program names are:
> [snip: same program list as above]
> hadoop@solaris:~/mahout-distribution-0.9$
>
> Regards,
> Mahmood
>
> On Friday, March 7, 2014 5:02 PM, Suneel Marthi <suneel_mar...@yahoo.com> wrote:
>
> Mehmood,
>
> wikipediaXMLSplitter is not present in
driver.classes.default.props. To accomplish what you are trying to do, you can edit src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter:
>
> org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
>
> You should then be able to invoke it via:
>
> mahout wikipediaXmlSplitter -d <path> -o <path> -c 64
>
> Please give that a try.
>
> On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <nt_mahm...@yahoo.com> wrote:
>
> Hi,
> When I run
>
> mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
>
> I get this error:
>
> 14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
> java.lang.ClassNotFoundException: wikipediaXMLSplitter
> [snip: same stack trace as above]
> 14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
> Unknown program 'wikipediaXMLSplitter' chosen.
> However, the wikipediaXMLSplitter exists in
>
> mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java
>
> I know that it is possible to pass the full path, but is there any way to define a variable that points to the correct location? Something like
>
> export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/
>
> Where should I add that?
>
> Regards,
> Mahmood
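On the variable question above: the path under integration/src is the Java *source* file; at run time the class is loaded from mahout-examples-0.9-job.jar, so a variable pointing at the source tree would not help. Judging from the stack trace earlier in the thread, MahoutDriver falls back to Class.forName on the literal program name, so passing the fully qualified class name should resolve without any props entry. A sketch, not verified against a live install; the variable name WIKI_SPLITTER is illustrative:

```shell
# Added to ~/.profile (or ~/.bashrc) so it is set for every login shell.
# The value is the fully qualified class name, not the .java source path --
# at run time the class comes from the job jar, not from integration/src.
export WIKI_SPLITTER=org.apache.mahout.text.wikipedia.WikipediaXmlSplitter

# Invocation would then look like (not run here; requires a Mahout install):
#   bin/mahout "$WIKI_SPLITTER" -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
echo "$WIKI_SPLITTER"
```

The simpler alternative, per the resolution at the top of the thread, is to use the lower-case alias `wikipediaXmlSplitter` that the props file already declares.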