Re: mahout command

Suneel Marthi Sat, 08 Mar 2014 04:22:23 -0800

Not sure what's so disappointing here; it was never officially announced that 
Mahout 0.9 had Hadoop 2.x support.


From trunk, you can build Mahout for Hadoop 2 using this command:

mvn clean package -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
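
As a rough sketch of how the placeholder above might be filled in: the snippet below pulls the version number out of the first line of `hadoop version` output (which reads like "Hadoop 2.3.0") and prints the resulting build command. The helper name hadoop_version_from is made up for illustration, and the sample string stands in for a real cluster's output.

```shell
# Sketch only: derive the Hadoop version string for the Mahout build command.
# hadoop_version_from is a hypothetical helper, not part of Mahout or Hadoop.
hadoop_version_from() {
  # Expects the text printed by `hadoop version`; its first line looks
  # like "Hadoop 2.3.0", so the second field is the version number.
  printf '%s\n' "$1" | awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# On a real cluster: HADOOP2_VERSION=$(hadoop_version_from "$(hadoop version)")
HADOOP2_VERSION=$(hadoop_version_from "Hadoop 2.3.0")

# Print the command rather than run it, so it can be reviewed first.
echo "mvn clean package -Dhadoop2.version=${HADOOP2_VERSION}"
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; paste the printed line into a trunk checkout once the version looks right.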



On Friday, March 7, 2014 12:12 PM, Mahmood Naderan <[email protected]> wrote:
 
That is rather disappointing.... 

>b) Work off of present Head and build with Hadoop 2.x profile. 
Can you explain more? 


 
Regards,
Mahmood



On Friday, March 7, 2014 8:09 PM, Suneel Marthi <[email protected]> wrote:
 
The example as documented on the Wiki should work. The issue is that you seem 
to be running the Mahout 0.9 distro, which was built with the Hadoop 1.2.1 
profile, on a Hadoop 2.3 environment. I don't think that's going to work.

I suggest that you either:

a) Switch to a Hadoop 1.2.1 environment
b) Work off of present Head and build with the Hadoop 2.x profile.

Mahout 0.9 is not certified for Hadoop 2.x.






On Friday, March 7, 2014 11:16 AM, Mahmood Naderan <[email protected]> wrote:
 
FYI, I am trying to complete the wikipedia example from Apache's document
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example

 
Regards,
Mahmood




On Friday, March 7, 2014 5:23 PM, Mahmood Naderan <[email protected]> wrote:

In fact, see this file
    src/conf/driver.classes.default.props

which is not exactly what you described. I still have the same problem. Please 
see the complete log:

hadoop@solaris:~/mahout-distribution-0.9$ head -n 5 src/conf/driver.classes.default.props
org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter
#Utils
org.apache.mahout.utils.vectors.VectorDumper = vectordump : Dump vectors from a sequence file to text
org.apache.mahout.utils.clustering.ClusterDumper = clusterdump : Dump cluster output to text
org.apache.mahout.utils.SequenceFileDumper = seqdumper : Generic Sequence File dumper



hadoop@solaris:~/mahout-distribution-0.9$ mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64
Running on hadoop, using /export/home/hadoop/hadoop-2.3.0/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /export/home/hadoop/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
14/03/07 17:19:04 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 17:19:04 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
  wikipediaXmlSplitter: : wikipedia splitter
hadoop@solaris:~/mahout-distribution-0.9$ 





 
Regards,
Mahmood



On Friday, March 7, 2014 5:02 PM, Suneel Marthi <[email protected]> wrote:

Mahmood,

wikipediaXMLSplitter is not present in driver.classes.default.props. To 
accomplish what you are trying to do, you can edit 
src/conf/driver.classes.default.props and add an entry for wikipediaXMLSplitter:

org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter

You should then be able to invoke via:

mahout wikipediaXmlSplitter -d<path> -o<path> -c64

Please give that a try.
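
Before re-running, it may help to confirm which short name the props file actually registers. The sketch below assumes each entry has the form "fully.qualified.Class = shortName : description" and that the name is matched exactly, including case (an assumption based on the "Unknown program" message reported elsewhere in this thread). The helper registered_name is hypothetical, and the demo runs against a throwaway file rather than a real Mahout checkout.

```shell
# Sketch only: check whether a short program name is registered in a
# driver props file. registered_name is a hypothetical helper.
registered_name() {
  props_file=$1
  wanted=$2
  # Split each line on "=" or ":"; field 2 is the short name.
  # Comment lines are skipped; the comparison is case-sensitive.
  awk -F'[=:]' -v want="$wanted" '
    /^#/ { next }
    { name = $2; gsub(/^[ \t]+|[ \t]+$/, "", name) }
    name == want { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$props_file"
}

# Demo against a temporary file holding the entry quoted above.
props=$(mktemp)
printf '%s\n' \
  'org.apache.mahout.text.wikipedia.WikipediaXmlSplitter = wikipediaXmlSplitter : wikipedia splitter' \
  > "$props"

registered_name "$props" wikipediaXmlSplitter && echo "wikipediaXmlSplitter: registered"
registered_name "$props" wikipediaXMLSplitter || echo "wikipediaXMLSplitter: not registered"
rm -f "$props"
```

Against a real checkout, the first argument would be src/conf/driver.classes.default.props; the two spellings differ only in "Xml" versus "XML".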








On Friday, March 7, 2014 8:11 AM, Mahmood Naderan <[email protected]> wrote:

Hi
When I run

    mahout wikipediaXMLSplitter -d examples/temp/enwiki-latest-pages-articles.xml -o wikipedia/chunks -c 64

I get this error:
14/03/07 16:24:13 WARN driver.MahoutDriver: Unable to add class: wikipediaXMLSplitter
java.lang.ClassNotFoundException: wikipediaXMLSplitter
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:186)
    at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
14/03/07 16:24:13 WARN driver.MahoutDriver: No wikipediaXMLSplitter.props found on classpath, will use command-line arguments only
Unknown program 'wikipediaXMLSplitter' chosen.


However, the wikipediaXMLSplitter exists in
mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/WikipediaXmlSplitter.java

I know that it is possible to pass the full path, but is there any way to define 
a variable that points to the correct location? Something like 

   export WIKI=mahout-distribution-0.9/integration/src/main/java/org/apache/mahout/text/wikipedia/

Where should I add that?

 
Regards,
Mahmood
