Okay, here's what I am trying to do.
My code is this:
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.common.FileLineIterable;
import org.apache.mahout.common.Pair;
import org.apache.mahout.common.StringRecordIterator;
import org.apache.mahout.fpm.pfpgrowth.convertors.ContextStatusUpdater;
import org.apache.mahout.fpm.pfpgrowth.convertors.SequenceFileOutputCollector;
import org.apache.mahout.fpm.pfpgrowth.convertors.string.StringOutputConverter;
import org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns;
import org.apache.mahout.fpm.pfpgrowth.fpgrowth.FPGrowth;

public class DellFPGrowth {

    public static void main(String[] args) throws IOException {
        Set<String> features = new HashSet<String>();
        String input = "/mnt/hgfs/Hadoop-automation/new-delltransaction.txt";
        int minSupport = 1;
        int maxHeapSize = 50; // top-k: how many patterns to keep per feature
        String pattern = " \"[ ,\\t]*[,|\\t][ ,\\t]*\" ";
        Charset encoding = Charset.forName("UTF-8");
        FPGrowth<String> fp = new FPGrowth<String>();
        String output = "/tmp/output.txt";
        Path path = new Path(output);
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // results are written as a sequence file of (feature, top-k patterns)
        SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
                Text.class, TopKStringPatterns.class);

        fp.generateTopKFrequentPatterns(
                new StringRecordIterator(new FileLineIterable(new File(input), encoding, false), pattern),
                fp.generateFList(
                        new StringRecordIterator(new FileLineIterable(new File(input), encoding, false), pattern),
                        minSupport),
                minSupport,
                maxHeapSize,
                features,
                new StringOutputConverter(new SequenceFileOutputCollector<Text, TopKStringPatterns>(writer)),
                new ContextStatusUpdater(null));

        writer.close();

        // read the patterns back from the sequence file and print them
        List<Pair<String, TopKStringPatterns>> frequentPatterns =
                FPGrowth.readFrequentPattern(fs, conf, path);
        for (Pair<String, TopKStringPatterns> entry : frequentPatterns) {
            System.out.println(entry.getSecond());
        }
        System.out.print("\nthe end! ");
    }
}
1. I am able to compile and run this code from Eclipse, so I took the .class
file from the Eclipse target folder, put it in another directory, and made a
simple jar file using the jar -cvf command.
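For reference, the command-line equivalent would be roughly the sketch below
(the jar names are illustrative and the output name callfpgrowth.jar is made
up here; adjust everything to your own layout):

  # compile against the mahout and hadoop jars (adjust jar names to your versions)
  javac -cp "$MAHOUT_HOME/mahout-core-0.4.jar:$MAHOUT_HOME/mahout-math-0.4.jar:$HADOOP_HOME/hadoop-core.jar" CallFPGrowth.java
  # package, preserving the package directory layout (com/musigma/hpc/...)
  mkdir -p build/com/musigma/hpc
  cp CallFPGrowth.class build/com/musigma/hpc/
  cd build && jar -cvf ../callfpgrowth.jar com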
2. Since I am using Mahout 0.4 and MAHOUT_CONF_DIR points by default to
$MAHOUT_HOME/conf, I just added my jar directly to the $MAHOUT_HOME/conf
folder and added an entry for my class to the drivers.classes.props file.
I appended the following line at the end:
com.musigma.hpc.CallFPGrowth = callfpgrowth : Calls fpgrowth
com.musigma.hpc.CallFPGrowth is the class that I want to run from the command
line, and it is in the jar.
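One sanity check I did (reusing the made-up jar name from above): the class
loader can only find com.musigma.hpc.CallFPGrowth if the jar lists it under
its full package path.

  # the listing should contain com/musigma/hpc/CallFPGrowth.class
  jar -tf $MAHOUT_HOME/conf/callfpgrowth.jar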
3. Now when I run bin/mahout, I get the following exception:
hadoop@ubuntu:/tmp/mahout-distribution-0.4$ bin/mahout
Running on hadoop, using HADOOP_HOME=/usr/local/hadoop/hadoop
HADOOP_CONF_DIR=/usr/local/hadoop/hadoop/conf
11/09/25 00:40:07 WARN driver.MahoutDriver: Unable to add class: com.musigma.hpc.CallFPGrowth
java.lang.ClassNotFoundException: com.musigma.hpc.CallFPGrowth
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:207)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:117)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
How can I resolve this issue?
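In the meantime, as a workaround I am also trying to bypass bin/mahout and
launch the class through hadoop jar directly. A sketch (again reusing the
made-up jar name from above; the exact Mahout jar names may differ):

  # put the mahout jars on hadoop's classpath, then run the driver class directly
  export HADOOP_CLASSPATH=$MAHOUT_HOME/mahout-core-0.4.jar:$MAHOUT_HOME/mahout-math-0.4.jar
  hadoop jar callfpgrowth.jar com.musigma.hpc.CallFPGrowth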
On Sat, Sep 24, 2011 at 2:55 PM, Lance Norskog <[email protected]> wrote:
> Ah! That is all off in Maven-land. There is a Maven feature called "exec".
>
> http://mojo.codehaus.org/exec-maven-plugin/
>
> There are examples for this in the Mahout wiki. Search for "exec:java".
>
> On Sat, Sep 24, 2011 at 2:42 AM, praveenesh kumar <[email protected]> wrote:
>
> > Which Mahout jars are required to run this code, and where can I find them?
> > I have the src downloaded, but there are no jars in the src.
> >
> >
> > On Sat, Sep 24, 2011 at 2:35 AM, Paritosh Ranjan <[email protected]>
> > wrote:
> >
> > > Just add the Mahout jars to the classpath while compiling/executing.
> > > Search "java jar in classpath" on Google.
> > >
> > >
> > > On 24-09-2011 15:01, praveenesh kumar wrote:
> > >
> > >> What I mean to say is:
> > >>
> > >> I have this code:
> > >>
> > >> [same DellFPGrowth code as above, snipped]
> > >>
> > >> How should I compile and run it from the command line?
> > >> I don't have Eclipse on my system. How can I run this code?
> > >>
> > >> Thanks,
> > >> Praveenesh
> > >>
> > >> On Sat, Sep 24, 2011 at 12:40 PM, Danny Bickson <[email protected]> wrote:
> > >>
> > >>> It is very simple: in the root folder you run (for example, for k-means):
> > >>> ./bin/mahout kmeans -i ~/usr7/small_netflix_mahout/ -o ~/usr7/small_netflix_mahout_output/ --numClusters 10 -c ~/usr7/small_netflix_mahout/ -x 10
> > >>>
> > >>> where ./bin/mahout is used for any Mahout application, and the next keyword
> > >>> (kmeans in this case) defines the algorithm type.
> > >>> The rest of the inputs are algorithm-specific.
> > >>>
> > >>> If you want to add a new application to the existing ones, you need to
> > >>> edit the conf/driver.classes.props file and point it to your main class.
> > >>>
> > >>> Best,
> > >>>
> > >>> - Danny Bickson
> > >>>
> > >>> On Sat, Sep 24, 2011 at 9:59 AM, praveenesh kumar <[email protected]> wrote:
> > >>>> Hey,
> > >>>> I have this code written using the Mahout libraries, and I am able to
> > >>>> run it from Eclipse.
> > >>>> How can I run code written with Mahout from the command line?
> > >>>>
> > >>>> My question is: do I have to make a jar file and run it as hadoop jar
> > >>>> jarfilename.jar class, or shall I run it using a plain java command?
> > >>>>
> > >>>> Can anyone clear up my confusion?
> > >>>> I am not able to run this code.
> > >>>>
> > >>>> Thanks,
> > >>>> Praveenesh
> > >>>>
> > >>>>
> > >>
> > >
> > >
> >
>
>
>
> --
> Lance Norskog
> [email protected]
>