Web Service Interface for triggering a Hadoop Job

Nikolaos Romanos Katsipoulakis Tue, 05 Jun 2012 00:50:58 -0700

Hello everybody.

I want to trigger the execution of an ItemSimilarityJob (Mahout 0.7-SNAPSHOT) from a web service interface. To do that, I want to implement a class that holds an ItemSimilarityJob object and, whenever I get a WS request, invokes that object's run method. Is this possible?

And how is it done?
I am posting the code that I have written below:


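import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob;

public class Main {

    public static void main(String[] args) throws IOException {
        // Load the cluster configuration. Note that jobConf is built here
        // but is never handed to the job below.
        Configuration jobConf = new Configuration();
        jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
        ItemSimilarityJob myJob = new ItemSimilarityJob();

        // Input and output are passed as -D properties, not as --input/--output.
        String[] args1 = {
            "-Dmapred.input.dir=input/input.txt",
            "-Dmapred.output.dir=output",
            "--similarityClassname", "SIMILARITY_COOCCURRENCE"
        };

        try {
            // main is static, so this is the same as ItemSimilarityJob.main(args1).
            myJob.main(args1);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }

}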

The output I get is:

Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:

usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected mapred.output.dir=output while processing Job-Specific Options:
Usage:
 [--input <input> --output <output> --similarityClassname <similarityClassname>
 --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefsPerUser
 <maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser> --booleanData
 <booleanData> --threshold <threshold> --help --tempDir <tempDir> --startPhase
 <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input                                      Path to job input
                                                          directory.
  --output (-o) output                                    The directory
                                                          pathname for
                                                          output.
  --similarityClassname (-s) similarityClassname          Name of distributed
                                                          similarity measures
                                                          class to
                                                          instantiate,
                                                          alternatively use
                                                          one of the
                                                          predefined
                                                          similarities
                                                          ([SIMILARITY_COOCCURRENCE,
                                                          SIMILARITY_LOGLIKELIHOOD,
                                                          SIMILARITY_TANIMOTO_COEFFICIENT,
                                                          SIMILARITY_CITY_BLOCK,
                                                          SIMILARITY_COSINE,
                                                          SIMILARITY_PEARSON_CORRELATION,
                                                          SIMILARITY_EUCLIDEAN_DISTANCE])
  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem    try to cap the
                                                          number of similar
                                                          items per item to
                                                          this number
                                                          (default: 100)
  --maxPrefsPerUser (-mppu) maxPrefsPerUser               max number of
                                                          preferences to
                                                          consider per user,
                                                          users with more
                                                          preferences will be
                                                          sampled down
                                                          (default: 1000)
  --minPrefsPerUser (-mp) minPrefsPerUser                 ignore users with
                                                          less preferences
                                                          than this
                                                          (default: 1)
  --booleanData (-b) booleanData                          Treat input as
                                                          without pref values
  --threshold (-tr) threshold                             discard item pairs
                                                          with a similarity
                                                          value below this
  --help (-h)                                             Print out help
  --tempDir tempDir                                       Intermediate output
                                                          directory
  --startPhase startPhase                                 First phase to run
  --endPhase endPhase                                     Last phase to run


Why do I get the above output?
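
Going by the usage text, one thing I could try is passing the paths through the job-specific --input and --output options instead of the -D properties. Below is a minimal sketch of the wrapper class I have in mind, with that change applied. I have not confirmed that it avoids the error. The names JobRunner, SimilarityJobTrigger and TriggerDemo are hypothetical and are not part of Mahout or Hadoop. The runner is a stub here so the sketch runs without a cluster; I assume a real one would delegate to something like ToolRunner.run(conf, new ItemSimilarityJob(), args).

```java
import java.util.Arrays;

/** Hypothetical hook that launches the job and returns its exit code. */
interface JobRunner {
    int run(String[] args) throws Exception;
}

/** Hypothetical wrapper that a web service endpoint could call once per request. */
class SimilarityJobTrigger {
    private final JobRunner runner;

    SimilarityJobTrigger(JobRunner runner) {
        this.runner = runner;
    }

    /** Builds the argument array from long options listed in the usage text. */
    static String[] buildArgs(String input, String output, String similarity) {
        return new String[] {
            "--input", input,
            "--output", output,
            "--similarityClassname", similarity
        };
    }

    /** Runs the job; synchronized so that two requests do not start overlapping runs. */
    synchronized int trigger(String input, String output, String similarity) throws Exception {
        return runner.run(buildArgs(input, output, similarity));
    }
}

class TriggerDemo {
    public static void main(String[] args) throws Exception {
        // Stub runner: prints the arguments instead of submitting a Hadoop job.
        // Assumption: a real runner would call
        // ToolRunner.run(conf, new ItemSimilarityJob(), jobArgs) here.
        JobRunner stub = jobArgs -> {
            System.out.println(Arrays.toString(jobArgs));
            return 0;
        };
        SimilarityJobTrigger trigger = new SimilarityJobTrigger(stub);
        int exitCode = trigger.trigger("input/input.txt", "output", "SIMILARITY_COOCCURRENCE");
        System.out.println("exit code: " + exitCode);
    }
}
```

The web service method would then only need to call trigger(...) with the request parameters and report the exit code back to the caller.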

Thank you in advance.

Nick K.
