Re: Web Service Interface for triggering a Hadoop Job

Sean Owen Tue, 05 Jun 2012 00:58:11 -0700

-D arguments are meant for the JVM. You can't pass arguments to the JVM
here, so you are passing them to the program instead.

These options really just set key-value pairs in the Configuration object.
So just do that instead.
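
A minimal sketch of that idea, using only the Java standard library: it separates the "-Dkey=value" tokens from the rest of the argument array so the pairs can be set on the Configuration directly. The class name DashDArgs and its fields are hypothetical, introduced here for illustration only; they are not part of Hadoop or Mahout.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: splits "-Dkey=value" tokens from the remaining arguments. */
final class DashDArgs {

    /** Key/value pairs found in "-Dkey=value" tokens, in order of appearance. */
    final Map<String, String> properties = new LinkedHashMap<>();

    /** Every other token, in its original order. */
    final List<String> remaining = new ArrayList<>();

    private DashDArgs() {}

    static DashDArgs parse(String[] args) {
        DashDArgs parsed = new DashDArgs();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // "-Dmapred.output.dir=output" -> ("mapred.output.dir", "output")
                parsed.properties.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                // Not a property definition: leave it for the job's own parser.
                parsed.remaining.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        DashDArgs parsed = parse(new String[] {
            "-Dmapred.input.dir=input/input.txt",
            "-Dmapred.output.dir=output",
            "--similarityClassname", "SIMILARITY_COOCCURRENCE"
        });
        // With Hadoop on the classpath, each pair would be applied to the job's
        // Configuration (for example jobConf.set(key, value)) before the job runs.
        parsed.properties.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("remaining: " + parsed.remaining);
    }
}
```

In the posted Main class, the pairs would presumably be applied with jobConf.set(key, value), and the job started with something like ToolRunner.run(jobConf, new ItemSimilarityJob(), remainingArgs). That assumes ItemSimilarityJob implements Hadoop's Tool interface and has not been checked against the 0.7 snapshot.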

In general, I don't think it is a good design for a long-running Hadoop
job to be babysat by a web server.

On Jun 5, 2012 8:50 AM, "Nikolaos Romanos Katsipoulakis" <[email protected]>
wrote:

> Hello everybody.
> I want to trigger the execution of an ItemSimilarityJob (mahout 0.7
> snapshot) from a web service interface. Hence, I want to implement a
> class that will contain an ItemSimilarityJob object and whenever I get
> a WS request, it will invoke the ItemSimilarityJob object's run method.
> Is this possible? And how is it done?
> I am posting the code that I have written below:
>
> public class Main {
>
>    public static void main(String[] args) throws IOException {
>        Configuration jobConf = new Configuration();
>        jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
>        jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
>        jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
>        ItemSimilarityJob myJob = new ItemSimilarityJob();
>        String[] args1 = { "-Dmapred.input.dir=input/input.txt",
>                "-Dmapred.output.dir=output", "--similarityClassname",
>                "SIMILARITY_COOCCURRENCE" };
>        try {
>            myJob.main(args1);
>        }catch(Exception e) {
>            System.err.println(e.getMessage());
>        }
>    }
>
> }
>
> The output I get is:
>
> Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
> SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:
> usage: <command> [Generic Options] [Job-Specific Options]
> Generic Options:
>  -archives <paths>              comma separated archives to be unarchived
>                                on the compute machines.
>  -conf <configuration file>     specify an application configuration file
>  -D <property=value>            use value for given property
>  -files <paths>                 comma separated files to be copied to the
>                                map reduce cluster
>  -fs <local|namenode:port>      specify a namenode
>  -jt <local|jobtracker:port>    specify a job tracker
>  -libjars <paths>               comma separated jar files to include in
>                                the classpath.
>  -tokenCacheFile <tokensFile>   name of the file with the tokens
> Unexpected mapred.output.dir=output while processing Job-Specific Options:
> Usage:
>  [--input <input> --output <output> --similarityClassname <similarityClassname>
>  --maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefsPerUser
>  <maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser> --booleanData
>  <booleanData> --threshold <threshold> --help --tempDir <tempDir>
>  --startPhase <startPhase> --endPhase <endPhase>]
> Job-Specific Options:
>  --input (-i) input                                      Path to job input directory.
>  --output (-o) output                                    The directory pathname for output.
>  --similarityClassname (-s) similarityClassname          Name of distributed similarity measures
>                                                          class to instantiate, alternatively use one
>                                                          of the predefined similarities
>                                                          ([SIMILARITY_COOCCURRENCE,
>                                                          SIMILARITY_LOGLIKELIHOOD,
>                                                          SIMILARITY_TANIMOTO_COEFFICIENT,
>                                                          SIMILARITY_CITY_BLOCK,
>                                                          SIMILARITY_COSINE,
>                                                          SIMILARITY_PEARSON_CORRELATION,
>                                                          SIMILARITY_EUCLIDEAN_DISTANCE])
>  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem    try to cap the number of similar items per
>                                                          item to this number (default: 100)
>  --maxPrefsPerUser (-mppu) maxPrefsPerUser               max number of preferences to consider per
>                                                          user, users with more preferences will be
>                                                          sampled down (default: 1000)
>  --minPrefsPerUser (-mp) minPrefsPerUser                 ignore users with less preferences than
>                                                          this (default: 1)
>  --booleanData (-b) booleanData                          Treat input as without pref values
>  --threshold (-tr) threshold                             discard item pairs with a similarity value
>                                                          below this
>  --help (-h)                                             Print out help
>  --tempDir tempDir                                       Intermediate output directory
>  --startPhase startPhase                                 First phase to run
>  --endPhase endPhase                                     Last phase to run
>
> Why do I get the above output?
>
> Thank you in advance.
>
> Nick K.
>
