Hello everybody.
I want to trigger the execution of an ItemSimilarityJob (Mahout 0.7 snapshot) from a web service interface. To do that, I want to implement a class that holds an ItemSimilarityJob object and invokes that object's run method whenever a web service request arrives. Is this possible, and if so, how is it done?
I am posting the code that I have written below:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob;

public class Main {

    public static void main(String[] args) throws IOException {
        // Load the cluster configuration files.
        Configuration jobConf = new Configuration();
        jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
        ItemSimilarityJob myJob = new ItemSimilarityJob();
        // Input and output paths are passed as -D properties here.
        String[] args1 = {
            "-Dmapred.input.dir=input/input.txt",
            "-Dmapred.output.dir=output",
            "--similarityClassname", "SIMILARITY_COOCCURRENCE"
        };
        try {
            myJob.main(args1);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }

}

The output I get is:

Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>              comma separated archives to be unarchived
                                on the compute machines.
 -conf <configuration file>     specify an application configuration file
 -D <property=value>            use value for given property
 -files <paths>                 comma separated files to be copied to the
                                map reduce cluster
 -fs <local|namenode:port>      specify a namenode
 -jt <local|jobtracker:port>    specify a job tracker
 -libjars <paths>               comma separated jar files to include in
                                the classpath.
 -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected mapred.output.dir=output while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefsPerUser
<maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser> --booleanData
<booleanData> --threshold <threshold> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input
      Path to job input directory.
  --output (-o) output
      The directory pathname for output.
  --similarityClassname (-s) similarityClassname
      Name of distributed similarity measures class to instantiate,
      alternatively use one of the predefined similarities
      ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD,
      SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK,
      SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION,
      SIMILARITY_EUCLIDEAN_DISTANCE])
  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem
      try to cap the number of similar items per item to this number
      (default: 100)
  --maxPrefsPerUser (-mppu) maxPrefsPerUser
      max number of preferences to consider per user, users with more
      preferences will be sampled down (default: 1000)
  --minPrefsPerUser (-mp) minPrefsPerUser
      ignore users with less preferences than this (default: 1)
  --booleanData (-b) booleanData
      Treat input as without pref values
  --threshold (-tr) threshold
      discard item pairs with a similarity value below this
  --help (-h)
      Print out help
  --tempDir tempDir
      Intermediate output directory
  --startPhase startPhase
      First phase to run
  --endPhase endPhase
      Last phase to run

Why do I get the above output?
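
Going only by the usage text above, the job-specific parser lists --input and --output as options, so one variant I could try is to pass the paths that way instead of as -D properties. Below is a minimal sketch of a helper that builds such an argument array. The class name SimilarityJobArgs and its build method are hypothetical and not part of Mahout or Hadoop; the option names are copied from the usage text. Whether this avoids the error is an assumption I have not verified.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): builds the argument array
// using only the long options shown in the job's usage text.
final class SimilarityJobArgs {

    private SimilarityJobArgs() {
        // Static helper, not meant to be instantiated.
    }

    static String[] build(String inputPath, String outputPath, String similarityClassname) {
        List<String> args = new ArrayList<String>();
        // Each option name is followed by its value as a separate element.
        args.add("--input");
        args.add(inputPath);
        args.add("--output");
        args.add(outputPath);
        args.add("--similarityClassname");
        args.add(similarityClassname);
        return args.toArray(new String[0]);
    }
}
```

The resulting array would take the place of args1 in the code above, for example SimilarityJobArgs.build("input/input.txt", "output", "SIMILARITY_COOCCURRENCE").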

Thank you in advance.

Nick K.
