Hello everybody.
I want to trigger the execution of an ItemSimilarityJob (Mahout 0.7
snapshot) from a web service interface. To do that, I want to implement
a class that holds an ItemSimilarityJob object and invokes that object's
run method whenever a WS request arrives. Is this possible, and if so,
how is it done?
Here is the code I have written so far:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob;

public class Main {
    public static void main(String[] args) throws IOException {
        // Load the cluster configuration files.
        Configuration jobConf = new Configuration();
        jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
        jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));

        ItemSimilarityJob myJob = new ItemSimilarityJob();
        // Input and output are passed as -D properties.
        String[] args1 = { "-Dmapred.input.dir=input/input.txt",
                "-Dmapred.output.dir=output", "--similarityClassname",
                "SIMILARITY_COOCCURRENCE" };
        try {
            myJob.main(args1);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }
}
The output I get is:
Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
  -archives <paths>
      comma separated archives to be unarchived on the compute machines.
  -conf <configuration file>
      specify an application configuration file
  -D <property=value>
      use value for given property
  -files <paths>
      comma separated files to be copied to the map reduce cluster
  -fs <local|namenode:port>
      specify a namenode
  -jt <local|jobtracker:port>
      specify a job tracker
  -libjars <paths>
      comma separated jar files to include in the classpath.
  -tokenCacheFile <tokensFile>
      name of the file with the tokens
Unexpected mapred.output.dir=output while processing Job-Specific Options:
Usage:
  [--input <input> --output <output>
   --similarityClassname <similarityClassname>
   --maxSimilaritiesPerItem <maxSimilaritiesPerItem>
   --maxPrefsPerUser <maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser>
   --booleanData <booleanData> --threshold <threshold> --help
   --tempDir <tempDir> --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input
      Path to job input directory.
  --output (-o) output
      The directory pathname for output.
  --similarityClassname (-s) similarityClassname
      Name of distributed similarity measures class to instantiate,
      alternatively use one of the predefined similarities
      ([SIMILARITY_COOCCURRENCE, SIMILARITY_LOGLIKELIHOOD,
      SIMILARITY_TANIMOTO_COEFFICIENT, SIMILARITY_CITY_BLOCK,
      SIMILARITY_COSINE, SIMILARITY_PEARSON_CORRELATION,
      SIMILARITY_EUCLIDEAN_DISTANCE])
  --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem
      try to cap the number of similar items per item to this number
      (default: 100)
  --maxPrefsPerUser (-mppu) maxPrefsPerUser
      max number of preferences to consider per user, users with more
      preferences will be sampled down (default: 1000)
  --minPrefsPerUser (-mp) minPrefsPerUser
      ignore users with less preferences than this (default: 1)
  --booleanData (-b) booleanData
      Treat input as without pref values
  --threshold (-tr) threshold
      discard item pairs with a similarity value below this
  --help (-h)
      Print out help
  --tempDir tempDir
      Intermediate output directory
  --startPhase startPhase
      First phase to run
  --endPhase endPhase
      Last phase to run
Why do I get the above output?
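Going by the usage text, the job-specific parser seems to expect
--input and --output rather than -Dmapred.* properties, so one thing I
could try is to build the argument list from those long option names.
Below is a minimal sketch of that idea only. The class name
SimilarityArgs and the method build are hypothetical names of my own,
not part of Mahout, and I have not confirmed that this removes the
error.

```java
import java.util.Arrays;

public class SimilarityArgs {

    // Hypothetical helper (not part of Mahout): builds the job-specific
    // arguments from the long option names printed in the usage text,
    // instead of passing -Dmapred.input.dir / -Dmapred.output.dir.
    public static String[] build(String input, String output, String similarity) {
        return new String[] {
            "--input", input,
            "--output", output,
            "--similarityClassname", similarity
        };
    }

    public static void main(String[] args) {
        String[] jobArgs = build("input/input.txt", "output",
                "SIMILARITY_COOCCURRENCE");
        // Assumption: with Hadoop and Mahout on the classpath, this array
        // could be passed to
        //   ToolRunner.run(jobConf, new ItemSimilarityJob(), jobArgs);
        // That call is left out so the sketch compiles without them.
        System.out.println(Arrays.toString(jobArgs));
    }
}
```

Passing the Configuration through ToolRunner would also make use of the
jobConf object, which my code above creates but never hands to the job.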
Thank you in advance.
Nick K.