Re: Creating vectors from lucene index on EMR via the CLI

hellen maziku Wed, 12 Dec 2012 04:38:17 -0800

Also, what do you mean by "don't know much about this particular job"? Does 
the type of the job jar file matter? I thought that as long as I could locate the 
org.apache.mahout.utils.vectors.lucene.Driver class, I was good to use that 
job jar file.


Btw, whenever I installed and compiled Mahout 0.7 (both from source and trunk), 
I could not locate the mahout-utils-0.7 jar. Why is this so?

Thank you again.



________________________________
 From: Sean Owen <[email protected]>
To: Mahout User List <[email protected]>; hellen maziku 
<[email protected]> 
Sent: Wednesday, December 12, 2012 6:05 AM
Subject: Re: Creating vectors from lucene index on EMR via the CLI
 
I don't know much about this particular job, but the general problem
here is that you are passing arguments to a binary called
elastic-mapreduce, and not to the Java program. There is likely some
mechanism to package up arguments that need to be sent to the program,
as an argument to the elastic-mapreduce binary.
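
To illustrate the point, here is a minimal sketch, assuming that the
elastic-mapreduce client forwards exactly one value per --arg flag. That
assumption is consistent with the "Missing value(s) --dir" error, since only
the option names would then reach the Driver, but it is not confirmed here.
The emr_args helper is hypothetical, and the --field value is simply carried
over from the original command, not verified.

```shell
#!/bin/sh
# Hypothetical helper: give every program argument its own --arg flag,
# so option values (not only option names) are forwarded to the main class.
emr_args() {
  for a in "$@"; do
    printf '%s %s ' --arg "$a"
  done
}

# Build the step command as a string and print it for inspection
# instead of running it. Bucket names and job flow id are from the thread.
step_cmd="./elastic-mapreduce -j j-2NSJRI6N9EQJ4 \
--jar s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar \
--main-class org.apache.mahout.utils.vectors.lucene.Driver \
$(emr_args --dir s3n://mahout-input/input1/index/ \
           --field doc1 \
           --dictOut s3n://mahout-output/solr-dict-out/dict.txt \
           --output s3n://mahout-output/solr-vect-out/vectors)"

echo "$step_cmd"
```

The printed command can be checked by eye before it is run; each option name
and each option value should be preceded by its own --arg.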

On Wed, Dec 12, 2012 at 11:55 AM, hellen maziku <[email protected]> wrote:
> Hi,
> I installed mahout and solr.
>
> I created an index from the dictionary.txt using the command below
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" \
>   -F "[email protected]"
>
> To create the vectors from my index, I needed the
> org.apache.mahout.utils.vectors.lucene.Driver class. I could not
> locate this class in mahout-core-0.7-job.jar. I could only
> locate it in mahout-examples-0.7-job.jar, so I uploaded
> mahout-examples-0.7-job.jar to an S3 bucket.
>
> I also uploaded the dictionary index to a separate S3 bucket. I created
> another bucket with two folders to store my dictOut and vectors.
>
> I created a job flow on the CLI
>
> ./elastic-mapreduce --create --alive --log-uri s3n://mahout-output/logs/
> --name dict_vectorize
>
> I added the step to vectorize my index using the following command
> ./elastic-mapreduce -j j-2NSJRI6N9EQJ4  --jar
> s3n://mahout-bucket/jars/mahout-examples-0.7-job.jar  --main-class
> org.apache.mahout.utils.vectors.lucene.Driver --arg --dir
> s3n://mahout-input/input1/index/ --arg --field doc1 --arg --dictOut
> s3n://mahout-output/solr-dict-out/dict.txt --arg --output
> s3n://mahout-output/solr-vect-out/vectors
>
>
> But in the logs I get the following error
>
> 2012-12-12 09:37:17,883 ERROR org.apache.mahout.utils.vectors.lucene.Driver (main): Exception
> org.apache.commons.cli2.OptionException: Missing value(s) --dir
>     at org.apache.commons.cli2.option.ArgumentImpl.validate(ArgumentImpl.java:241)
>     at org.apache.commons.cli2.option.ParentImpl.validate(ParentImpl.java:124)
>     at org.apache.commons.cli2.option.DefaultOption.validate(DefaultOption.java:176)
>     at org.apache.commons.cli2.option.GroupImpl.validate(GroupImpl.java:265)
>     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:104)
>     at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:197)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
>
>
> What am I doing wrong?
> Another question: what is the correct value of the --field argument? Is it
> doc1 (the id) or dictionary (from the filename dictionary.txt)? I am asking
> this because when I issue the query with q=doc1 on Solr I get no
> results, but when I issue the query with q=dictionary, I see my content.
>
> Thank you so much for your help. I am a newbie, so please excuse my being too
> verbose.
