Sure:

When I run the command:

./mahout-0.3/bin/mahout lucene.vector --dir ./example/solr/data/index/ --output ./tv1.5/part-out.vec --field body --idField itemId --dictOut dict1.5.out --norm 2 --max 50000 2>err.log

I receive the error:

Aug 6, 2010 1:56:59 PM org.slf4j.impl.JCLLoggerAdapter warn
WARNING: No lucene.vector.props found on classpath, will use command-line arguments only
Aug 6, 2010 1:56:59 PM org.slf4j.impl.JCLLoggerAdapter error
SEVERE: MahoutDriver failed with args: [--dir, ./example/solr/data/index/, --output, ./tv1.5/part-out.vec, --field, body, --idField, itemId, --dictOut, dict1.5.out, --norm, 2, --max, 50000, null]
Unknown format version: -10
Exception in thread "main" org.apache.lucene.index.CorruptIndexException: Unknown format version: -10
        at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:247)
        at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:72)
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:476)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:314)
        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:144)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:172)

In that case, "example" contained a copy of the Solr1.5 index.

When I ran the same command pointing to a Solr1.4 index, there was no problem.
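
One rough way to confirm that the two indexes differ in on-disk format is to look at the number that SegmentInfos.read rejects. The sketch below is only an illustration and rests on an assumption: that the segments_N file in the index directory begins with a four-byte big-endian integer holding the format version, which appears to be how Lucene of this generation writes it. The class SegmentsFormatCheck and its methods are hypothetical helpers, not part of Lucene or Mahout.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene or Mahout.
final class SegmentsFormatCheck {

    // Decode four bytes as a big-endian int (ByteBuffer is big-endian by default).
    // ASSUMPTION: the segments_N file starts with such an int holding the format version.
    static int decodeFormat(byte[] firstFourBytes) {
        return ByteBuffer.wrap(firstFourBytes).getInt();
    }

    // Read the first four bytes of the given file and decode them.
    static int readFormat(Path segmentsFile) throws IOException {
        try (InputStream in = Files.newInputStream(segmentsFile);
             DataInputStream data = new DataInputStream(in)) {
            byte[] header = new byte[4];
            data.readFully(header); // throws EOFException if the file is shorter than 4 bytes
            return decodeFormat(header);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SegmentsFormatCheck <path to segments_N file>");
            return;
        }
        System.out.println("segments format version: " + readFormat(Paths.get(args[0])));
    }
}
```

Run it once against the segments_N file of each index. If the assumption holds, the Solr1.5 index should report the same -10 that appears in the exception, and the Solr1.4 index a value that the Lucene jars bundled with Mahout 0.3 accept.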

For this reason, I went through the IndexReader occurrences in Mahout and tried to rewrite the affected methods based on what was available in the Lucene trunk JavaDocs. With my changes it compiled, though with some caveats in the new IndexReader implementations. It read the index fine:

./trunk/bin/mahout lucene.vector --dir ../sandbox/example/solr/data/index/ --output ./tv1.5/part-out.vec --field body --idField itemId --dictOut dict1.5.out --norm 2 --max 500 2>err.log

This produces a legible dict1.5.out file. I did not run any Mahout algorithms to process the vector data; I normally run clustering with the Solr1.4 data. I did run a --sizeonly vectordump of the resulting vector, which returned results.

Just now, I tried running my new code on the Solr1.4 index, which also seemed to work.

Thank you.

Steve



Quoting Ted Dunning <[email protected]>:

Trunk is still pretty close to 3.0.  The latest release is 3.0.1.  The next
release is 3.0.2 as far as I can tell.

The fact that the build says 4.0 is an artifact of the release process which
increments the version number willy nilly.

Can you say a little bit more about the failure?

Are you getting a compile failure?

On Fri, Aug 6, 2010 at 10:38 AM, <[email protected]> wrote:

Ted,

I am using the trunk version of Solr
(http://hudson.zones.apache.org/hudson/job/Solr-trunk/), which, when I
build it, includes "lucene-*-4.0-dev" jars.  I am assuming this is trunk
Lucene, and I use the JavaDocs accordingly
(http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/all/index.html).

I understand that this is very new software, and there are API breaks in
many places.  Specifically, the IndexReader has changed.  I was hoping to
see if any Mahout users had tried importing vectors from this
Solr-trunk/Lucene-trunk combination.

Thanks!

Steve



Quoting Ted Dunning <[email protected]>:

 Lucene 4.0?  3.0 just came out.


http://hudson.zones.apache.org/hudson/view/Lucene/job/Lucene-trunk/lastSuccessfulBuild/artifact/lucene/build/docs/changes/Changes.html#older

On Fri, Aug 6, 2010 at 8:59 AM, <[email protected]> wrote:

Hello,

I am trying to import an index from Solr 1.5, which uses the new Lucene 4.0
index and API.  I was wondering if there was any support for doing this in
the trunk version of Mahout.

I have already modified my local copy of Mahout to read the index
partially, but wanted to see if there was any work underway to support the
new API.  I can also try running the new code to import a Lucene 3.x index
to see if that works.

Thank you!

Sincerely,
Steve McGill










