I’m having some trouble getting this to work with my own data. I issue the
following command:

mahout lucene.vector --dir
/home/markr/shgs/apache-solr-3.4.0/example/solr/data/index/ --output
/tmp/part-out.vec --field content_encoded --idField id --dictOut /tmp/dict.out
--norm 2

My intent is to generate term vectors for the content_encoded field, whose
schema.xml entry has the termVectors="true" attribute set. There is also
a field named 'id'. My data was imported into a sqlite3 db, where id is 'not
null' but content_encoded may be null. When I run the command, I get the SLF4J
multiple bindings warning (just a warning?), and then the following exception:

Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"
at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:351)
at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:72)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:114)
at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92)
at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:288)
at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:84)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:250)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Any advice on how to debug this problem would be greatly appreciated.

Mark


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Exception-in-thread-main-org-apache-lucene-index-CorruptIndexException-unrecognized-format-3-in-file-tp3438539p3438539.html
Sent from the Mahout User List mailing list archive at Nabble.com.
