I’m having some trouble getting this to work with my own data. I issue the following command:

mahout lucene.vector --dir /home/markr/shgs/apache-solr-3.4.0/example/solr/data/index/ --output /tmp/part-out.vec --field content_encoded --idField id --dictOut /tmp/dict.out --norm 2

My intent is to generate term vectors for the content_encoded field, whose schema.xml entry has the termVectors="true" attribute set. There is also a field named 'id'. My data was imported into a sqlite3 db, and id is 'not null', but content_encoded may be null.

When I run the command, I get the SLF4J multiple bindings warning (just a warning?), and then the following exception:

Exception in thread "main" org.apache.lucene.index.CorruptIndexException: unrecognized format -3 in file "_b.fnm"
    at org.apache.lucene.index.FieldInfos.read(FieldInfos.java:351)
    at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:72)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:114)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:92)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:113)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:29)
    at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:81)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:750)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:75)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:428)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:288)
    at org.apache.mahout.utils.vectors.lucene.Driver.dumpVectors(Driver.java:84)
    at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:250)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)

Any advice on how to debug this problem would be greatly appreciated.

Mark

--
View this message in context: http://lucene.472066.n3.nabble.com/Exception-in-thread-main-org-apache-lucene-index-CorruptIndexException-unrecognized-format-3-in-file-tp3438539p3438539.html
Sent from the Mahout User List mailing list archive at Nabble.com.
