Hello all,

We are looking at using LDA for topic trending on some pre-built Lucene indexes. I've put the command and its output below. From searching, it seems a lot of people have been unable to get this to work properly. Most answers tell the user to review the example "build-reuters.sh", but that example doesn't use a Lucene index as its input.
The dictionary is created (on local disk) and vector creation is attempted on HDFS; however, no vectors are written out. I'm interested to know whether anyone has actually gotten this to work on Mahout 0.4. Just for testing purposes, I then tried to run the actual LDA on the created directories, although I wouldn't expect it to work since no vectors were created.

Thanks,
Chris

bin/mahout lucene.vector --dir /home/index_for_mahout/ --output /user/vectored_lucene_index --dictOut /home/vectored_lucene_index/dict.out --weight TF --field content

11/05/02 17:23:57 INFO lucene.Driver: Output File: /user/vectored_lucene_index
11/05/02 17:23:57 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/05/02 17:23:57 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
11/05/02 17:23:57 INFO compress.CodecPool: Got brand-new compressor
11/05/02 17:23:58 INFO lucene.Driver: Wrote: 0 vectors
11/05/02 17:23:58 INFO lucene.Driver: Dictionary Output file: /home/vectored_lucene_index/dict.out
11/05/02 17:23:58 INFO driver.MahoutDriver: Program took 578 ms
