Use the sequence file dumper to inspect the files:

bin/mahout seqdumper --help

On Tue, Sep 6, 2011 at 10:03 AM, Walter Chang <[email protected]> wrote:

> I ended up adding a default SmartChineseAnalyzer constructor to get around
> the issue. I have another question. Right now, I can see the following
> directories created, but they seem to be encoded in some binary format.
> Is there any tool to double-check the generated contents as well as the
> TF-IDF scores calculated?
>
> df-count  dictionary.file-0  frequency.file-0  tfidf-vectors  tf-vectors
> tokenized-documents  wordcount
>
> Thanks a lot,
>
> Weide
>
> On Mon, Sep 5, 2011 at 9:03 PM, Jake Mannix <[email protected]> wrote:
>
> > On Mon, Sep 5, 2011 at 8:36 PM, Lance Norskog <[email protected]> wrote:
> >
> > > A Lucene expert could change SparseVectors to handle this case. (There
> > > might be other problems.)
> >
> > I don't think we need a Lucene expert, we just need to change the logic of
> > "instantiate Analyzer via no-arg constructor" to "if a no-arg constructor
> > exists for the Analyzer, use it, else try the single-arg constructor which
> > takes a LuceneUtil.VERSION as the argument". And possibly let the client
> > specify the Lucene version (making sure to swap out all the Lucene jars
> > which might be needed of that exact version) on the command line.
> >
> > -jake
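
For what it's worth, the constructor fallback Jake describes could look roughly like the sketch below. This is only an illustration of the idea, not the actual Mahout code: the class name AnalyzerFactorySketch, the Version enum, and the two analyzer classes are hypothetical stand-ins so that the example runs without the Lucene jars. In Mahout the argument would be the real Lucene version constant and the class would be an Analyzer subclass.

```java
import java.lang.reflect.Constructor;

public class AnalyzerFactorySketch {

    // Hypothetical stand-in for the Lucene version constant type.
    public enum Version { LUCENE_31 }

    // Stand-in for an analyzer that has a no-arg constructor.
    public static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    // Stand-in for an analyzer that only offers a single-arg Version constructor.
    public static class VersionedAnalyzer {
        public final Version version;
        public VersionedAnalyzer(Version version) { this.version = version; }
    }

    // Use the no-arg constructor if one exists; otherwise fall back to the
    // single-arg constructor that takes a Version.
    public static <T> T instantiate(Class<T> cls, Version version)
            throws ReflectiveOperationException {
        try {
            Constructor<T> noArg = cls.getConstructor();
            return noArg.newInstance();
        } catch (NoSuchMethodException e) {
            // No public no-arg constructor: try the Version-taking one.
            // If that is missing too, NoSuchMethodException propagates.
            Constructor<T> withVersion = cls.getConstructor(Version.class);
            return withVersion.newInstance(version);
        }
    }

    public static void main(String[] args) throws Exception {
        Object a = instantiate(NoArgAnalyzer.class, Version.LUCENE_31);
        VersionedAnalyzer b = instantiate(VersionedAnalyzer.class, Version.LUCENE_31);
        System.out.println(a.getClass().getSimpleName());
        System.out.println(b.getClass().getSimpleName() + " " + b.version);
    }
}
```

With something like this in place, an analyzer such as SmartChineseAnalyzer would not need a hand-added default constructor, provided it exposes a public constructor that takes only the version argument.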
