I ended up adding a default (no-arg) constructor to SmartChineseAnalyzer to get around the issue.

I have another question. I can now see the following directories being created, but their contents seem to be encoded in some binary format. Is there a tool I can use to inspect the generated contents, including the calculated TF-IDF scores?
df-count dictionary.file-0 frequency.file-0 tfidf-vectors tf-vectors tokenized-documents wordcount

Thanks a lot,
Weide

On Mon, Sep 5, 2011 at 9:03 PM, Jake Mannix <[email protected]> wrote:

> On Mon, Sep 5, 2011 at 8:36 PM, Lance Norskog <[email protected]> wrote:
>
> > A Lucene expert could change SparseVectors to handle this case. (There
> > might be other problems.)
>
> I don't think we need a Lucene expert, we just need to change the logic of
> "instantiate Analyzer via no-arg constructor" to "if no-arg constructor
> exist for the Analyzer, use it, else try the single-arg constructor which
> takes a LuceneUtil.VERSION as the argument".
> And possibly let the client specify the lucene version (making sure to swap
> out all the lucene jars which might be needed of that exact version) on the
> command line.
>
> -jake
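
For reference, the fallback Jake describes could look roughly like the sketch below. It is only an illustration of the "no-arg first, then single-arg Version" idea using reflection, not the actual Mahout code. The names AnalyzerFactory, Version, NoArgAnalyzer and VersionedAnalyzer are hypothetical stand-ins so that the example compiles without Lucene on the classpath; in real use the classes would be the Lucene Analyzer and Version types.

```java
import java.lang.reflect.Constructor;

public class AnalyzerFactory {

    /** Hypothetical stand-in for Lucene's Version enum. */
    public enum Version { LUCENE_31 }

    /** Hypothetical analyzer that has a public no-arg constructor. */
    public static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    /** Hypothetical analyzer that only offers a constructor taking a Version. */
    public static class VersionedAnalyzer {
        public final Version version;
        public VersionedAnalyzer(Version version) { this.version = version; }
    }

    /**
     * Instantiate cls via its public no-arg constructor if it has one;
     * otherwise fall back to the public single-arg constructor taking a Version.
     */
    public static <T> T create(Class<T> cls, Version version)
            throws ReflectiveOperationException {
        try {
            Constructor<T> noArg = cls.getConstructor();
            return noArg.newInstance();
        } catch (NoSuchMethodException e) {
            // No public no-arg constructor: try the Version-based one instead.
            Constructor<T> withVersion = cls.getConstructor(Version.class);
            return withVersion.newInstance(version);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(create(NoArgAnalyzer.class, Version.LUCENE_31).getClass().getSimpleName());
        System.out.println(create(VersionedAnalyzer.class, Version.LUCENE_31).getClass().getSimpleName());
    }
}
```

If neither constructor exists, the second getConstructor call throws NoSuchMethodException, which seems like a reasonable failure mode for a class name passed in on the command line.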
