Use the sequence file dumper to inspect the files:

bin/mahout seqdumper --help
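To spot-check a single TF-IDF weight from the dumped tfidf-vectors by hand, here is a minimal Java sketch. It assumes the weighting follows Lucene's classic DefaultSimilarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). Verify that against the Mahout version you are running. The class name TfIdfCheck and the sample counts are hypothetical. The raw term count would come from tf-vectors and the document frequency presumably from df-count.

```java
import java.util.Locale;

// Sketch: recompute one TF-IDF weight to compare against a dumped vector entry.
// ASSUMPTION: weighting mirrors Lucene's classic DefaultSimilarity,
// i.e. tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)).
public class TfIdfCheck {

    // Term-frequency component: square root of the raw count in the document.
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Inverse-document-frequency component, with +1 smoothing on docFreq.
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Final weight is the product of the two components.
    static double tfidf(int freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        // Hypothetical counts: a term seen 4 times in one document,
        // appearing in 9 of 100 documents.
        double w = tfidf(4, 9, 100);
        System.out.println(String.format(Locale.ROOT, "%.4f", w));
    }
}
```

Lucene computes these values in float, so expect small rounding differences. If the vectors were length-normalized when they were generated, the dumped values will be scaled and will not match this raw product directly.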

On Tue, Sep 6, 2011 at 10:03 AM, Walter Chang <[email protected]> wrote:

> I ended up adding a default (no-arg) SmartChineseAnalyzer constructor to get
> around the issue. I have another question. Right now, I can see the following
> directories have been created, but they seem to be encoded in some binary
> format. Is there any tool to double-check the generated contents as well as
> the TF-IDF scores calculated?
>
> df-count  dictionary.file-0  frequency.file-0  tfidf-vectors  tf-vectors
> tokenized-documents  wordcount
>
> Thanks a lot,
>
> Weide
>
> On Mon, Sep 5, 2011 at 9:03 PM, Jake Mannix <[email protected]> wrote:
>
> > On Mon, Sep 5, 2011 at 8:36 PM, Lance Norskog <[email protected]> wrote:
> > >
> > >
> > > A Lucene expert could change SparseVectors to handle this case. (There
> > > might be other problems.)
> > >
> >
> > I don't think we need a Lucene expert; we just need to change the logic of
> > "instantiate the Analyzer via its no-arg constructor" to "if a no-arg
> > constructor exists for the Analyzer, use it; otherwise try the single-arg
> > constructor that takes a Lucene Version (org.apache.lucene.util.Version)
> > as the argument".
> > And possibly let the client specify the Lucene version on the command line
> > (making sure to swap in all the Lucene jars of that exact version that
> > might be needed).
> >
> >  -jake
> >
>
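The fallback Jake describes (use the no-arg constructor if there is one, otherwise the single-arg constructor taking a Version) could look roughly like this Java reflection sketch. It is not Mahout's actual code. Version, PlainAnalyzer and VersionedAnalyzer are hypothetical stand-ins for org.apache.lucene.util.Version and real Analyzer subclasses such as SmartChineseAnalyzer, so the sketch runs without Lucene on the classpath.

```java
import java.lang.reflect.Constructor;

// Sketch of the proposed fallback: prefer a public no-arg constructor,
// otherwise use a public single-arg constructor that takes a Version.
// Version, PlainAnalyzer and VersionedAnalyzer are HYPOTHETICAL stand-ins.
public class AnalyzerFactory {

    public enum Version { LUCENE_31, LUCENE_33 }

    // Stand-in for an analyzer that has a default constructor.
    public static class PlainAnalyzer {
        public PlainAnalyzer() {}
    }

    // Stand-in for an analyzer that only accepts a match version.
    public static class VersionedAnalyzer {
        public final Version matchVersion;
        public VersionedAnalyzer(Version matchVersion) {
            this.matchVersion = matchVersion;
        }
    }

    static <T> T instantiate(Class<T> cls, Version version) {
        try {
            try {
                // 1. Use the public no-arg constructor if the class has one.
                return cls.getConstructor().newInstance();
            } catch (NoSuchMethodException noDefault) {
                // 2. Otherwise fall back to the single-arg Version constructor.
                Constructor<T> ctor = cls.getConstructor(Version.class);
                return ctor.newInstance(version);
            }
        } catch (ReflectiveOperationException e) {
            // Neither constructor exists, or construction itself failed.
            throw new IllegalArgumentException("Cannot instantiate " + cls.getName(), e);
        }
    }

    public static void main(String[] args) {
        PlainAnalyzer a = instantiate(PlainAnalyzer.class, Version.LUCENE_33);
        VersionedAnalyzer b = instantiate(VersionedAnalyzer.class, Version.LUCENE_33);
        System.out.println(a.getClass().getSimpleName() + " " + b.matchVersion);
    }
}
```

Passing the Version in from the caller, rather than hard-coding it, leaves room for the second half of Jake's suggestion: letting the client choose the Lucene version on the command line.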
