The Lucene intake does not support restricting the extraction with a search on the index. If you can make a copy of the index, here's a trick: delete the documents you don't want, then optimize the index. You will need a small Lucene program to do this. Use this approach to split the big index into separate training and test indexes.
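A minimal sketch of that trick, assuming a Lucene 3.x-era classpath; the "category" field name and the "test" term are hypothetical placeholders for whatever marks the docs you want out of this split. Run it against a copy of the index, never the original:

```java
// Sketch: carve a training index out of a COPY of the big index by
// deleting the unwanted docs and then optimizing away the deletions.
// Assumes Lucene 3.x; field name "category" and term "test" are examples.
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SplitIndex {
    public static void main(String[] args) throws Exception {
        // args[0]: path to a *copy* of the index (the original stays untouched)
        FSDirectory dir = FSDirectory.open(new File(args[0]));
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_30),
                IndexWriter.MaxFieldLength.UNLIMITED);

        // Drop every document tagged for the other split; what remains
        // in this copy becomes the training index.
        writer.deleteDocuments(new Term("category", "test"));

        // Merge segments and physically expunge the deleted documents.
        writer.optimize();
        writer.close();
    }
}
```

Repeat with the opposite delete term on a second copy to produce the test index; both copies keep their TermFreq vectors.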
On Mon, Apr 4, 2011 at 6:51 PM, David Croley <[email protected]> wrote:
> I have a large Lucene index (with TermFreq vectors). I do not have easy
> access to the original source docs that the index was made from. I have
> identified a set of docs in the index as Category X. Is there a way to
> run Mahout's Bayesian classification algorithm, trained on the docs in
> Category X, on the remaining docs in the index to better identify
> category matches?
>
> I have also exported the Lucene data into a Vector file in prep to run
> some clustering experiments (as per the wiki examples) and also wondered
> if that data could be used to feed the CBayes code. From what I can
> tell, the classification code in Mahout takes a completely different
> form of input compared to the clustering algorithms.
>
> Thanks for any pointers.
>
> David Croley
> Lead Engineer
> RenewData
> 512.351.0198 BlackBerry
> 512.276.5518 Desk
> [email protected]
> www.renewdata.com

--
Lance Norskog
[email protected]
