Hi.

I'm trying to extract expressions from term position information, i.e.,
if two words frequently appear side by side, then we can treat the
two words as a single term. For instance, if 'Object' and 'Oriented' appear
side by side 9 times out of 10, we can define a new expression,
'Object_Oriented'.
Does anyone know of a statistical method to detect such expressions?
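
To make the idea concrete, here is a rough Python sketch of the counting
described above. It is only an illustration under stated assumptions: the
function names, the 0.9 threshold, the minimum pair count, and the choice of
dividing by the count of the more frequent word are all hypothetical, not
an established method and not part of Lucene.

```python
from collections import Counter


def adjacent_pair_ratios(tokens, min_count=2):
    """Map each adjacent pair (a, b) to pair_count / max(count(a), count(b)).

    Assumption: "9 times out of 10" is read as the share of occurrences of
    the more frequent word that fall inside the pair.
    """
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    ratios = {}
    for (a, b), n in pair_counts.items():
        if n < min_count:
            continue  # ignore pairs seen too rarely to be meaningful
        ratios[(a, b)] = n / max(word_counts[a], word_counts[b])
    return ratios


def merge_candidates(tokens, threshold=0.9, min_count=2):
    """Return {"A_B": ratio} for pairs whose ratio reaches the threshold."""
    ratios = adjacent_pair_ratios(tokens, min_count)
    return {f"{a}_{b}": r for (a, b), r in ratios.items() if r >= threshold}


if __name__ == "__main__":
    # Toy token stream: 'Object' occurs 10 times, 9 of them before 'Oriented'.
    tokens = []
    for i in range(9):
        tokens += ["Object", "Oriented", f"word{i}"]
    tokens += ["Object", "Model"]
    print(merge_candidates(tokens))
```

In a real index the token stream would come from the stored term positions
rather than from a Python list, and a raw ratio like this one is likely to
need some correction for very rare words.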

Thanks.

Gilles Moyse

-----Original Message-----
From: Eric Jain [mailto:[EMAIL PROTECTED]
Sent: Tuesday, October 21, 2003 09:24
To: Lucene Users List
Subject: Re: Lucene on Windows


> The CVS version of Lucene has a patch that allows one to use a
> 'Compound Index' instead of the traditional one.  This reduces the
> number of open files.  For more info, see/make the Javadocs for
> IndexWriter.

Interesting option. Do you have a rough idea of what the performance
impact of using this setting is?

--
Eric Jain


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]