it
should be straightforward to identify those phrases and index them as
Tokens. Just remember to set the start and end offset of the Token to
correct values.
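
As a rough illustration of the offset bookkeeping, here is a minimal sketch in plain Java. It does not use the Lucene API: `PhraseOffsets`, `PhraseToken` and `find` are hypothetical names made up for this example, and the phrase list is assumed to be known up front. The computed start (inclusive) and end (exclusive) values are the ones you would then set on the Lucene Token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: locates known phrases in a text
// and records the character offsets each phrase token should carry.
class PhraseOffsets {

    // Stand-in for a token: the phrase text plus its offsets in the source text.
    static final class PhraseToken {
        final String text;
        final int startOffset; // index of the first character (inclusive)
        final int endOffset;   // index just past the last character (exclusive)

        PhraseToken(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Returns one PhraseToken per non-overlapping occurrence of each phrase.
    static List<PhraseToken> find(String text, List<String> phrases) {
        List<PhraseToken> tokens = new ArrayList<PhraseToken>();
        for (String phrase : phrases) {
            if (phrase.isEmpty()) {
                continue; // an empty phrase would match everywhere
            }
            int from = 0;
            int start;
            while ((start = text.indexOf(phrase, from)) >= 0) {
                int end = start + phrase.length();
                tokens.add(new PhraseToken(phrase, start, end));
                from = end; // continue searching after this occurrence
            }
        }
        return tokens;
    }
}
```

The matching here is exact and case-sensitive; a real analyzer would probably normalize the text first, but the offsets should still point into the original, unmodified text.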
You can have a look at this thesis here:
http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf
And
://daim.idi.ntnu.no/show.php?type=vedlegg&id=3429
Hope this helps.
Hayes, Peter:
> Thanks for your input. I will try and apply your suggestion.
>
> Thanks,
> Peter
>
> -----Original Message-----
> From: Asbjørn A. Fellinghaug [mailto:asbj...@fellinghaug.com]
> Sent: Thu
than get them all. As I remember, WildcardTermEnum is
> faster than RegexTermEnum, but don't hold me to that. So I'd try
> WildcardTermEnum first; I think you'll find it much more suitable than forming
>
> Best
> Erick
--
Asbjørn A. Fellinghaug
asbj...@fellinghaug.com
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
ates "bigrams" (n-grams of size 2) in my master's thesis.
Feel free to download it from this page:
http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
Also, have a look at the package org.apache.lucene.analysis.ngram:
http://lucene.apache.org/java/2_3_2/api
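
To make the idea concrete, here is a minimal sketch of n-gram generation in plain Java. It is not the thesis code and not the Lucene ngram package; `Bigrams` and `ngrams` are hypothetical names, and the sketch assumes character-level n-grams (word-level bigrams would work the same way over a list of words instead of characters).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: produces character n-grams.
class Bigrams {

    // Returns every substring of length n, sliding one character at a time.
    // With n = 2 the result is the list of bigrams.
    static List<String> ngrams(String text, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<String>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }
}
```

For example, `Bigrams.ngrams("lucene", 2)` yields `lu`, `uc`, `ce`, `en`, `ne`. A text shorter than n produces an empty list.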
ch are frequently typed
into your searcher. Typically, queries containing stopwords are good to use
in a warmup phase.
--
Asbjørn A. Fellinghaug
Also, have a look at
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed, which offers
a range of helpful advice on improving indexing speed.
--
Asbjørn A. Fellinghaug
d
how it works.
[1] http://asbjorn.fellinghaug.com/filer/master/Master_thesis.pdf
--
Asbjørn A. Fellinghaug