Hi Teko,
Sure - I use Lucene through Elasticsearch, but I suppose that doesn't make a
difference in this situation. I needed something like what you were trying
to accomplish - basically searching for any substring... wildcard queries
worked but were kind of slow.
This is my analyzer that works for
True for the first two use cases, but as I indicated, the third use case is
problematic since the token needs to be split. The n-gram solution does seem
to cover it, though, sort of.
The n-gram solution doesn't cover "good morning, john" or "good morning -
john", but that could be handled by
Emanuel Buzek,
Well, I tried using ShingleFilter first, and I thought it worked well, but
in the end it still did not work the way I wanted..
So I tried NGram... I created a new analyzer to use it, and I ran a
test... Well, it works, but I still need to do some manual validation to
Wow, man!!
Forget what I said before!! I tried your method... well, generating the
index is indeed a bit slower (1/2 minutes more), but the queries... man,
it works very well, and fast, very fast!!
Really, it's so fast here that what creates the bottleneck is the write
Emanuel Buzek,
Can you explain how you use NGram?? Did you create an Analyzer, is that it??
Sorry, but I really don't have much knowledge about Lucene...
Thanks in advance!
Wow!! Thanks, Michael!! It works perfectly! Thanks, man!!
--
View this message in context:
http://lucene.472066.n3.nabble.com/How-to-locate-a-Phrase-inside-text-like-a-Browser-text-searcher-tp4135075p4135449.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
I was trying to solve pretty much the same thing a few weeks back, and I
ended up using the NGram tokenizer. Although it made my index much larger
(the index grew 15x), the full-text queries are pretty fast and I don't
have to use wildcards in queries.
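For anyone unfamiliar with what an n-gram tokenizer actually emits, here is a plain-Java sketch (not Lucene code - Lucene's NGramTokenizer does this inside the analysis chain, with the min/max sizes set on the tokenizer) showing why every substring of a term becomes directly searchable:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of character n-gram tokenization.
// Every substring of length min..max becomes an index term, which is
// why substring queries no longer need slow leading wildcards.
public class NGramDemo {
    public static List<String> ngrams(String text, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            // slide a window of size n over the input
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "john" with min=2, max=3 -> [jo, oh, hn, joh, ohn]
        System.out.println(ngrams("john", 2, 3));
    }
}
```

This also makes the 15x index growth mentioned above unsurprising: each term of length L contributes on the order of L terms per gram size instead of one.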
ShingleFilter can help with this; it concatenates neighboring tokens.
So a search for "good morning john" becomes a search for:
"goodmorning john" OR
"good morningjohn" OR
"good morning john"
It makes your index much bigger because of all the extra terms, but you may
find it's worth the cost.
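The neighbor-concatenation step can be sketched in plain Java (this is an illustration of the idea, not Lucene code - the real ShingleFilter sits in the analysis chain and uses a configurable token separator, which you would set to the empty string to get the joined forms shown above):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what shingling adds: alongside the original tokens, each
// pair of neighbors is also emitted as one concatenated token, so
// "goodmorning" in the text can match a query for "good morning".
public class ShingleDemo {
    public static List<String> withShingles(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens);
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + tokens.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // -> [good, morning, john, goodmorning, morningjohn]
        System.out.println(withShingles(List.of("good", "morning", "john")));
    }
}
```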
-Mike
Hi, can someone help me with this??
I need to do a search that locates a phrase inside text, but I need to
locate the phrase in texts like these:
'John Mail' - the phrase I want to locate
' Good Morning John Mail how are you? ' - I need to find the phrase here
' Good MorningJohn Mail how are you? ' - here too
'
Try using the Lucene wildcard: *John*Mail*
The analyzer is just how you segment the terms in your index. The
query parser is how you tokenize the terms that you want to query
against the index (something like that). But Lucene lets you use the
wildcard to handle the other cases.
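To see what the pattern *John*Mail* actually accepts, here is a rough plain-Java equivalent using a regex (note this is only an illustration of the matching semantics - Lucene's WildcardQuery matches individual index terms, and as mentioned earlier in the thread, leading wildcards tend to be slow):

```java
// '*' in a Lucene wildcard matches any run of characters, so
// *John*Mail* just requires "John" followed somewhere later by "Mail".
public class WildcardDemo {
    public static boolean matches(String text) {
        return text.matches(".*John.*Mail.*");
    }

    public static void main(String[] args) {
        System.out.println(matches("Good MorningJohn Mail how are you?")); // true
        System.out.println(matches("Mail comes before John"));             // false
    }
}
```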
The word delimiter filter can help with "MorningJohn" by setting its option
to split on case change.
You might be able to handle "Mailhow" using the
DictionaryCompoundWordTokenFilter, but that requires that you create a
complete dictionary of terms that can be split off. That's not very practical.
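The split-on-case-change rule is simple enough to sketch in plain Java (again, an illustration of the rule rather than Lucene code - in Lucene you would enable the corresponding flag on the word delimiter filter in your analysis chain):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of splitting on case change: a lower-to-upper transition
// starts a new token, so "MorningJohn" becomes ["Morning", "John"]
// and both halves can match a phrase query.
public class CaseSplitDemo {
    public static List<String> splitOnCaseChange(String token) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1))
                    && Character.isUpperCase(token.charAt(i))) {
                out.add(token.substring(start, i));
                start = i;
            }
        }
        out.add(token.substring(start));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitOnCaseChange("MorningJohn")); // [Morning, John]
    }
}
```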