Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread Emanuel Buzek
Hi Teko, sure - I use Lucene through Elasticsearch, but I suppose that doesn't make a difference in this situation. I needed something like what you were trying to accomplish - basically to search for any substring... wildcard queries worked but were kind of slow. This is my analyzer that works for

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread Jack Krupansky
True, for the first two use cases, but as I indicated, the third use case is problematic since the token needs to be split. The n-gram solution does seem to cover it though, sort of. The n-gram solution doesn't cover "good morning, john" or "good morning - john", but that could be handled by

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread teko
Emanuel Buzek, well, I tried ShingleFilter first, and I thought it worked well, but in the end it still did not work the way I want. So I tried using NGram... I created a new analyzer for it and ran a test... Well, it works, but I still need to do some manual validation to

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-16 Thread teko
Wow man!! Forget what I said before!! I ran some tests using your method... well, generating the index is still a bit slower (1/2 minutes more), but querying... man, it works very well, and fast, very fast!! Really, it is so fast here that what creates the bottleneck is the write

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-15 Thread teko
Emanuel Buzek, can you explain how you use NGram? Did you create an Analyzer, is that it? Sorry, but I really don't know Lucene very well... Thanks in advance!

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-13 Thread teko
Wow!! Thanks Michael!! It works perfectly! Thanks man!! -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-locate-a-Phrase-inside-text-like-a-Browser-text-searcher-tp4135075p4135449.html Sent from the Lucene - Java Users mailing list archive at Nabble.com.

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-13 Thread Emanuel Buzek
I was trying to solve pretty much the same thing a few weeks back and I ended up using the NGram tokenizer. Although it made my index much larger (the index grew 15x), the full-text queries are pretty fast and I don't have to use wildcards in queries.
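
To make the trade-off above concrete, here is a minimal plain-Java sketch of character n-grams. It does not use Lucene's NGramTokenizer; the class and method names (NGramSketch, ngrams, mayContain) are hypothetical and only illustrate the idea. It also ignores term positions, so it can report false positives that a real phrase query over n-grams would reject.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: hypothetical helper, not Lucene's NGramTokenizer.
final class NGramSketch {

    // Returns every substring of text whose length is between minGram and maxGram.
    static List<String> ngrams(String text, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                grams.add(text.substring(start, start + len));
            }
        }
        return grams;
    }

    // True when every n-gram of the query was also produced from the indexed text.
    // Positions are ignored, so this is a necessary condition, not a proof of a match.
    // A query shorter than n yields no grams and therefore passes trivially.
    static boolean mayContain(String indexedText, String query, int n) {
        Set<String> indexed = new HashSet<>(ngrams(indexedText, n, n));
        return indexed.containsAll(ngrams(query, n, n));
    }

    public static void main(String[] args) {
        String text = "Good MorningJohn Mail";
        // One short string already produces many terms, which is why the index grows.
        System.out.println(ngrams(text, 3, 3).size() + " trigrams for one string");
        System.out.println(mayContain(text, "John", 3));
        System.out.println(mayContain(text, "Jane", 3));
    }
}
```

Because the substrings are produced at index time, the query side only has to look up exact terms, which is why no wildcard is needed. The cost is the number of terms stored per string.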

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-13 Thread Michael Sokolov
ShingleFilter can help with this; it concatenates neighboring tokens. So a search for "good morning john" becomes a search for "goodmorning john" OR "good morningjohn" OR "good morning john". It makes your index much bigger because of all the extra terms, but you may find it's worth the cost. -Mike On
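
The query expansion described above can be sketched in plain Java as follows. This is not the ShingleFilter itself; ShingleSketch and queryVariants are hypothetical names, and the sketch assumes neighbouring tokens are joined with no separator and that only one adjacent pair is merged per variant.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not Lucene's ShingleFilter.
final class ShingleSketch {

    // Returns the original phrase plus one variant per adjacent token pair,
    // where that pair is concatenated into a single token.
    static List<String> queryVariants(List<String> tokens) {
        List<String> variants = new ArrayList<>();
        variants.add(String.join(" ", tokens));
        for (int i = 0; i + 1 < tokens.size(); i++) {
            List<String> merged = new ArrayList<>(tokens.subList(0, i));
            merged.add(tokens.get(i) + tokens.get(i + 1)); // e.g. "good" + "morning"
            merged.addAll(tokens.subList(i + 2, tokens.size()));
            variants.add(String.join(" ", merged));
        }
        return variants;
    }

    public static void main(String[] args) {
        // The variants would be combined with OR on the query side.
        for (String variant : queryVariants(List.of("good", "morning", "john"))) {
            System.out.println(variant);
        }
    }
}
```

A phrase of n tokens yields n variants here, so the query stays small; the extra cost falls on the index, which has to store the concatenated terms as well.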

How to locate a Phrase inside text (like a Browser text searcher)

2014-05-11 Thread teko
Hi, can someone help me with this? I need to search for a phrase inside text, but I need to locate the phrase in texts like these: 'John Mail' - the phrase I want to locate. 'Good Morning John Mail how are you?' - I need to find the phrase here. 'Good MorningJohn Mail how are you?' - here too. '

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-11 Thread Jose Carlos Canova
Try using a Lucene wildcard: *John*Mail*. The analyzer just determines how you want to segment the terms in your index; the query parser is how you tokenize the terms that you want to query against the index (something like that). But Lucene also allows you to use wildcards to handle other cases
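
As a rough illustration of what a pattern like *John*Mail* accepts, the following plain-Java sketch translates the wildcard into a regular expression. It is not Lucene's WildcardQuery; WildcardSketch, toRegex and matches are hypothetical names. It also assumes the whole text is matched as a single string. In Lucene, as far as I know, a wildcard is matched against individual indexed terms, so the pattern would only behave like this on a field that is not split into words.

```java
import java.util.regex.Pattern;

// Illustration only: hypothetical helper, not Lucene's WildcardQuery.
final class WildcardSketch {

    // '*' matches any run of characters, '?' matches exactly one character,
    // everything else is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    // The pattern must cover the whole input, as a wildcard must cover a whole term.
    static boolean matches(String wildcard, String text) {
        return toRegex(wildcard).matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*John*Mail*", "Good MorningJohn Mail how are you?"));
        System.out.println(matches("*John*Mail*", "Good Morning Mail John"));
    }
}
```

A pattern that starts with * cannot use the sorted term dictionary to narrow the search, so every term has to be tested, which fits the slowness reported elsewhere in this thread.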

Re: How to locate a Phrase inside text (like a Browser text searcher)

2014-05-11 Thread Jack Krupansky
The word delimiter filter can help for MorningJohn by setting its option to split on case change. You might be able to handle Mailhow using the DictionaryCompoundWordTokenFilter, but that requires that you create a complete dictionary of terms that can be split off. That's not very practical.
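
A minimal plain-Java sketch of the split-on-case-change idea follows. It is not the word delimiter filter itself; CaseSplitSketch and splitOnCaseChange are hypothetical names, and the sketch assumes a split happens only where a lowercase letter is directly followed by an uppercase one.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical helper, not Lucene's word delimiter filter.
final class CaseSplitSketch {

    // Splits a token wherever a lowercase letter is immediately followed by an uppercase one.
    static List<String> splitOnCaseChange(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1)) && Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i));
                start = i;
            }
        }
        parts.add(token.substring(start)); // the remainder after the last split point
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnCaseChange("MorningJohn")); // a case change exists, so it splits
        System.out.println(splitOnCaseChange("Mailhow"));     // no case change, so it stays whole
    }
}
```

The second call shows why Mailhow needs a different approach: there is no case change to split on, so only a dictionary (or n-grams) can separate "Mail" from "how".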