How search code files for words which contains a given substrings?

2018-06-26 Thread Gordin, Ira
Hi all, I started to work on a project which currently searches code files for words which contain a given substring. Currently it uses WhitespaceTokenizer and a regex query which wraps the searched substring with '.*'. For example, if one searches for 'a', the query will be '/.*a.*/'. In this way i
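
The approach described in this message (split on whitespace, then match each token against the substring wrapped in '.*') can be sketched in plain Java. This is only an illustration using java.util.regex, not Lucene's WhitespaceTokenizer or its regex query; the class and method names are hypothetical, and quoting the substring with Pattern.quote is an assumption made here so that characters such as '.' are matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: plain-Java illustration, not Lucene code.
class SubstringTokenSearch {

    // Returns every whitespace-delimited token of `text` that contains `substring`.
    static List<String> matchingTokens(String text, String substring) {
        // Wrap the (literally quoted) substring with ".*" on both sides,
        // mirroring the '/.*a.*/' query shape described in the message.
        Pattern pattern = Pattern.compile(".*" + Pattern.quote(substring) + ".*");
        List<String> hits = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            // The whole token must match, as with a term-level regex query.
            if (!token.isEmpty() && pattern.matcher(token).matches()) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(matchingTokens("int count = total + 1;", "t"));
    }
}
```

Because the pattern starts with '.*', every candidate token has to be examined; the sketch makes that cost visible but does nothing to reduce it.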

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
Hello, Ira. Note the difference between offset https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/OffsetAttribute.html and position https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/analysis/tokenattributes/PositionIncrementAttribute.html in Lucene termin
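
To make the offset/position distinction concrete, here is a minimal plain-Java sketch, assuming whitespace tokenization and a position increment of 1 for every token. It does not use Lucene's attribute classes; the Token class and tokenize method are hypothetical names. The position is the token's index in the stream, while the offsets are the character indexes of the token in the original text (end exclusive).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: not Lucene's OffsetAttribute or PositionIncrementAttribute.
class OffsetVsPosition {

    static final class Token {
        final String text;
        final int position;     // index of the token in the token stream
        final int startOffset;  // index of the token's first character in the input
        final int endOffset;    // index one past the token's last character

        Token(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    // Splits on whitespace and records both kinds of coordinates for each token.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\S+").matcher(input);
        int position = 0;
        while (matcher.find()) {
            tokens.add(new Token(matcher.group(), position++, matcher.start(), matcher.end()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("int  count = 1")) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

Extra whitespace between tokens changes the offsets but not the positions, which is why the two cannot be used interchangeably.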

Efficient way to define large Boolean Occur.FILTER clause in Lucene 6

2018-06-26 Thread Hasenberger, Josef
Hi, I want to filter the result of a query by Long values (applicable to a specific field, actually a DocValue field) in Lucene 6 (as a replacement for Filters, which are removed in Lucene 6). The number of allowed Long values can range from just a few up to hundreds of thousands. What I do now is to crea

RE: How search code files for words which contains a given substrings?

2018-06-26 Thread Gordin, Ira
Hello Mikhail, I see in the link you sent that PositionIncrementAttribute determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching. I am not doing phrase searching. Would you mind explaining how it can help me? Thanks, Ira -Original Messa

Re: How search code files for words which contains a given substrings?

2018-06-26 Thread Mikhail Khludnev
I mean, you'd rather need offsets, not positions, but I don't have anything definite to suggest. On Tue, Jun 26, 2018 at 1:29 PM Gordin, Ira wrote: > Hello Mikhail, > > I see in the link you sent that PositionIncrementAttribute determines the > position of this token relative to the previous Tok

Re: Lucene same search result for worlds with and without spaces

2018-06-26 Thread Ahmet Arslan
Hi Egorlex, The shingle filter won't turn "similarissues" into "similar issues". But it can do the reverse. It is like a sliding window. Think about what the indexed tokens would be if you set the token separator to "". Ahmet On Wednesday, June 20, 2018, 12:42:22 PM GMT+3, egorlex wrote: Tha
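
The sliding-window idea in this reply can be sketched in plain Java. This is an illustration only, not Lucene's ShingleFilter: the class and method names are hypothetical, and the sketch emits only the shingles themselves, without any single-token output. With an empty separator, adjacent tokens are glued together, so "similar" followed by "issues" yields the indexed token "similarissues".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of shingling: not Lucene's ShingleFilter.
class ShingleSketch {

    // Slides a window of `size` tokens over the list and joins each window with `separator`.
    static List<String> shingles(List<String> tokens, int size, String separator) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.size(); i++) {
            out.add(String.join(separator, tokens.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        // An empty separator glues neighbouring tokens into one term.
        System.out.println(shingles(List.of("similar", "issues", "found"), 2, ""));
    }
}
```

This works in one direction only: the glued form is produced from separate words at index time, whereas a single input token such as "similarissues" is never split.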

Re: Efficient way to define large Boolean Occur.FILTER clause in Lucene 6

2018-06-26 Thread Trejkaz
On Tue, Jun 26, 2018 at 7:02 PM, Hasenberger, Josef wrote: > However, I have a feeling that the conversion from Long values to Terms is > rather inefficient for large collections and also uses a lot of memory. > To ease conversion overhead somewhat, I created a class that converts a > Long value d