Re: StackOverflowError in ControlledRealTimeReopenThread.run

2016-04-28 Thread Michael McCandless
Hmm, the disturbing part of this stack trace is how many nested
IndexReader.reportCloseToParentReaders there are: that's not right.

It looks like you have deeply nested IndexReaders.

Why did you need to use this WrappableSearcherManager?  It seems to disable
an important check from SearcherManager.
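
For illustration, a hypothetical sketch (plain Java, not Gerrit's or Lucene's actual code) of how such a wrapper chain can grow and why closing it overflows the stack: if a refresh cycle wraps the *previous* wrapper instead of a freshly opened reader, the chain gains one level per refresh, and a recursive close (like IndexReader.decRef -> FilterDirectoryReader.doClose in the trace below) needs one stack frame per level:

```java
public class NestedReaderSketch {
    // Stand-in for a FilterDirectoryReader-style wrapper.
    static class WrappedReader {
        final WrappedReader in;
        WrappedReader(WrappedReader in) { this.in = in; }
        // Recursive close: one stack frame per nesting level.
        void close() { if (in != null) in.close(); }
    }

    public static void main(String[] args) {
        WrappedReader reader = new WrappedReader(null);
        for (int i = 0; i < 1_000_000; i++) {
            // Bug pattern: re-wrap the old wrapper instead of a new reader.
            reader = new WrappedReader(reader);
        }
        try {
            reader.close();
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError from recursive close");
        }
    }
}
```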

Mike McCandless

http://blog.mikemccandless.com

On Wed, Apr 27, 2016 at 11:00 AM, Saša Živkov  wrote:

> We use Lucene 5.3.0 in Gerrit 2.12 [1].
>
> Recently we have been seeing the Lucene index writer thread throwing
> StackOverflowError
> like [2]. I included the line numbers from the log file just so you can
> see that the stack trace is about 19K lines long.
>
> The WrappableSearcherManager class [3] was copied from [4].
>
> Do you have an idea what could cause the (almost) infinite recursion?
> What can we do about this issue?
>
>
> [1] https://www.gerritcodereview.com/
> [2]
> 65890 [2016-04-27 13:30:20,599] [NRT open] ERROR com.google.gerrit.pgm.Daemon : Thread NRT open threw exception
> 65891 java.lang.StackOverflowError
> 65892 at java.util.WeakHashMap.size(WeakHashMap.java:434)
> 65893 at java.util.WeakHashMap.isEmpty(WeakHashMap.java:445)
> 65894 at java.util.WeakHashMap$HashIterator.<init>(WeakHashMap.java:861)
> 65895 at java.util.WeakHashMap$KeyIterator.<init>(WeakHashMap.java:919)
> 65896 at java.util.WeakHashMap$KeyIterator.<init>(WeakHashMap.java:919)
> 65897 at java.util.WeakHashMap$KeySet.iterator(WeakHashMap.java:955)
> 65898 at java.util.Collections$SetFromMap.iterator(Collections.java:3904)
> 65899 at java.util.Collections$SynchronizedCollection.iterator(Collections.java:1632)
> 65900 at org.apache.lucene.index.IndexReader.reportCloseToParentReaders(IndexReader.java:160)
> 65901 at org.apache.lucene.index.IndexReader.reportCloseToParentReaders(IndexReader.java:165)
> 65902 at org.apache.lucene.index.IndexReader.reportCloseToParentReaders(IndexReader.java:165)
> ... (same frame repeated) ...
> 70742 at org.apache.lucene.index.IndexReader.reportCloseToParentReaders(IndexReader.java:165)
> 70743 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:258)
> 70744 at org.apache.lucene.index.StandardDirectoryReader.doClose(StandardDirectoryReader.java:359)
> 70745 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:253)
> 70746 at org.apache.lucene.index.IndexReader.close(IndexReader.java:403)
> 70747 at org.apache.lucene.index.FilterDirectoryReader.doClose(FilterDirectoryReader.java:134)
> 70748 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:253)
> 70749 at org.apache.lucene.index.IndexReader.close(IndexReader.java:403)
> 70750 at org.apache.lucene.index.FilterDirectoryReader.doClose(FilterDirectoryReader.java:134)
> ... (same three frames repeated) ...
> 85306 at org.apache.lucene.index.FilterDirectoryReader.doClose(FilterDirectoryReader.java:134)
> 85307 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:253)
> 85308 at com.google.gerrit.lucene.WrappableSearcherManager.decRef(WrappableSearcherManager.java:140)
> 85309 at com.google.gerrit.lucene.WrappableSearcherManager.decRef(WrappableSearcherManager.java:68)
> 85310 at org.apache.lucene.search.ReferenceManager.release(ReferenceManager.java:274)
> 85311 at org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:189)
> 85312 at org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
> 85313 at org.apache.lucene.search.ControlledRealTimeReopenThread.run(ControlledRealTimeReopenThread.java:245)
>
> [3]
>
> https://gerrit.googlesource.com/gerrit/+/refs/heads/stable-2.12/gerrit-lucene/src/mai

QueryParser with CustomAnalyzer wrongly uses PatternReplaceCharFilter

2016-04-28 Thread Bahaa Eldesouky
I am using org.apache.lucene.queryparser.classic.QueryParser in Lucene
6.0.0 to parse queries using a CustomAnalyzer, as shown below:

public static void testFilmAnalyzer() throws IOException, ParseException {
    CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder()
        .addCharFilter("patternreplace",
            "pattern", "(movie|film|picture).*",
            "replacement", "")
        .withTokenizer("standard")
        .build();

    QueryParser qp = new QueryParser("name", nameAnalyzer);
    qp.setDefaultOperator(QueryParser.Operator.AND);
    String[] strs = {"avatar film fiction", "avatar-film fiction",
        "avatar-film-fiction"};

    for (String str : strs) {
        System.out.println("Analyzing \"" + str + "\":");
        showTokens(str, nameAnalyzer);
        Query q = qp.parse(str);
        System.out.println("Parsed query of \"" + str + "\":");
        System.out.println(q + "\n");
    }
}

private static void showTokens(String text, Analyzer analyzer) throws IOException {
    StringReader reader = new StringReader(text);
    TokenStream stream = analyzer.tokenStream("name", reader);
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
    stream.reset();
    while (stream.incrementToken()) {
        System.out.print("[" + term.toString() + "]");
    }
    stream.close();
    System.out.println();
}




I get the following output when I invoke testFilmAnalyzer():

Analyzing "avatar film fiction":
[avatar]
Parsed query of "avatar film fiction": +name:avatar +name:fiction

Analyzing "avatar-film fiction":
[avatar]
Parsed query of "avatar-film fiction": +name:avatar +name:fiction

Analyzing "avatar-film-fiction":
[avatar]
Parsed query of "avatar-film-fiction": name:avatar


It seems that the analyzer applies the PatternReplaceCharFilter in its
intended place (i.e. before tokenization), while the QueryParser effectively
applies it after its own tokenization. Does anyone have an explanation for
that? Isn't that a bug?


Storing numeric fields in Apache 6

2016-04-28 Thread j . Pardos
Hello all,

I need to index some numeric fields, search with numeric range queries, and 
store the data to retrieve it afterwards. 
If I understand correctly, the recommended way to do this in Lucene 6 is with 
the DoublePoint/LongPoint/XxxPoint field types. I have already implemented 
this, extending QueryParser for the numeric range queries, but I can't find a 
way to store the data.

For example, for double values, I'm doing:
doc.add(new DoublePoint(name, Double.parseDouble(value)));

DoublePoint doesn't have a "stored" argument in its constructor (as does, for 
example, StringField), or a property to specify it afterwards. 

What's the "right" way to do this?




-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org




Re: Storing numeric fields in Apache 6

2016-04-28 Thread Andres de la Peña
If I'm right, you should use a StoredField to save the value; you can give
it the same name as the one used for the DoublePoint field.

On Thursday, 28 April 2016, j.Pardos wrote:

-- 
Andrés de la Peña

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // @stratiobd


Re: Storing numeric fields in Apache 6

2016-04-28 Thread Alan Woodward
You should add a StoredField with the same name containing the value:

doc.add(new DoublePoint(name, Double.parseDouble(value)));
doc.add(new StoredField(name, Double.parseDouble(value)));
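
A fuller sketch of this pattern against the Lucene 6 API (hedged: the field name "price" and the values are illustrative), including the point range query and reading the stored value back:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.DoublePoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;

public class PointAndStoredFieldSketch {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

        // Same field name twice: the point is indexed (searchable),
        // the StoredField keeps the value retrievable.
        Document doc = new Document();
        doc.add(new DoublePoint("price", 9.99));
        doc.add(new StoredField("price", 9.99));
        writer.addDocument(doc);
        writer.close();

        // Range-search on the point field, then read the stored value back.
        IndexReader reader = DirectoryReader.open(dir);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs hits = searcher.search(DoublePoint.newRangeQuery("price", 5.0, 20.0), 10);
        double stored = searcher.doc(hits.scoreDocs[0].doc)
                                .getField("price").numericValue().doubleValue();
        System.out.println(stored);
        reader.close();
    }
}
```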

Alan Woodward
www.flax.co.uk


On 28 Apr 2016, at 13:10, j.Pardos wrote:




Re: QueryParser with CustomAnalyzer wrongly uses PatternReplaceCharFilter

2016-04-28 Thread Steve Rowe
Classic QueryParser splits on whitespace and then sends each chunk to the
analyzer one at a time.
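
A plain-Java illustration of that effect (the regex and strings come from the original post; this only simulates the pre-split, it is not the parser's actual code):

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespacePreSplitDemo {
    static final String PATTERN = "(movie|film|picture).*";

    // Stand-in for the char-filter step of the analyzer.
    static String analyze(String s) {
        return s.replaceAll(PATTERN, "").trim();
    }

    public static void main(String[] args) {
        String query = "avatar film fiction";

        // Analyzer path: the char filter sees the whole string, so it
        // swallows everything from "film" onward -> only "avatar" survives.
        System.out.println(analyze(query));                 // avatar

        // QueryParser path: the string is pre-split on whitespace and each
        // chunk is analyzed separately, so the pattern can never match
        // across chunk boundaries -> "fiction" survives.
        List<String> terms = new ArrayList<>();
        for (String chunk : query.split("\\s+")) {
            String t = analyze(chunk);
            if (!t.isEmpty()) terms.add(t);
        }
        System.out.println(terms);                          // [avatar, fiction]
    }
}
```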

--
Steve
www.lucidworks.com

> On Apr 28, 2016, at 5:54 AM, Bahaa Eldesouky  wrote:





RE: QueryParser with CustomAnalyzer wrongly uses PatternReplaceCharFilter

2016-04-28 Thread Uwe Schindler
Hi,

this is a general problem of using Analyzers in combination with QueryParser. 
Query parsing is done *before* the terms are analyzed: QueryParser uses a 
JavaCC grammar to parse the query, which involves some query-syntax-specific 
tokenization. Once the query parser has analyzed the syntax, it sends the 
syntactic parts through the analyzer (unfortunately, for English text, this 
means one whitespace-separated token at a time).

You have 2 possibilities:

- Move the pattern replacement into a TokenFilter. This is more likely to help 
for query parsing, where the pre-tokenization is done by the parser. For your 
example a StopFilter would be a good fit (it removes tokens that appear in a 
given list).
- In many cases people use query parsing when it is not applicable. If your 
users only enter terms and you don't need any query syntax, then query parsing 
is the wrong thing to do. What you need instead is a simplified analysis 
process that just creates a query out of the tokens emitted by the Analyzer. 
Lucene has the QueryBuilder class for that: it takes an Analyzer, and you pass 
in a string that gets tokenized and converted into a query. You have the option 
to create simple term queries combined in a BooleanQuery, or alternatively to 
parse them as a phrase. With this component the whole analyzer chain is applied 
to the input string and the Analyzer's output is used to build the query - 
without any syntax.
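
The QueryBuilder route might look like this (a hedged sketch against the Lucene 6 API, reusing the analyzer from the original post; note the char filter now sees the entire input, so "film fiction" is stripped before any query is built):

```java
import java.io.IOException;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.QueryBuilder;

public class QueryBuilderSketch {
    public static void main(String[] args) throws IOException {
        CustomAnalyzer nameAnalyzer = CustomAnalyzer.builder()
            .addCharFilter("patternreplace",
                "pattern", "(movie|film|picture).*",
                "replacement", "")
            .withTokenizer("standard")
            .build();

        // No query syntax: the whole input runs through the analyzer chain
        // (char filter included) and the surviving tokens become MUST clauses.
        QueryBuilder builder = new QueryBuilder(nameAnalyzer);
        Query q = builder.createBooleanQuery("name", "avatar film fiction",
                                             BooleanClause.Occur.MUST);

        // Here the char filter strips "film fiction" from the whole input,
        // so only one term survives: name:avatar
        System.out.println(q);
    }
}
```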

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Bahaa Eldesouky [mailto:bahaab...@gmail.com]
> Sent: Thursday, April 28, 2016 11:54 AM
> To: java-user@lucene.apache.org
> Subject: QueryParser with CustomAnalyzer wrongly uses
> PatternReplaceCharFilter





Re: Storing numeric fields in Apache 6

2016-04-28 Thread j . Pardos
Thank you very much, and sorry for the double post.

-----Original Message----- 
> From: "Alan Woodward"  
> To: java-user@lucene.apache.org 
> Date: 28/04/2016 14:30 
> Subject: Re: Storing numeric fields in Apache 6 







Query Expansion for Synonyms

2016-04-28 Thread Daniel Bigham

I'm investigating various ways of supporting synonyms in Lucene.

One such approach that looks potentially interesting is to do a kind of 
"query expansion".


For example, if the user searches for "us 1888", one might expand the 
query as follows:


SpanNearQuery query =
    new SpanNearQuery(
        new SpanQuery[] {
            new SpanOrQuery(
                new SpanTermQuery(new Term("Plaintext", "us")),
                new SpanNearQuery(
                    new SpanQuery[] {
                        new SpanTermQuery(new Term("Plaintext", "united")),
                        new SpanTermQuery(new Term("Plaintext", "states"))
                    },
                    0,     // slop
                    true   // inOrder
                )
            ),
            new SpanTermQuery(new Term("Plaintext", "1888"))
        },
        0,
        true
    );

A couple of questions:

- Is this approach in use within the community?
- Are there "gotchas" with this approach that make it undesirable?

I've done a few quick tests wrt query performance on a test index and 
found that a query can indeed take 10x longer if enough synonyms are 
used, but if the baseline search time is around 1 ms, then 10 ms is 
still plenty fast. (That said, my test was on a 70 MB index, so 
my 10 ms might turn into something nasty with a 7 GB index.)





Re: Query Expansion for Synonyms

2016-04-28 Thread Ahmet Arslan
Hi Daniel,

Since you are restricting inOrder=true and proximity=0 in the top-level query, 
there is no problem in your particular example.

If you weren't restricting, injecting synonyms with a plain OR can sometimes 
cause 'query drift': the injection/addition of one term changes the result 
list drastically.

When there is a big difference in term statistics (document frequency, 
collection frequency, etc.) between the injected term and the original term, 
there can be unexpected results.

BlendedTermQuery and SynonymQuery implementations could be used.

BlendedTermQuery and SynonymQuery implementations could be used.
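
For single-token synonyms, SynonymQuery (new in Lucene 6) might look like this (a hedged sketch; the field is from the example above, "usa" is a hypothetical synonym term, and this only covers single-word synonyms - the multi-word "united states" case still needs the span approach):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.SynonymQuery;

public class SynonymQuerySketch {
    public static void main(String[] args) {
        // Scores "us" and "usa" as if they were one term, so their differing
        // document frequencies cannot drift the ranking the way a plain OR can.
        SynonymQuery synonyms = new SynonymQuery(
            new Term("Plaintext", "us"),
            new Term("Plaintext", "usa"));
        System.out.println(synonyms);
    }
}
```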

Ahmet

On Thursday, April 28, 2016 6:26 PM, Daniel Bigham  wrote: