
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2019-03-13 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791424#comment-16791424 ]

Christoph Kaser commented on LUCENE-8542:
-

{quote}Right I get how it can help with small slices, but at the same time I'm 
seeing small slices as something that should be avoided in order to limit 
context switching so I don't think we should design for small slices?
{quote}
Small slices are the default: The default implementation of 
IndexSearcher.slices() returns one slice per segment. Since the search runs in 
an Executor, this may not cause a lot of context switching depending on the 
thread pool parameters. But you are right, the default implementation of 
slices() may not be optimal.
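
A minimal sketch of what a non-default slicing could look like: segments are walked in index order and merged into one slice until that slice holds a minimum number of documents. Segments are modelled only by their document counts, and `SliceGrouping`, `groupSegments` and `minDocsPerSlice` are hypothetical names, not Lucene API. In an application, comparable logic would presumably live in an override of IndexSearcher.slices().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: groups segment doc counts into slices. */
final class SliceGrouping {

  /**
   * Walks the segments in index order and closes a slice once it holds at
   * least minDocsPerSlice documents, so slices stay contiguous and ordered.
   */
  static List<List<Integer>> groupSegments(int[] segmentDocCounts, int minDocsPerSlice) {
    List<List<Integer>> slices = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long docsInCurrent = 0;
    for (int docCount : segmentDocCounts) {
      current.add(docCount);
      docsInCurrent += docCount;
      if (docsInCurrent >= minDocsPerSlice) {
        slices.add(current);
        current = new ArrayList<>();
        docsInCurrent = 0;
      }
    }
    if (!current.isEmpty()) {
      slices.add(current); // trailing small segments form the last slice
    }
    return slices;
  }

  public static void main(String[] args) {
    // Invented segment sizes, for illustration only.
    int[] segments = {5_000_000, 3_000, 2_000, 900_000, 4_000, 1_000};
    System.out.println(groupSegments(segments, 1_000_000));
  }
}
```

With these invented sizes the six segments collapse into two slices, so only two collectors (and two tasks on the executor) would be needed instead of six.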

 

 

> Provide the LeafSlice to CollectorManager.newCollector to save memory on 
> small index slices
> ---
>
> Key: LUCENE-8542
> URL: https://issues.apache.org/jira/browse/LUCENE-8542
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Christoph Kaser
>Priority: Minor
> Attachments: LUCENE-8542.patch
>
>
> I have an index consisting of 44 million documents spread across 60 segments. 
> When I run a query against this index with a huge number of results requested 
> (e.g. 5 million), this query uses more than 5 GB of heap if the IndexSearcher 
> was configured to use an ExecutorService.
> (I know this kind of query is fairly unusual and it would be better to use 
> paging and searchAfter, but our architecture does not allow this at the 
> moment.)
> The reason for the huge memory requirement is that the search [will create a 
> TopScoreDocCollector for each 
> segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404],
>  each one with numHits = 5 million. This is fine for the large segments, but 
> many of those segments are fairly small and only contain several thousand 
> documents. This wastes a huge amount of memory for queries with large values 
> of numHits on indices with many segments.
> Therefore, I propose to change the CollectorManager interface in the 
> following way:
>  * Change the method newCollector to accept a parameter LeafSlice that can be 
> used to determine the total count of documents in the LeafSlice.
>  * Maybe, in order to remain backwards compatible, it would be possible to 
> introduce this as a new method with a default implementation that calls the 
> old method - otherwise, it probably has to wait for Lucene 8?
>  * This can then be used to cap numHits for each TopScoreDocCollector to the 
> size of the LeafSlice.
> If this is something that would make sense for you, I can try to provide a 
> patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2019-03-12 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790611#comment-16790611 ]

Christoph Kaser commented on LUCENE-8542:
-

While it's true that the slice size is a bad upper bound, the change does help: 
As you can see in the [table in my 
comment|https://issues.apache.org/jira/browse/LUCENE-8542?focusedCommentId=16704391&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16704391],
 it reduces the heap requirement by 90% in my use case, due to the large number 
of small slices.

Making PriorityQueue growable would certainly be a better solution; however, it 
is much harder to do this without affecting the performance of the "sane" use 
case.


[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2019-03-12 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790560#comment-16790560 ]

Christoph Kaser commented on LUCENE-8542:
-

That's too bad, given that this is only a minor change to an experimental API 
(and does not cause extra work in the reasonable use case). But I understand 
your reasons.

I may try to build such a collector when I find the time (though I suspect this 
may involve quite a lot of code duplication if no changes to the core are to be 
made) - for now we simply limit the number of concurrent queries with huge 
values of numHits so they fit into the heap.


[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2019-03-12 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790485#comment-16790485 ]

Christoph Kaser commented on LUCENE-8542:
-

Is there anything I can change / add to get this committed? Or do you think it 
makes no sense for the general use case of Lucene?


[jira] [Comment Edited] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2018-11-30 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704391#comment-16704391 ]

Christoph Kaser edited comment on LUCENE-8542 at 11/30/18 8:28 AM:
---

I think it would be nice to have the option to grow the heap dynamically. 
However, given the way _TopScoreDocCollector_ and _TopDocsCollector_ are 
currently built, for a Lucene user that would mean copying the complete source 
code of those classes and adapting them to use a _java.util.PriorityQueue_ 
(probably with worse performance than _org.apache.lucene.util.PriorityQueue_).

This is certainly possible, but would mean a lot of code duplication (from the 
perspective of a Lucene user, because the priority queue in use can't be 
changed easily).
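
To illustrate the growable-heap idea mentioned above, here is a minimal sketch that keeps the best numHits scores in a _java.util.PriorityQueue_ with a small initial capacity. It is not Lucene code: `GrowableTopScores` and its methods are hypothetical names, and it tracks scores only, without document ids.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch, not Lucene code: keeps the numHits best scores in a heap that grows on demand. */
final class GrowableTopScores {
  private final int numHits;
  // java.util.PriorityQueue is a min-heap whose backing array grows as needed,
  // so a small segment never pays for numHits preallocated entries.
  private final PriorityQueue<Float> heap = new PriorityQueue<>(16);

  GrowableTopScores(int numHits) {
    if (numHits <= 0) {
      throw new IllegalArgumentException("numHits must be positive");
    }
    this.numHits = numHits;
  }

  /** Offers one score; only the numHits highest scores are retained. */
  void collect(float score) {
    if (heap.size() < numHits) {
      heap.add(score);
    } else if (score > heap.peek()) {
      heap.poll(); // drop the current weakest hit
      heap.add(score);
    }
  }

  int size() {
    return heap.size();
  }

  /** Lowest retained score; must only be called after at least one collect(). */
  float weakestScore() {
    return heap.peek();
  }
}
```

The boxing of every score into a `Float` is one plausible source of the performance penalty compared to a preallocated, array-based queue.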

I think that this patch makes sense anyway: The size of segments has a very 
wide range in a typical index, and usually there are a lot more small segments 
than large ones. Given that the default implementation of 
IndexSearcher.slices() returns one slice per segment, that means a lot of 
wasted memory for all queries that have a _numHits_ greater than the typical 
size of a small segment. I don't think it has any negative impact on queries 
with a small value of numHits, because it only adds one Math.min per segment.

It also helps with my problem: for an index with 28 segments and 13,360,068 
documents and a search with numHits=5,000,000, it makes the difference between 
creating priority queues with a combined size of 140,000,000 vs 13,360,068. As 
you can see in the following table, there are benefits for searches with a more 
reasonable numHits value as well (all against my index):

 
||numHits||Combined size w/o patch||Combined size with patch||
|10,000,000|280,000,000|13,360,068|
|5,000,000|140,000,000|13,360,068|
|1,000,000|28,000,000|6,870,854|
|100,000|2,800,000|1,632,997|
|50,000|1,400,000|1,015,274|
|10,000|280,000|252,528|
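
The two columns of the table above can be reproduced with a small calculation: without the cap every slice allocates numHits entries, with the cap each slice allocates min(numHits, documents in the slice). The sketch below assumes one slice per segment; `QueueSizeEstimate` is a hypothetical helper and the segment sizes are invented, not those of the index measured above.

```java
/** Hypothetical helper: sums the per-slice priority queue capacities. */
final class QueueSizeEstimate {

  /** Without the cap, every slice allocates a queue of numHits entries. */
  static long combinedSizeUncapped(int[] sliceDocCounts, int numHits) {
    return (long) sliceDocCounts.length * numHits;
  }

  /** With the cap, each slice allocates min(numHits, documents in the slice). */
  static long combinedSizeCapped(int[] sliceDocCounts, int numHits) {
    long total = 0;
    for (int docCount : sliceDocCounts) {
      total += Math.min(numHits, docCount);
    }
    return total;
  }

  public static void main(String[] args) {
    // Invented segment sizes for illustration; not the index from the table.
    int[] sliceDocCounts = {6_000_000, 2_000_000, 40_000, 5_000, 1_000};
    int numHits = 1_000_000;
    System.out.println(combinedSizeUncapped(sliceDocCounts, numHits));
    System.out.println(combinedSizeCapped(sliceDocCounts, numHits));
  }
}
```

Once numHits exceeds the size of every slice, the capped total equals the number of documents in the index, which is why the first two rows of the table show the same value of 13,360,068.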

 



[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2018-11-30 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704391#comment-16704391 ]

Christoph Kaser commented on LUCENE-8542:
-

I think it would be nice to have the option to grow the heap dynamically. 
However, given the way _TopScoreDocCollector_ and _TopDocsCollector_ are 
currently built, for a Lucene user that would mean copying the complete source 
code of those classes and adapting them to use a _java.util.PriorityQueue_ 
(probably with worse performance than _org.apache.lucene.util.PriorityQueue_).

This is certainly possible, but would mean a lot of code duplication (from the 
perspective of a Lucene user, because the priority queue in use can't be 
changed easily).

I think that this patch makes sense anyway: The size of segments has a very 
wide range in a typical index, and usually there are a lot more small segments 
than large ones. Given that the default implementation of 
IndexSearcher.slices() returns one slice per segment, that means a lot of 
wasted memory for all queries that have a _numHits_ greater than the typical 
size of a small segment. I don't think it has any negative impact on queries 
with a small value of numHits, because it only adds one Math.min per segment.

It also helps with my problem: for an index with 28 segments and 13,360,068 
documents and a search with numHits=5,000,000, it makes the difference between 
creating priority queues with a combined size of 140,000,000 vs 13,360,068.


[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2018-11-29 Thread Christoph Kaser (JIRA)



[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703283#comment-16703283 ]

Christoph Kaser commented on LUCENE-8542:
-

I attached a patch - hopefully this shows more clearly what I meant.

Since CollectorManager is marked as experimental, I think it might be possible 
to backport this patch to Lucene 7 as well, without keeping the old method or 
providing a default implementation of the new one.


[jira] [Updated] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2018-11-29 Thread Christoph Kaser (JIRA)



 [ 
https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-8542:

Attachment: LUCENE-8542.patch


[jira] [Created] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices

2018-10-24 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-8542:
---

 Summary: Provide the LeafSlice to CollectorManager.newCollector to 
save memory on small index slices
 Key: LUCENE-8542
 URL: https://issues.apache.org/jira/browse/LUCENE-8542
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Christoph Kaser


I have an index consisting of 44 million documents spread across 60 segments. 
When I run a query against this index with a huge number of results requested 
(e.g. 5 million), this query uses more than 5 GB of heap if the IndexSearcher 
was configured to use an ExecutorService.

(I know this kind of query is fairly unusual and it would be better to use 
paging and searchAfter, but our architecture does not allow this at the moment.)

The reason for the huge memory requirement is that the search [will create a 
TopScoreDocCollector for each 
segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404],
 each one with numHits = 5 million. This is fine for the large segments, but 
many of those segments are fairly small and only contain several thousand 
documents. This wastes a huge amount of memory for queries with large values of 
numHits on indices with many segments.

Therefore, I propose to change the CollectorManager interface in the 
following way:
 * Change the method newCollector to accept a parameter LeafSlice that can be 
used to determine the total count of documents in the LeafSlice.
 * Maybe, in order to remain backwards compatible, it would be possible to 
introduce this as a new method with a default implementation that calls the old 
method - otherwise, it probably has to wait for Lucene 8?
 * This can then be used to cap numHits for each TopScoreDocCollector to the 
size of the LeafSlice.

If this is something that would make sense for you, I can try to provide a 
patch.
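
The shape of the proposal can be sketched with plain Java stand-ins. This is only an illustration under stated assumptions: `Slice`, `TopHitsCollector`, `Manager` and `cappedManager` are hypothetical types and names, not Lucene's classes and not the contents of the attached patch. It shows a new overload with a default implementation that delegates to the old method, and a manager that uses the slice to cap the queue capacity.

```java
/** Hypothetical stand-ins for Lucene types, used only to show the shape of the proposal. */
final class SliceAwareManagerSketch {

  /** Minimal stand-in for a slice: only the number of documents it covers. */
  static final class Slice {
    final int docCount;

    Slice(int docCount) {
      this.docCount = docCount;
    }
  }

  /** Minimal stand-in for a top-hits collector: only its queue capacity. */
  static final class TopHitsCollector {
    final int capacity;

    TopHitsCollector(int capacity) {
      this.capacity = capacity;
    }
  }

  interface Manager {
    /** The existing method: no knowledge of the slice. */
    TopHitsCollector newCollector();

    /** The proposed addition; the default keeps existing implementations working. */
    default TopHitsCollector newCollector(Slice slice) {
      return newCollector();
    }
  }

  /** A manager that uses the slice to cap the queue capacity. */
  static Manager cappedManager(int numHits) {
    return new Manager() {
      @Override
      public TopHitsCollector newCollector() {
        return new TopHitsCollector(numHits);
      }

      @Override
      public TopHitsCollector newCollector(Slice slice) {
        // A slice can never contribute more hits than it has documents.
        return new TopHitsCollector(Math.min(numHits, slice.docCount));
      }
    };
  }
}
```

An implementation that only knows the old method keeps allocating numHits entries per slice, while a slice-aware one allocates at most the slice's document count.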




[jira] [Created] (LUCENE-7861) Hidden assumption that return value of IndexSearcher.slices is an array of continuous sequential slices of the index

2017-06-01 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-7861:
---

 Summary: Hidden assumption that return value of 
IndexSearcher.slices is an array of continuous sequential slices of the index
 Key: LUCENE-7861
 URL: https://issues.apache.org/jira/browse/LUCENE-7861
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 6.5.1, 6.0
Reporter: Christoph Kaser


The IndexSearcher method 
{code:java}protected LeafSlice[] slices(List<LeafReaderContext> leaves){code}
can be overridden to customize how the index is searched with multiple threads. 
However, the IndexSearcher assumes the result is an ordered array of continuous 
slices of the index. If the result is "interleaved" or unordered, searchAfter 
may skip results.

The issue seems to be how searchAfter works vs how TopDocs.merge works:

searchAfter skips every document with a higher score than the "after" document. 
In case of equal scores, it uses the document id and skips every document whose 
id is less than or equal to that of the "after" document (see 
PagingFieldCollector).

TopDocs.merge uses the score to determine which hits should be part of the 
merged TopDocs. In case of equal scores, it uses the shard index (this 
corresponds to the slices the IndexSearcher uses) to break ties (see 
ScoreMergeSortQueue.lessThan).

So if the shards are noncontinuous/unordered, searchAfter uses a different way 
of sorting the documents than TopDocs.merge, and therefore hits are skipped.

On the mailing list, Michael McCandless suggested either improving 
TopDocs.merge to optionally use the docID for tie breaking (optionally, as 
apparently the docID is not always global for every call of TopDocs.merge) or 
at least documenting the requirement on the return value of 
IndexSearcher.slices().

In my use case (generating a fixed number of slices of approximately equal 
size), the requirement of ordered slices will result in a less optimal result - 
but I am not sure whether this has a real impact on performance.




[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter

2017-05-11 Thread Christoph Kaser (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006475#comment-16006475 ]

Christoph Kaser commented on LUCENE-7817:
-

Perfect, thank you! :)

> LRUQueryCache.onQueryCache is always called with null as first parameter
> 
>
> Key: LUCENE-7817
> URL: https://issues.apache.org/jira/browse/LUCENE-7817
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0), 6.4.1, 6.5.1
>Reporter: Christoph Kaser
> Fix For: master (7.0), 6.6
>
>
> According to the javadocs, LRUQueryCache.onQueryCache can be used to track 
> usage statistics on cached queries. Unfortunately, due to a bug, the query 
> parameter is always passed as null, making the method practically useless.
> This PR fixes the problem:
> https://github.com/apache/lucene-solr/pull/199




[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter

2017-05-10 Thread Christoph Kaser (JIRA)


[ https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004418#comment-16004418 ]

Christoph Kaser commented on LUCENE-7817:
-

Is there anything else missing that I can add? 
If possible (and sensible), I would really like to get this into the next 
Lucene version because it causes problems in our code, which I currently work 
around by manually patching the LRUQueryCache.

> LRUQueryCache.onQueryCache is always called with null as first parameter
> 
>
> Key: LUCENE-7817
> URL: https://issues.apache.org/jira/browse/LUCENE-7817
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0), 6.4.1, 6.5.1
>Reporter: Christoph Kaser
>
> According to the javadocs, LRUQueryCache.onQueryCache can be used to track 
> usage statistics on cached queries. Unfortunately, due to a bug, the query 
> parameter is always passed as null, making the method practically useless.
> This PR fixes the problem:
> https://github.com/apache/lucene-solr/pull/199




[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter

2017-05-05 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998361#comment-15998361
 ] 

Christoph Kaser commented on LUCENE-7817:
-

Thanks for the review! I added a test for nullness to 
TestLRUQueryCache.testFineGrainedStats and pushed it into the PR.

> LRUQueryCache.onQueryCache is always called with null as first parameter
> 
>
> Key: LUCENE-7817
> URL: https://issues.apache.org/jira/browse/LUCENE-7817
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0), 6.4.1, 6.5.1
>Reporter: Christoph Kaser
>
> According to the javadocs, LRUQueryCache.onQueryCache can be used to track 
> usage statistics on cached queries. Unfortunately, due to a bug, the query 
> parameter is always passed as null, making the method practically useless.
> This PR fixes the problem:
> https://github.com/apache/lucene-solr/pull/199




[jira] [Created] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter

2017-05-05 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-7817:
---

 Summary: LRUQueryCache.onQueryCache is always called with null as 
first parameter
 Key: LUCENE-7817
 URL: https://issues.apache.org/jira/browse/LUCENE-7817
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 6.5.1, 6.4.1, master (7.0)
Reporter: Christoph Kaser


According to the javadocs, LRUQueryCache.onQueryCache can be used to track 
usage statistics on cached queries. Unfortunately, due to a bug, the query 
parameter is always passed as null, making the method practically useless.

This PR fixes the problem:
https://github.com/apache/lucene-solr/pull/199




[jira] [Resolved] (LUCENE-6326) MultiCollector does not handle CollectionTerminatedException correctly

2015-09-07 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser resolved LUCENE-6326.
-
   Resolution: Duplicate
Lucene Fields:   (was: New)

> MultiCollector does not handle CollectionTerminatedException correctly
> --
>
> Key: LUCENE-6326
> URL: https://issues.apache.org/jira/browse/LUCENE-6326
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.0
>Reporter: Christoph Kaser
>Priority: Minor
>
> The javadoc of the *collect*-method of LeafCollector states:
> bq. Note: The collection of the current segment can be terminated by throwing 
> a CollectionTerminatedException.
> However, the MultiCollector does not catch this exception, so if one of the 
> wrapped collectors terminates the current segment, it is terminated for every 
> collector.
> The same is true for the *getLeafCollector*-method (even though this is not 
> documented in the JavaDoc).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong stemming

2015-06-26 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602716#comment-14602716
 ] 

Christoph Kaser commented on LUCENE-6586:
-

Hi Michael,

I tried to write a small test case and realized that there is no input that 
leads to a wrong token.
substCount is only used to decide how large the original input was, because 
some suffixes are only stripped if the token has a minimum length.

{code}
if ( ( buffer.length() + substCount > 5 ) &&
  buffer.substring( buffer.length() - 2, buffer.length() ).equals( "nd" ) )
{
  buffer.delete( buffer.length() - 2, buffer.length() );
}
{code}

However, every substitution leaves at least one character. For the bug to take 
effect, there has to be a substitution before the one that sets substCount to 2 
(instead of incrementing it by 2).
So we have
- 2 characters that were left by the (at least 2) substitutions
- the suffix "nd" (2 characters)
- substCount, which was set to 2
That sums up to 6, which is greater than 5.

The other conditions that check on substCount work the same, except they check 
for greater than 4.

Therefore, there is no token that triggers any wrong behaviour.

Still, I think the typo should be fixed, because it might be copied to a place 
where it has an effect.
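
To make the difference between the two operators concrete, here is a minimal, self-contained sketch. The class and method names are made up for this example and are not part of GermanStemmer; only the `=+` versus `+=` behaviour and the "greater than 5" condition are taken from the discussion above.

```java
class SubstCountSketch {
    // Hypothetical stand-in for the counter update with the typo.
    // "substCount =+ 2" parses as "substCount = (+2)", so the counter
    // is reset to 2 on every substitution instead of growing.
    static int countWithTypo(int substitutions) {
        int substCount = 0;
        for (int i = 0; i < substitutions; i++) {
            substCount =+ 2;
        }
        return substCount;
    }

    // The intended behaviour: every substitution adds 2 to the counter.
    static int countFixed(int substitutions) {
        int substCount = 0;
        for (int i = 0; i < substitutions; i++) {
            substCount += 2;
        }
        return substCount;
    }

    // Mirrors the minimum-length condition (length + substCount > 5).
    static boolean longEnough(int bufferLength, int substCount) {
        return bufferLength + substCount > 5;
    }

    public static void main(String[] args) {
        System.out.println(countWithTypo(2)); // 2
        System.out.println(countFixed(2));    // 4
        // Worst case from the comment: 2 leftover characters plus "nd"
        // give a buffer of length 4, and the counter is stuck at 2.
        System.out.println(longEnough(4, countWithTypo(2))); // true
    }
}
```

Even with the counter stuck at 2, the shortest buffer that can reach the check still passes it, which matches the reasoning above.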

 There is a typo in GermanStemmer that can lead to wrong stemming
 

 Key: LUCENE-6586
 URL: https://issues.apache.org/jira/browse/LUCENE-6586
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 5.2.1
Reporter: Christoph Kaser
Priority: Minor

 There is a small typo in GermanStemmer that leads to a wrong calculation of 
 the substCount in line 203:
 {code}substCount =+ 2;{code}
 should be
 {code}substCount += 2;{code}
 I created a Pull Request for this some time ago, but it was apparently 
 overlooked:
 https://github.com/apache/lucene-solr/pull/141




[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-23 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597661#comment-14597661
 ] 

Christoph Kaser commented on LUCENE-6588:
-

Thank you! :)

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Fix For: 5.3

 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-22 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596057#comment-14596057
 ] 

Christoph Kaser commented on LUCENE-6588:
-

Okay, if you prefer, I can change the test to use a FilteredQuery instead of 
deleting child documents.

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-22 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14595751#comment-14595751
 ] 

Christoph Kaser commented on LUCENE-6588:
-

When I encountered this bug, there was no deleted document in the index - I 
think acceptDocs was set due to a filter. So the bug is relevant whether or not 
deleting single children is a supported use case.
However, the easiest way to reproduce the bug was by deleting child documents, 
so that's what I used.

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6588:

Attachment: 0003-implements-ToChildBlockJoinQuery.explain.patch

This patch implements ToChildBlockJoinQuery.explain(), which helped in finding 
and debugging this issue.

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6588:

Lucene Fields: New,Patch Available  (was: New)
Flags: Patch

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6588:

External issue URL: https://github.com/apache/lucene-solr/pull/155

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 
 0003-implements-ToChildBlockJoinQuery.explain.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6588:

Attachment: 0001-Test-score-calculation.patch

Test demonstrating the bug

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6588:

Attachment: 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch

Bugfix

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Comment Edited] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593356#comment-14593356
 ] 

Christoph Kaser edited comment on LUCENE-6588 at 6/19/15 11:54 AM:
---

Patch for the issue


was (Author: christophk):
Bugfix

 ToChildBlockJoinQuery does not calculate parent score if the first child is 
 not in acceptDocs
 -

 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser
 Attachments: 0001-Test-score-calculation.patch, 
 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch


 There is a bug in ToChildBlockJoinQuery that causes the score calculation to 
 be skipped if the first child of a new parent doc is not in acceptDocs.
 I will attach a test showing the failure and a patch to fix it.




[jira] [Created] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs

2015-06-19 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-6588:
---

 Summary: ToChildBlockJoinQuery does not calculate parent score if 
the first child is not in acceptDocs
 Key: LUCENE-6588
 URL: https://issues.apache.org/jira/browse/LUCENE-6588
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.2.1
Reporter: Christoph Kaser


There is a bug in ToChildBlockJoinQuery that causes the score calculation to be 
skipped if the first child of a new parent doc is not in acceptDocs.

I will attach a test showing the failure and a patch to fix it.




[jira] [Updated] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong stemming

2015-06-19 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6586:

Summary: There is a typo in GermanStemmer that can lead to wrong stemming  
(was: There is a typo in GermanStemmer that can lead to wrong trimming)

 There is a typo in GermanStemmer that can lead to wrong stemming
 

 Key: LUCENE-6586
 URL: https://issues.apache.org/jira/browse/LUCENE-6586
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 5.2.1
Reporter: Christoph Kaser
Priority: Minor

 There is a small typo in GermanStemmer that leads to a wrong calculation of 
 the substCount in line 203:
 {code}substCount =+ 2;{code}
 should be
 {code}substCount += 2;{code}
 I created a Pull Request for this some time ago, but it was apparently 
 overlooked:
 https://github.com/apache/lucene-solr/pull/141




[jira] [Created] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong trimming

2015-06-19 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-6586:
---

 Summary: There is a typo in GermanStemmer that can lead to wrong 
trimming
 Key: LUCENE-6586
 URL: https://issues.apache.org/jira/browse/LUCENE-6586
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 5.2.1
Reporter: Christoph Kaser
Priority: Minor


There is a small typo in GermanStemmer that leads to a wrong calculation of 
the substCount in line 203:

{code}substCount =+ 2;{code}
should be
{code}substCount += 2;{code}

I created a Pull Request for this some time ago, but it was apparently 
overlooked:
https://github.com/apache/lucene-solr/pull/141





[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings

2015-03-13 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6358:

Attachment: LUCENE-6358-test.patch

Unit test

 UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when 
 for certain input strings
 --

 Key: LUCENE-6358
 URL: https://issues.apache.org/jira/browse/LUCENE-6358
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor
 Attachments: LUCENE-6358-test.patch


 The static toLowerCase-method of UnescapedCharSequence does nto account for 
 locales in which the length of the result of String.toLowerCase is not the 
 same as the length of the input string. This causes an 
 ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are 
 not of the same length. 
 (See attached test and patch)




[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings

2015-03-13 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6358:

Attachment: LUCENE-6358-fix.patch

fix

 UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when 
 for certain input strings
 --

 Key: LUCENE-6358
 URL: https://issues.apache.org/jira/browse/LUCENE-6358
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor
 Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch


 The static toLowerCase-method of UnescapedCharSequence does nto account for 
 locales in which the length of the result of String.toLowerCase is not the 
 same as the length of the input string. This causes an 
 ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are 
 not of the same length. 
 (See attached test and patch)




[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings

2015-03-13 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6358:

Description: 
The static toLowerCase-method of UnescapedCharSequence does not account for 
locales in which the length of the result of String.toLowerCase is not the same 
as the length of the input string. This causes an 
ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not 
of the same length. 
(See attached test and patch)

  was:
The static toLowerCase-method of UnescapedCharSequence does nto account for 
locales in which the length of the result of String.toLowerCase is not the same 
as the length of the input string. This causes an 
ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not 
of the same length. 
(See attached test and patch)


 UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for 
 certain input strings
 -

 Key: LUCENE-6358
 URL: https://issues.apache.org/jira/browse/LUCENE-6358
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor
 Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch


 The static toLowerCase-method of UnescapedCharSequence does not account for 
 locales in which the length of the result of String.toLowerCase is not the 
 same as the length of the input string. This causes an 
 ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are 
 not of the same length. 
 (See attached test and patch)




[jira] [Created] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings

2015-03-13 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-6358:
---

 Summary: UnescapedCharSequence.toLowerCase throws 
ArrayIndexOutOfBoundsException when for certain input strings
 Key: LUCENE-6358
 URL: https://issues.apache.org/jira/browse/LUCENE-6358
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor


The static toLowerCase-method of UnescapedCharSequence does nto account for 
locales in which the length of the result of String.toLowerCase is not the same 
as the length of the input string. This causes an 
ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not 
of the same length. 
(See attached test and patch)
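
As an illustration of how lower-casing can change the length of a string, here is a small, self-contained sketch. It is not the code from the attached patch: the class and the helper `lowerCaseFlags` are hypothetical and only show one way to keep a flags array in sync with the lower-cased text. The input character U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) is just one example chosen for this sketch.

```java
import java.util.Locale;

class LowerCaseLengthSketch {
    // Hypothetical helper: builds a flags array that matches the length of
    // the lower-cased text, copying only as many flags as both arrays hold.
    static boolean[] lowerCaseFlags(String text, boolean[] wasEscaped, Locale locale) {
        String lower = text.toLowerCase(locale);
        boolean[] result = new boolean[lower.length()];
        int n = Math.min(result.length, wasEscaped.length);
        System.arraycopy(wasEscaped, 0, result, 0, n);
        return result;
    }

    public static void main(String[] args) {
        String input = "\u0130"; // one char before lower-casing
        String lower = input.toLowerCase(Locale.ROOT);
        // Outside Turkic locales the result is 'i' followed by U+0307
        // (COMBINING DOT ABOVE), so the string grows from 1 to 2 chars.
        System.out.println(input.length() + " -> " + lower.length());

        boolean[] flags = lowerCaseFlags(input, new boolean[] { true }, Locale.ROOT);
        System.out.println(flags.length); // matches lower.length()
    }
}
```

Indexing an array sized for the original string with positions of the lower-cased string is what goes out of bounds in a case like this.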




[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings

2015-03-13 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-6358:

Summary: UnescapedCharSequence.toLowerCase throws 
ArrayIndexOutOfBoundsException for certain input strings  (was: 
UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when 
for certain input strings)

 UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for 
 certain input strings
 -

 Key: LUCENE-6358
 URL: https://issues.apache.org/jira/browse/LUCENE-6358
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor
 Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch


 The static toLowerCase-method of UnescapedCharSequence does nto account for 
 locales in which the length of the result of String.toLowerCase is not the 
 same as the length of the input string. This causes an 
 ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are 
 not of the same length. 
 (See attached test and patch)




[jira] [Commented] (LUCENE-6337) ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly

2015-03-05 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348534#comment-14348534
 ] 

Christoph Kaser commented on LUCENE-6337:
-

We use ToParentBlockJoinCollector in production, so for us it would be a shame 
if it was removed without any replacement

 ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException 
 correctly
 --

 Key: LUCENE-6337
 URL: https://issues.apache.org/jira/browse/LUCENE-6337
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.0
Reporter: Christoph Kaser

 ToParentBlockJoinIndexSearcher overrides the search-method of IndexSearcher.
 However, unlike IndexSearcher, it does not catch the 
 CollectionTerminatedException, which would allow a Collector to prematurely 
 terminate the collection of a segment.
 This is an issue if this searcher is used for a search with a MultiCollector 
 or a collector other than ToParentBlockJoinCollector.




[jira] [Created] (LUCENE-6337) ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly

2015-03-04 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-6337:
---

 Summary: ToParentBlockJoinIndexSearcher does not handle 
CollectionTerminatedException correctly
 Key: LUCENE-6337
 URL: https://issues.apache.org/jira/browse/LUCENE-6337
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 5.0
Reporter: Christoph Kaser


ToParentBlockJoinIndexSearcher overrides the search-method of IndexSearcher.

However, unlike IndexSearcher, it does not catch the 
CollectionTerminatedException, which would allow a Collector to prematurely 
terminate the collection of a segment.

This is an issue if this searcher is used for a search with a MultiCollector 
or a collector other than ToParentBlockJoinCollector.




[jira] [Created] (LUCENE-6326) MultiCollector does not handle CollectionTerminatedException correctly

2015-03-02 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-6326:
---

 Summary: MultiCollector does not handle 
CollectionTerminatedException correctly
 Key: LUCENE-6326
 URL: https://issues.apache.org/jira/browse/LUCENE-6326
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 5.0
Reporter: Christoph Kaser
Priority: Minor


The javadoc of the *collect*-method of LeafCollector states:
bq. Note: The collection of the current segment can be terminated by throwing a 
CollectionTerminatedException.
However, the MultiCollector does not catch this exception, so if one of the 
wrapped collectors terminates the current segment, it is terminated for every 
collector.
The same is true for the *getLeafCollector*-method (even though this is not 
documented in the JavaDoc).
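
The effect can be reproduced without Lucene in a small, self-contained sketch. All types below (`TerminatedException`, `DocCollector`, `NaiveMulti`, `TolerantMulti`) are hypothetical stand-ins invented for this example, not Lucene classes, and the tolerant variant is only one possible way to handle the exception.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class MultiCollectSketch {
    // Stand-in for an exception that ends collection of the current segment.
    static class TerminatedException extends RuntimeException {}

    interface DocCollector { void collect(int doc); }

    // Collects up to 'limit' documents, then asks to terminate.
    static class LimitedCollector implements DocCollector {
        final int limit;
        int seen = 0;
        LimitedCollector(int limit) { this.limit = limit; }
        public void collect(int doc) {
            if (seen >= limit) throw new TerminatedException();
            seen++;
        }
    }

    // Lets the exception escape, so one collector stops all of them.
    static class NaiveMulti implements DocCollector {
        private final List<DocCollector> all;
        NaiveMulti(List<DocCollector> all) { this.all = all; }
        public void collect(int doc) {
            for (DocCollector c : all) c.collect(doc);
        }
    }

    // Drops a collector once it terminates; stops only when none are left.
    static class TolerantMulti implements DocCollector {
        private final List<DocCollector> active;
        TolerantMulti(List<DocCollector> all) { this.active = new ArrayList<>(all); }
        public void collect(int doc) {
            for (Iterator<DocCollector> it = active.iterator(); it.hasNext();) {
                try {
                    it.next().collect(doc);
                } catch (TerminatedException e) {
                    it.remove();
                }
            }
            if (active.isEmpty()) throw new TerminatedException();
        }
    }

    // Feeds docs 0..maxDoc-1 to the collector, like a searcher would per segment.
    static void collectSegment(DocCollector c, int maxDoc) {
        try {
            for (int doc = 0; doc < maxDoc; doc++) c.collect(doc);
        } catch (TerminatedException e) {
            // the segment is finished for this collector
        }
    }

    // Returns how many of 3 docs the second (unlimited) collector saw.
    static int seenBySecond(boolean tolerant) {
        LimitedCollector early = new LimitedCollector(1);
        LimitedCollector full = new LimitedCollector(Integer.MAX_VALUE);
        List<DocCollector> both = new ArrayList<>();
        both.add(early);
        both.add(full);
        collectSegment(tolerant ? new TolerantMulti(both) : new NaiveMulti(both), 3);
        return full.seen;
    }

    public static void main(String[] args) {
        System.out.println(seenBySecond(false)); // 1
        System.out.println(seenBySecond(true));  // 3
    }
}
```

In the naive variant the second collector misses two of the three documents because the first collector ended the segment for everyone.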




[jira] [Updated] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect

2014-07-04 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-5805:


Description: 
The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
parent and removes any occurrence of this from the result.

However, for the past few releases, _getChildren_ has returned a *copy* of the 
children list, so the code has no effect (except creating a copy of the 
children list that is then thrown away). 
Even worse, since setChildren calls removeFromParent on any previous child, 
setChildren has a complexity of O(n^2) and creates a lot of throw-away copies 
of the children list (for nodes with a lot of children).

{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();

    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }

    this.parent = null;
  }
}
{code}

  was:
The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
parent and removes any occurrence of this from the result.

However, for the past few releases, _getChildren_ has returned a *copy* of the 
children list, so the code has no effect (except creating a copy of the 
children list that is then thrown away). 
Even worse, since setChildren calls removeFromParent on any previous child, 
setChildren has a complexity of O(n^2) and creates a lot of throw-away copies 
of the children list (for nodes with a lot of children).

{code]
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();

    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }

    this.parent = null;
  }
}
{code}


 QueryNodeImpl.removeFromParent does a lot of work without any effect
 

 Key: LUCENE-5805
 URL: https://issues.apache.org/jira/browse/LUCENE-5805
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 4.7.2, 4.9
Reporter: Christoph Kaser

 The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
 parent and removes any occurrence of this from the result.
 However, for the past few releases, _getChildren_ has returned a *copy* of the 
 children list, so the code has no effect (except creating a copy of the 
 children list that is then thrown away). 
 Even worse, since setChildren calls removeFromParent on any previous child, 
 setChildren has a complexity of O(n^2) and creates a lot of throw-away copies 
 of the children list (for nodes with a lot of children).
 {code}
 public void removeFromParent() {
   if (this.parent != null) {
     List<QueryNode> parentChildren = this.parent.getChildren();
     Iterator<QueryNode> it = parentChildren.iterator();

     while (it.hasNext()) {
       if (it.next() == this) {
         it.remove();
       }
     }

     this.parent = null;
   }
 }
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect

2014-07-04 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-5805:
---

 Summary: QueryNodeImpl.removeFromParent does a lot of work without 
any effect
 Key: LUCENE-5805
 URL: https://issues.apache.org/jira/browse/LUCENE-5805
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 4.9, 4.7.2
Reporter: Christoph Kaser


The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
parent and removes any occurrence of this from the result.

However, for the past few releases, _getChildren_ has returned a *copy* of the 
children list, so the code has no effect (except creating a copy of the 
children list that is then thrown away). 
Even worse, since setChildren calls removeFromParent on any previous child, 
setChildren has a complexity of O(n^2) and creates a lot of throw-away copies 
of the children list (for nodes with a lot of children).

{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();

    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }

    this.parent = null;
  }
}
{code}
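
The effect can be reproduced with a small, self-contained sketch. NodeSketch is a hypothetical miniature of a tree node and not the real QueryNodeImpl; removeFromParentDirect is only one possible way to avoid the problem, not necessarily how it should be fixed in Lucene.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical miniature of a tree node; not the real QueryNodeImpl.
class NodeSketch {
  private final List<NodeSketch> children = new ArrayList<>();
  NodeSketch parent;

  void add(NodeSketch child) {
    child.parent = this;
    children.add(child);
  }

  // Returns a defensive copy, as described above for getChildren.
  List<NodeSketch> getChildren() {
    return new ArrayList<>(children);
  }

  // Same shape as the quoted method: it only mutates the copy.
  void removeFromParentViaCopy() {
    if (parent != null) {
      Iterator<NodeSketch> it = parent.getChildren().iterator();
      while (it.hasNext()) {
        if (it.next() == this) {
          it.remove();
        }
      }
      parent = null;
    }
  }

  // One possible alternative: mutate the parent's internal list directly.
  void removeFromParentDirect() {
    if (parent != null) {
      parent.children.removeIf(c -> c == this);
      parent = null;
    }
  }
}
```

After removeFromParentViaCopy the parent still lists the node as a child, while removeFromParentDirect actually detaches it.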




[jira] [Updated] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect

2014-07-04 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-5805:


Description: 
The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
parent and removes any occurrence of this from the result.

However, for the past few releases, _getChildren_ has returned a *copy* of the 
children list, so the code has no effect (except creating a copy of the 
children list that is then thrown away). 
Even worse, since _setChildren_ calls _removeFromParent_ on any previous child, 
_setChildren_ now has a complexity of O(n^2) and creates a lot of throw-away 
copies of the children list (for nodes with a lot of children).

{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();

    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }

    this.parent = null;
  }
}
{code}

  was:
The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
parent and removes any occurrence of this from the result.

However, for the past few releases, _getChildren_ has returned a *copy* of the 
children list, so the code has no effect (except creating a copy of the 
children list that is then thrown away). 
Even worse, since setChildren calls removeFromParent on any previous child, 
setChildren has a complexity of O(n^2) and creates a lot of throw-away copies 
of the children list (for nodes with a lot of children).

{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();

    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }

    this.parent = null;
  }
}
{code}


 QueryNodeImpl.removeFromParent does a lot of work without any effect
 

 Key: LUCENE-5805
 URL: https://issues.apache.org/jira/browse/LUCENE-5805
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/queryparser
Affects Versions: 4.7.2, 4.9
Reporter: Christoph Kaser

 The method _removeFromParent_ of _QueryNodeImpl_ calls _getChildren_ on the 
 parent and removes any occurrence of this from the result.
 However, for the past few releases, _getChildren_ has returned a *copy* of the 
 children list, so the code has no effect (except creating a copy of the 
 children list that is then thrown away). 
 Even worse, since _setChildren_ calls _removeFromParent_ on any previous 
 child, _setChildren_ now has a complexity of O(n^2) and creates a lot of 
 throw-away copies of the children list (for nodes with a lot of children).
 {code}
 public void removeFromParent() {
   if (this.parent != null) {
     List<QueryNode> parentChildren = this.parent.getChildren();
     Iterator<QueryNode> it = parentChildren.iterator();

     while (it.hasNext()) {
       if (it.next() == this) {
         it.remove();
       }
     }

     this.parent = null;
   }
 }
 {code}




[jira] [Commented] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files

2014-04-14 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968239#comment-13968239
 ] 

Christoph Kaser commented on LUCENE-5599:
-

I don't think so. As far as I know, lucene replication and solr replication 
don't share any code at the moment, so this should only affect lucene.

 HttpReplicator uses a lot of CPU for large files
 

 Key: LUCENE-5599
 URL: https://issues.apache.org/jira/browse/LUCENE-5599
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
Priority: Minor
 Attachments: HttpClientBase.java.patch


 The method responseInputStream of HttpClientBase wraps an InputStream in 
 order to close it when it is done reading. However, the wrapper only 
 overrides the single-byte read() method; every other method is inherited from 
 its parent (java.io.InputStream). Therefore, the more efficient read methods 
 like read(byte[] b) are all implemented by reading one byte after the other.
 In my test, it took 20 minutes to copy an index of 38 GB. With the provided 
 small patch, this was reduced to less than 10 minutes.




[jira] [Created] (LUCENE-5597) HttpReplication currently does not support a tree topology

2014-04-11 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-5597:
---

 Summary: HttpReplication currently does not support a tree topology
 Key: LUCENE-5597
 URL: https://issues.apache.org/jira/browse/LUCENE-5597
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
Priority: Minor


At the moment, it is not possible to have a tree topology for replication. 
The reason is that in order to publish an IndexRevision on a non-root, non-leaf 
node, one would need to open an IndexWriter on the index. However, the 
replication directly modifies the index directory without using an IndexWriter, 
so the IndexWriter would not see the changes the replication made.

IndexRevision uses the IndexWriter for deleting unused files when the
revision is released, as well as to obtain the SnapshotDeletionPolicy.

In order to implement this, two things are needed:

* A Revision which doesn't use IndexWriter.
* A Replicator which keeps track of how many refs a file has (basically what
IndexFileDeleter does).
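
The second point could look roughly like the following sketch. FileRefCountsSketch and its methods are hypothetical names chosen for illustration; this is not IndexFileDeleter and it leaves out everything except the counting itself.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-file reference counter; not part of the replicator module.
class FileRefCountsSketch {
  private final Map<String, Integer> refs = new HashMap<>();

  // Called when a revision that contains these files is published.
  void incRef(Collection<String> files) {
    for (String f : files) {
      refs.merge(f, 1, Integer::sum);
    }
  }

  // Called when a revision is released; returns the files that are no longer
  // referenced by any revision and could therefore be deleted.
  List<String> decRef(Collection<String> files) {
    List<String> deletable = new ArrayList<>();
    for (String f : files) {
      Integer count = refs.get(f);
      if (count == null) {
        continue; // unknown file, nothing to release
      }
      if (count == 1) {
        refs.remove(f);
        deletable.add(f);
      } else {
        refs.put(f, count - 1);
      }
    }
    return deletable;
  }
}
```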





[jira] [Created] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files

2014-04-11 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-5599:
---

 Summary: HttpReplicator uses a lot of CPU for large files
 Key: LUCENE-5599
 URL: https://issues.apache.org/jira/browse/LUCENE-5599
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
Priority: Minor
 Attachments: HttpClientBase.java.patch

The method responseInputStream of HttpClientBase wraps an InputStream in order 
to close it when it is done reading. However, the wrapper only overrides the 
single-byte read() method; every other method is inherited from its parent 
(java.io.InputStream). Therefore, the more efficient read methods like 
read(byte[] b) are all implemented by reading one byte after the other.

In my test, it took 20 minutes to copy an index of 38 GB. With the provided 
small patch, this was reduced to less than 10 minutes.
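
For illustration, a minimal sketch of a wrapper that also delegates the bulk read. BulkReadStreamSketch is a hypothetical class written for this comment and not the attached patch; the two counters exist only to make the difference visible.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper illustrating the idea; not the attached patch.
class BulkReadStreamSketch extends InputStream {
  private final InputStream in;
  int bulkCalls = 0;   // demonstration only
  int singleCalls = 0; // demonstration only

  BulkReadStreamSketch(InputStream in) {
    this.in = in;
  }

  @Override
  public int read() throws IOException {
    singleCalls++;
    return in.read();
  }

  // Without this override, InputStream.read(byte[], int, int) falls back to
  // calling the single-byte read() once per byte.
  @Override
  public int read(byte[] b, int off, int len) throws IOException {
    bulkCalls++;
    return in.read(b, off, len);
  }

  @Override
  public void close() throws IOException {
    in.close();
  }
}
```

Since InputStream.read(byte[] b) is defined in terms of read(b, 0, b.length), overriding the three-argument variant covers both bulk methods.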




[jira] [Updated] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files

2014-04-11 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-5599:


Attachment: HttpClientBase.java.patch

 HttpReplicator uses a lot of CPU for large files
 

 Key: LUCENE-5599
 URL: https://issues.apache.org/jira/browse/LUCENE-5599
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
Priority: Minor
 Attachments: HttpClientBase.java.patch


 The method responseInputStream of HttpClientBase wraps an InputStream in 
 order to close it when it is done reading. However, the wrapper only 
 overrides the single-byte read() method; every other method is inherited from 
 its parent (java.io.InputStream). Therefore, the more efficient read methods 
 like read(byte[] b) are all implemented by reading one byte after the other.
 In my test, it took 20 minutes to copy an index of 38 GB. With the provided 
 small patch, this was reduced to less than 10 minutes.




[jira] [Created] (LUCENE-5600) HttpReplicator does not properly handle server failures

2014-04-11 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-5600:
---

 Summary: HttpReplicator does not properly handle server failures
 Key: LUCENE-5600
 URL: https://issues.apache.org/jira/browse/LUCENE-5600
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser


When ReplicationClient.updateNow() using an HttpReplicator encounters a server 
error (like Status Code 500), it throws a runtime exception instead of an 
IOException.
Furthermore, it does not close the HttpClient it used, which leads to an Error 
if a BasicClientConnectionManager is used
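
What I would expect instead can be sketched as follows. ResponseCheckSketch.fetch is a hypothetical helper with made-up parameters and not the HttpClientBase code: it reports a non-2xx status as an IOException and releases the connection in every case.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper; the parameters are stand-ins for a real HTTP response.
class ResponseCheckSketch {
  // Returns the body for a 2xx status, throws IOException otherwise.
  // The connection is closed whether or not the call succeeds.
  static String fetch(int statusCode, String body, Closeable connection) throws IOException {
    try {
      if (statusCode < 200 || statusCode >= 300) {
        throw new IOException("Server returned status " + statusCode);
      }
      return body;
    } finally {
      connection.close();
    }
  }
}
```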




[jira] [Updated] (LUCENE-5600) HttpReplicator does not properly handle server failures

2014-04-11 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-5600:


Attachment: HttpReplicatorTest.patch

Test

 HttpReplicator does not properly handle server failures
 ---

 Key: LUCENE-5600
 URL: https://issues.apache.org/jira/browse/LUCENE-5600
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
 Attachments: HttpReplicatorTest.patch


 When ReplicationClient.updateNow() using an HttpReplicator encounters a 
 server error (like Status Code 500), it throws a runtime exception instead of 
 an IOException.
 Furthermore, it does not close the HttpClient it used, which leads to an 
 Error if a BasicClientConnectionManager is used




[jira] [Updated] (LUCENE-5600) HttpReplicator does not properly handle server failures

2014-04-11 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-5600:


Attachment: HttpClientBase-LUCENE-5600.java.patch

Fix

 HttpReplicator does not properly handle server failures
 ---

 Key: LUCENE-5600
 URL: https://issues.apache.org/jira/browse/LUCENE-5600
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/replicator
Affects Versions: 4.7.1
Reporter: Christoph Kaser
 Attachments: HttpClientBase-LUCENE-5600.java.patch, 
 HttpReplicatorTest.patch


 When ReplicationClient.updateNow() using an HttpReplicator encounters a 
 server error (like Status Code 500), it throws a runtime exception instead of 
 an IOException.
 Furthermore, it does not close the HttpClient it used, which leads to an 
 Error if a BasicClientConnectionManager is used




[jira] [Updated] (LUCENE-4076) When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children

2014-04-08 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-4076:


Affects Version/s: 4.7.1

 When doing nested (index-time) joins, ToParentBlockJoinCollector delivers 
 incomplete information on the grand-children
 --

 Key: LUCENE-4076
 URL: https://issues.apache.org/jira/browse/LUCENE-4076
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6, 4.7.1
Reporter: Christoph Kaser

 ToParentBlockJoinCollector.getTopGroups does not provide the correct answer 
 when a query with nested ToParentBlockJoinCollectors is performed.
 Given the following example query:
 {code}
 Query grandChildQuery = new TermQuery(new Term("color", "red"));
 Filter childFilter = new CachingWrapperFilter(
     new RawTermFilter(new Term("type", "child")), DeletesMode.IGNORE);
 ToParentBlockJoinQuery grandchildJoinQuery =
     new ToParentBlockJoinQuery(grandChildQuery, childFilter, ScoreMode.Max);
 BooleanQuery childQuery = new BooleanQuery();
 childQuery.add(grandchildJoinQuery, Occur.MUST);
 childQuery.add(new TermQuery(new Term("shape", "round")), Occur.MUST);
 Filter parentFilter = new CachingWrapperFilter(
     new RawTermFilter(new Term("type", "parent")), DeletesMode.IGNORE);
 ToParentBlockJoinQuery childJoinQuery =
     new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);
 BooleanQuery parentQuery = new BooleanQuery();
 parentQuery.add(childJoinQuery, Occur.MUST);
 parentQuery.add(new TermQuery(new Term("name", "test")), Occur.MUST);
 ToParentBlockJoinCollector parentCollector =
     new ToParentBlockJoinCollector(Sort.RELEVANCE, 30, true, true);
 searcher.search(parentQuery, null, parentCollector);
 {code}
 This produces the correct results:
 {code}
 TopGroups<Integer> childGroups =
     parentCollector.getTopGroups(childJoinQuery, null, 0, 20, 0, false);
 {code}
 However, this does not:
 {code}
 TopGroups<Integer> grandChildGroups =
     parentCollector.getTopGroups(grandchildJoinQuery, null, 0, 20, 0, false);
 {code}
 The content of grandChildGroups is broken in the following ways:
 * The groupValue is not the document id of the child document (which is the 
 parent of a grandchild document), but the document id of the _previous_ 
 matching parent document.
 * There are only as many GroupDocs as there are parent documents (not child 
 documents), and they only contain the children of the last child document 
 (but, as mentioned before, with the wrong groupValue). 




[jira] [Created] (LUCENE-4773) QueryParserBase should not throw ParseException in getPrefixQuery when termStr starts with *

2013-02-12 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-4773:
---

 Summary: QueryParserBase should not throw ParseException in 
getPrefixQuery when termStr starts with *
 Key: LUCENE-4773
 URL: https://issues.apache.org/jira/browse/LUCENE-4773
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.1, 4.0
Reporter: Christoph Kaser
Priority: Minor


The method getPrefixQuery of 
org.apache.lucene.queryparser.classic.QueryParserBase checks for leading 
*-wildcards:

{code:java}
protected Query getPrefixQuery(String field, String termStr) throws ParseException {
  if (!allowLeadingWildcard && termStr.startsWith("*"))
    throw new ParseException("'*' not allowed as first character in PrefixQuery");
  ...
}
{code}

However, the passed termStr is already unescaped in handleBareTokenQuery(...):
{code:java}
 q = getPrefixQuery(qfield,
  discardEscapeChar(term.image.substring
  (0, term.image.length()-1)));
{code}

Therefore, a search query like this one results in a ParseException, even 
though the first wildcard is escaped:
{noformat}
title:\*a*
{noformat}

I don't think there is any sense in checking for leading wildcards in 
getPrefixQuery, as the passed termStr is already used literally, without paying 
attention to special characters at all.
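
The false positive can be shown with a small stand-alone sketch. LeadingWildcardSketch.unescape is a simplified, hypothetical stand-in for discardEscapeChar (it only drops a backslash and keeps the following character) and is not the parser's actual implementation.

```java
// Hypothetical demonstration class; not part of the query parser.
class LeadingWildcardSketch {
  // Simplified stand-in for discardEscapeChar: drops each backslash and
  // keeps the character that follows it.
  static String unescape(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length()) {
        out.append(s.charAt(++i));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  // The check as it is performed today, applied to an already unescaped term.
  static boolean rejectedAfterUnescape(String rawTerm) {
    return unescape(rawTerm).startsWith("*");
  }

  // The same check applied to the raw token, where the escape is still visible.
  static boolean startsWithRealWildcard(String rawTerm) {
    return rawTerm.startsWith("*");
  }
}
```

For the raw token \*a (the query above without its trailing wildcard), rejectedAfterUnescape returns true although the token does not start with a real wildcard.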

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-31 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286389#comment-13286389
 ] 

Christoph Kaser commented on LUCENE-4077:
-

Thank you, now it works perfectly!

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch, 
 LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows turning on the tracking 
 of parent scores and the maximum parent score; however, there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight

2012-05-30 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285654#comment-13285654
 ] 

Christoph Kaser commented on LUCENE-4082:
-

Thank you, works perfectly!

 Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
 ---

 Key: LUCENE-4082
 URL: https://issues.apache.org/jira/browse/LUCENE-4082
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Priority: Minor
 Attachments: LUCENE-4082.patch


 At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an 
 UnsupportedOperationException. It would be useful if it could instead return 
 the score of the parent document, even if the explanation of how that score 
 was calculated is missing.


[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-30 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285702#comment-13285702
 ] 

Christoph Kaser commented on LUCENE-4077:
-

Hi Mike,

shouldn't TopGroups.maxScore contain the maximum parent score? If I am not 
mistaken, the way it is built now, it contains the maximum child score over all 
children.

This is due to this line in ToParentBlockJoinCollector.getTopGroups():
{code}
maxScore = Math.max(maxScore, topDocs.getMaxScore());
{code}

I think it should read:
{code}
totalMaxScore = Math.max(totalMaxScore, og.score);
{code}

Otherwise, topGroups.maxScore differs from 
ToParentBlockJoinCollector.getMaxScore().

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows turning on the tracking 
 of parent scores and the maximum parent score; however, there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children


[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-29 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284660#comment-13284660
 ] 

Christoph Kaser commented on LUCENE-4077:
-

Hello Mike,

thank you for the patch. There is one small problem: 
ToParentBlockJoinCollector.getMaxScore() always returns _NaN_. This happens 
because maxScore is initialized as 
{code}
private float maxScore = Float.NaN;
{code}
and then updated as
{code}
maxScore = Math.max(score, maxScore);
{code}
which is always _NaN_.

I hope I applied the patch to the correct revision and this is not caused by a 
version conflict.
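
The NaN behaviour is easy to reproduce in isolation. MaxScoreSketch is a hypothetical demonstration class; starting from negative infinity is only one possible way around it and not necessarily what the patch should do.

```java
// Hypothetical demonstration class; not part of the join module.
class MaxScoreSketch {
  // Mirrors the quoted update: Math.max returns NaN if either argument is NaN,
  // so a maximum that starts at NaN never leaves NaN.
  static float maxStartingFromNaN(float[] scores) {
    float max = Float.NaN;
    for (float s : scores) {
      max = Math.max(s, max);
    }
    return max;
  }

  // One possible alternative: start from negative infinity instead.
  static float maxStartingFromNegativeInfinity(float[] scores) {
    float max = Float.NEGATIVE_INFINITY;
    for (float s : scores) {
      max = Math.max(s, max);
    }
    return max;
  }
}
```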

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows turning on the tracking 
 of parent scores and the maximum parent score; however, there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children


[jira] [Created] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight

2012-05-29 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-4082:
---

 Summary: Implement explain in 
ToParentBlockJoinQuery$BlockJoinWeight
 Key: LUCENE-4082
 URL: https://issues.apache.org/jira/browse/LUCENE-4082
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 3.6, 3.5, 3.4
Reporter: Christoph Kaser


At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an 
UnsupportedOperationException. It would be useful if it could instead return 
the score of the parent document, even if the explanation of how that score 
was calculated is missing.


[jira] [Updated] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight

2012-05-29 Thread Christoph Kaser (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christoph Kaser updated LUCENE-4082:


Priority: Minor  (was: Major)

 Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
 ---

 Key: LUCENE-4082
 URL: https://issues.apache.org/jira/browse/LUCENE-4082
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Priority: Minor

 At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an 
 UnsupportedOperationException. It would be useful if it could instead return 
 the score of the parent document, even if the explanation of how that score 
 was calculated is missing.


[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-29 Thread Christoph Kaser (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284780#comment-13284780
 ] 

Christoph Kaser commented on LUCENE-4077:
-

This patch works perfectly for my application. Thank you!

 ToParentBlockJoinCollector provides no way to access computed scores and the 
 maxScore
 -

 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.4, 3.5, 3.6
Reporter: Christoph Kaser
Assignee: Michael McCandless
 Attachments: LUCENE-4077.patch, LUCENE-4077.patch


 The constructor of ToParentBlockJoinCollector allows turning on the tracking 
 of parent scores and the maximum parent score; however, there is no way to 
 access those scores because:
 * maxScore is a private field, and there is no getter
 * TopGroups / GroupDocs does not provide access to the scores for the parent 
 documents, only the children


[jira] [Created] (LUCENE-4076) When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children

2012-05-25 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-4076:
---

 Summary: When doing nested (index-time) joins, 
ToParentBlockJoinCollector delivers incomplete information on the grand-children
 Key: LUCENE-4076
 URL: https://issues.apache.org/jira/browse/LUCENE-4076
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.6, 3.5, 3.4
Reporter: Christoph Kaser


ToParentBlockJoinCollector.getTopGroups does not provide the correct answer 
when a query with nested ToParentBlockJoinQuery instances is performed.

Given the following example query:
{code}
// Inner join: match grandchildren and join them up to their child documents.
Query grandChildQuery = new TermQuery(new Term("color", "red"));
Filter childFilter = new CachingWrapperFilter(
    new RawTermFilter(new Term("type", "child")), DeletesMode.IGNORE);
ToParentBlockJoinQuery grandchildJoinQuery =
    new ToParentBlockJoinQuery(grandChildQuery, childFilter, ScoreMode.Max);

BooleanQuery childQuery = new BooleanQuery();
childQuery.add(grandchildJoinQuery, Occur.MUST);
childQuery.add(new TermQuery(new Term("shape", "round")), Occur.MUST);

// Outer join: join the matching children up to their parent documents.
Filter parentFilter = new CachingWrapperFilter(
    new RawTermFilter(new Term("type", "parent")), DeletesMode.IGNORE);
ToParentBlockJoinQuery childJoinQuery =
    new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);

BooleanQuery parentQuery = new BooleanQuery();
parentQuery.add(childJoinQuery, Occur.MUST);
parentQuery.add(new TermQuery(new Term("name", "test")), Occur.MUST);

// A single collector gathers the groups for both join levels.
ToParentBlockJoinCollector parentCollector =
    new ToParentBlockJoinCollector(Sort.RELEVANCE, 30, true, true);
searcher.search(parentQuery, null, parentCollector);
{code}

This produces the correct results:
{code}
TopGroups<Integer> childGroups =
    parentCollector.getTopGroups(childJoinQuery, null, 0, 20, 0, false);
{code}

However, this does not:
{code}
TopGroups<Integer> grandChildGroups =
    parentCollector.getTopGroups(grandchildJoinQuery, null, 0, 20, 0, false);
{code}

The content of grandChildGroups is broken in the following ways:
* The groupValue is not the document id of the child document (which is the 
parent of a grandchild document), but the document id of the _previous_ 
matching parent document
* There are only as many GroupDocs as there are parent documents (not child 
documents), and they only contain the children of the last child document (but, 
as mentioned before, with the wrong groupValue). 
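
To make the expected result concrete, here is a minimal plain-Java sketch of the grouping that the report expects for the grandchild level. It does not use the Lucene API. The class NestedBlockModel, its Type enum, and the method groupGrandchildrenByChild are hypothetical names introduced only for this illustration. It assumes the usual block-join index order (grandchildren precede their child, children precede their parent) and that every grandchild matches the query.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one nested block; not part of Lucene.
class NestedBlockModel {
    enum Type { GRANDCHILD, CHILD, PARENT }

    /**
     * Groups grandchild doc IDs under the doc ID of the CHILD document that
     * closes their sub-block. The map key plays the role of groupValue.
     */
    static Map<Integer, List<Integer>> groupGrandchildrenByChild(Type[] docs) {
        Map<Integer, List<Integer>> groups = new LinkedHashMap<>();
        List<Integer> pending = new ArrayList<>();
        for (int docId = 0; docId < docs.length; docId++) {
            switch (docs[docId]) {
                case GRANDCHILD:
                    pending.add(docId);
                    break;
                case CHILD:
                    // The child is the direct parent of the pending grandchildren.
                    if (!pending.isEmpty()) {
                        groups.put(docId, pending);
                        pending = new ArrayList<>();
                    }
                    break;
                case PARENT:
                    // A parent closes the whole block; nothing attaches to it here.
                    pending = new ArrayList<>();
                    break;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Doc IDs 0..5: two grandchildren, their child, one grandchild, its child, the parent.
        Type[] block = {
            Type.GRANDCHILD, Type.GRANDCHILD, Type.CHILD,
            Type.GRANDCHILD, Type.CHILD, Type.PARENT
        };
        System.out.println(groupGrandchildrenByChild(block)); // prints {2=[0, 1], 4=[3]}
    }
}
```

In this model there is one group per child document (doc IDs 2 and 4), keyed by that child's own doc ID. The behaviour described above would instead yield one group per parent document, keyed by the previous matching parent.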



[jira] [Created] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore

2012-05-25 Thread Christoph Kaser (JIRA)

Christoph Kaser created LUCENE-4077:
---

 Summary: ToParentBlockJoinCollector provides no way to access 
computed scores and the maxScore
 Key: LUCENE-4077
 URL: https://issues.apache.org/jira/browse/LUCENE-4077
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/join
Affects Versions: 3.6, 3.5, 3.4
Reporter: Christoph Kaser


The constructor of ToParentBlockJoinCollector allows you to turn on tracking of 
parent scores and the maximum parent score. However, there is no way to access 
those scores because:
* maxScore is a private field, and there is no getter
* TopGroups / GroupDocs do not provide access to the scores for the parent 
documents, only the children
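
The kind of accessor this report asks for can be sketched in plain Java. This is not the attached patch and not the Lucene API: the class ParentScoreTracker and its methods are hypothetical, and the use of Float.NaN for "no score tracked" is an assumption made for this illustration.

```java
// Hypothetical stand-in for a collector that tracks the maximum parent score.
class ParentScoreTracker {
    private final boolean trackMaxScore;
    // NaN means "nothing tracked yet" (an assumption of this sketch).
    private float maxScore = Float.NaN;

    ParentScoreTracker(boolean trackMaxScore) {
        this.trackMaxScore = trackMaxScore;
    }

    /** Records the score of one collected parent document. */
    void collect(float parentScore) {
        if (trackMaxScore && (Float.isNaN(maxScore) || parentScore > maxScore)) {
            maxScore = parentScore;
        }
    }

    /** The getter that is missing in the situation described above. */
    float getMaxScore() {
        return maxScore;
    }

    public static void main(String[] args) {
        ParentScoreTracker tracker = new ParentScoreTracker(true);
        tracker.collect(1.5f);
        tracker.collect(3.0f);
        tracker.collect(2.0f);
        System.out.println(tracker.getMaxScore()); // prints 3.0
    }
}
```

Without such a getter, the value is computed when tracking is enabled but remains unreachable to callers.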

