[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791424#comment-16791424 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

{quote}Right I get how it can help with small slices, but at the same time I'm seeing small slices as something that should be avoided in order to limit context switching so I don't think we should design for small slices?{quote}

Small slices are the default: the default implementation of IndexSearcher.slices() returns one slice per segment. Since the search runs in an Executor, this may not cause a lot of context switching, depending on the thread pool parameters. But you are right, the default implementation of slices() may not be optimal.

> Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8542
> URL: https://issues.apache.org/jira/browse/LUCENE-8542
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Christoph Kaser
> Priority: Minor
> Attachments: LUCENE-8542.patch
>
> I have an index consisting of 44 million documents spread across 60 segments.
> When I run a query against this index with a huge number of results requested
> (e.g. 5 million), this query uses more than 5 GB of heap if the IndexSearcher
> was configured to use an ExecutorService.
> (I know this kind of query is fairly unusual and it would be better to use
> paging and searchAfter, but our architecture does not allow this at the
> moment.)
> The reason for the huge memory requirement is that the search [will create a
> TopScoreDocCollector for each
> segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404],
> each one with numHits = 5 million. This is fine for the large segments, but
> many of those segments are fairly small and only contain several thousand
> documents. This wastes a huge amount of memory for queries with large values
> of numHits on indices with many segments.
> Therefore, I propose to change the CollectorManager interface in the
> following way:
> * Change the method newCollector to accept a parameter LeafSlice that can be
> used to determine the total count of documents in the LeafSlice.
> * Maybe, in order to remain backwards compatible, it would be possible to
> introduce this as a new method with a default implementation that calls the
> old method; otherwise, it probably has to wait for Lucene 8?
> * This can then be used to cap numHits for each TopScoreDocCollector to the
> size of the LeafSlice.
> If this is something that would make sense for you, I can try to provide a
> patch.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790611#comment-16790611 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

While it's true that the slice size is a bad upper bound, the change does help: as you can see in the [table in my comment|https://issues.apache.org/jira/browse/LUCENE-8542?focusedCommentId=16704391=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16704391], it reduces the heap requirement by 90% in my use case, due to the large number of small slices.

Making PriorityQueue growable would certainly be a better solution; however, it is much harder to do this without affecting performance in the "sane" use case.
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790560#comment-16790560 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

That's too bad, given that this is only a minor change to an experimental API (and does not cause extra work in the reasonable use case). But I understand your reasons.

I may try to build such a collector when I find the time (though I suspect this may involve quite a lot of code duplication if no changes to the core should be made). For now, we simply limit the number of concurrent queries with huge values of numHits so that they fit into the heap.
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790485#comment-16790485 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

Is there anything I can change or add to get this committed? Or do you think it makes no sense for the general use case of Lucene?
[jira] [Comment Edited] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704391#comment-16704391 ]

Christoph Kaser edited comment on LUCENE-8542 at 11/30/18 8:28 AM:
-------------------------------------------------------------------

I think it would be nice to have the option to grow the heap dynamically. However, given the way _TopScoreDocCollector_ and _TopDocsCollector_ are currently built, for a Lucene user that would mean copying the complete source code of those classes and adapting them to use a _java.util.PriorityQueue_ (probably with worse performance than _org.apache.lucene.util.PriorityQueue_). This is certainly possible, but would mean a lot of code duplication (from the perspective of a Lucene user, because the priority queue in use can't be changed easily).

I think that this patch makes sense anyway: the size of segments has a very wide range in a typical index, and usually there are a lot more small segments than large ones. Given that the default implementation of IndexSearcher.slices() returns one slice per segment, that means a lot of wasted memory for all queries that have a _numHits_ greater than the typical size of a small segment. I don't think it has any negative impact on queries with a small value of numHits, because it only adds one Math.min per segment.

It also helps with my problem: for an index with 28 segments and 13,360,068 documents and a search with numHits=5,000,000, it makes the difference between creating priority queues with a combined size of 140,000,000 vs 13,360,068.

As you can see in the following table, there are benefits for searches with a more reasonable numHits value as well (all against my index):

||numHits||Combined size w/o patch||Combined size with patch||
|10,000,000|280,000,000|13,360,068|
|5,000,000|140,000,000|13,360,068|
|1,000,000|28,000,000|6,870,854|
|100,000|2,800,000|1,632,997|
|50,000|1,400,000|1,015,274|
|10,000|280,000|252,528|
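The arithmetic behind the two "combined size" figures can be sketched as follows. Without the cap, every segment gets a queue of numHits entries, so 28 segments at numHits=5,000,000 give 140,000,000. With the cap, each queue is limited to the segment's document count. The class name, the method names and the segment sizes in `main` are invented for illustration; the issue does not list the real segment sizes.

```java
final class QueueSizeEstimate {

    /** Without the cap, every segment gets a queue of numHits entries. */
    static long withoutPatch(int[] segmentDocCounts, int numHits) {
        return (long) segmentDocCounts.length * numHits;
    }

    /** With the cap, a queue holds at most as many entries as its segment has documents. */
    static long withPatch(int[] segmentDocCounts, int numHits) {
        long total = 0;
        for (int docCount : segmentDocCounts) {
            total += Math.min(numHits, docCount);
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented segment sizes: one large segment and three small ones.
        int[] segments = {6_000_000, 40_000, 5_000, 1_000};
        int numHits = 5_000_000;
        System.out.println(withoutPatch(segments, numHits)); // 4 * 5,000,000 = 20000000
        System.out.println(withPatch(segments, numHits));    // 5,000,000 + 40,000 + 5,000 + 1,000 = 5046000
    }
}
```

Once numHits exceeds the size of every segment, the capped total equals the number of documents in the index, which matches the 13,360,068 in the first two table rows.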
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16704391#comment-16704391 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

I think it would be nice to have the option to grow the heap dynamically. However, given the way _TopScoreDocCollector_ and _TopDocsCollector_ are currently built, for a Lucene user that would mean copying the complete source code of those classes and adapting them to use a _java.util.PriorityQueue_ (probably with worse performance than _org.apache.lucene.util.PriorityQueue_). This is certainly possible, but would mean a lot of code duplication (from the perspective of a Lucene user, because the priority queue in use can't be changed easily).

I think that this patch makes sense anyway: the size of segments has a very wide range in a typical index, and usually there are a lot more small segments than large ones. Given that the default implementation of IndexSearcher.slices() returns one slice per segment, that means a lot of wasted memory for all queries that have a _numHits_ greater than the typical size of a small segment. I don't think it has any negative impact on queries with a small value of numHits, because it only adds one Math.min per segment.

It also helps with my problem: for an index with 28 segments and 13,360,068 documents and a search with numHits=5,000,000, it makes the difference between creating priority queues with a combined size of 140,000,000 vs 13,360,068.
[jira] [Commented] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16703283#comment-16703283 ]

Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

I attached a patch; hopefully this shows more clearly what I meant.

Since CollectorManager is marked as experimental, I think it might be possible to port this patch to Lucene 7 as well, without keeping the old method or providing a default implementation of the new one.
[jira] [Updated] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christoph Kaser updated LUCENE-8542:
------------------------------------
Attachment: LUCENE-8542.patch
[jira] [Created] (LUCENE-8542) Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
Christoph Kaser created LUCENE-8542:
---------------------------------------

Summary: Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
Key: LUCENE-8542
URL: https://issues.apache.org/jira/browse/LUCENE-8542
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Christoph Kaser

I have an index consisting of 44 million documents spread across 60 segments. When I run a query against this index with a huge number of results requested (e.g. 5 million), this query uses more than 5 GB of heap if the IndexSearcher was configured to use an ExecutorService.

(I know this kind of query is fairly unusual and it would be better to use paging and searchAfter, but our architecture does not allow this at the moment.)

The reason for the huge memory requirement is that the search [will create a TopScoreDocCollector for each segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404], each one with numHits = 5 million. This is fine for the large segments, but many of those segments are fairly small and only contain several thousand documents. This wastes a huge amount of memory for queries with large values of numHits on indices with many segments.

Therefore, I propose to change the CollectorManager interface in the following way:
* Change the method newCollector to accept a parameter LeafSlice that can be used to determine the total count of documents in the LeafSlice.
* Maybe, in order to remain backwards compatible, it would be possible to introduce this as a new method with a default implementation that calls the old method; otherwise, it probably has to wait for Lucene 8?
* This can then be used to cap numHits for each TopScoreDocCollector to the size of the LeafSlice.

If this is something that would make sense for you, I can try to provide a patch.
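The proposal in the issue description can be sketched with simplified stand-in types. This is not the real Lucene API: `Slice`, `TopHitsCollector` and `Manager` are hypothetical names that only model the two ideas from the description, namely a new method whose default implementation calls the old one, and a cap on numHits using the slice's document count.

```java
final class SliceCapSketch {

    /** Hypothetical stand-in for LeafSlice; only the total document count is modeled. */
    static final class Slice {
        final int docCount;
        Slice(int docCount) { this.docCount = docCount; }
    }

    /** Hypothetical stand-in for a top-hits collector that pre-allocates its queue. */
    static final class TopHitsCollector {
        final int queueSize;
        TopHitsCollector(int numHits) { this.queueSize = numHits; }
    }

    /** Simplified model of a collector manager (not the real Lucene interface). */
    interface Manager {
        /** The existing method, which knows nothing about the slice. */
        TopHitsCollector newCollector();

        /** The proposed addition; by default it ignores the slice and calls the old method. */
        default TopHitsCollector newCollector(Slice slice) {
            return newCollector();
        }
    }

    /** A manager that overrides the new method to cap the queue size per slice. */
    static Manager capped(int numHits) {
        return new Manager() {
            @Override public TopHitsCollector newCollector() {
                return new TopHitsCollector(numHits);
            }
            @Override public TopHitsCollector newCollector(Slice slice) {
                // A slice can never contribute more hits than it has documents.
                return new TopHitsCollector(Math.min(numHits, slice.docCount));
            }
        };
    }

    public static void main(String[] args) {
        Manager manager = capped(5_000_000);
        System.out.println(manager.newCollector(new Slice(3_000)).queueSize);      // prints 3000
        System.out.println(manager.newCollector(new Slice(20_000_000)).queueSize); // prints 5000000
    }
}
```

Existing implementations that only define the old method keep compiling and behave as before, because the default method delegates to it.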
[jira] [Created] (LUCENE-7861) Hidden assumption that return value of IndexSearcher.slices is an array of continuous sequential slices of the index
Christoph Kaser created LUCENE-7861:
---------------------------------------

Summary: Hidden assumption that return value of IndexSearcher.slices is an array of continuous sequential slices of the index
Key: LUCENE-7861
URL: https://issues.apache.org/jira/browse/LUCENE-7861
Project: Lucene - Core
Issue Type: Bug
Components: core/search
Affects Versions: 6.5.1, 6.0
Reporter: Christoph Kaser

The IndexSearcher method
{code:java}protected LeafSlice[] slices(List<LeafReaderContext> leaves){code}
can be overridden to customize how the index is searched with multiple threads. However, the IndexSearcher assumes the result is an ordered array of continuous slices of the index. If the result is "interleaved" or unordered, searchAfter may skip results.

The issue seems to be how searchAfter works vs. how TopDocs.merge works:

searchAfter skips every document with a higher score than the "after" document. In case of equal scores, it uses the document ID and skips every document with a document ID less than or equal to that of the "after" document (see PagingFieldCollector).

TopDocs.merge uses the score to determine which hits should be part of the merged TopDocs. In case of equal scores, it uses the shard index (this corresponds to the slices the IndexSearcher uses) to break ties (see ScoreMergeSortQueue.lessThan).

So if the shards are noncontinuous or unordered, searchAfter sorts the documents differently than TopDocs.merge does, and therefore hits are skipped.

On the mailing list, Michael McCandless suggested either improving TopDocs.merge to optionally use the docID for tie breaking (optionally, because the docID is apparently not global in every call of TopDocs.merge) or at least documenting the requirement on the return value of IndexSearcher.slices().

In my use case (generating a fixed number of slices of approximately equal size), the requirement of ordered slices will lead to less optimal slices, but I am not sure whether this has a real impact on performance.
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
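The mismatch described in LUCENE-7861 can be reproduced in a toy model. The sketch below is not Lucene code: it assumes that every document has the same score, represents each slice as an ascending array of global doc IDs, and uses invented names (`TieBreakSketch`, `page`, `pageThroughAll`). It combines the two rules from the report: the merge orders equal-score hits by slice index, while paging skips every doc ID less than or equal to the last hit of the previous page.

```java
import java.util.ArrayList;
import java.util.List;

final class TieBreakSketch {

    /** Returns one page of doc IDs, assuming every document has the same score. */
    static List<Integer> page(int[][] slices, int afterDoc, int pageSize) {
        List<Integer> merged = new ArrayList<>();
        // Merge rule: with equal scores, hits are ordered by slice index first.
        for (int[] slice : slices) {
            for (int doc : slice) {
                // Paging rule: with equal scores, skip every doc ID <= afterDoc.
                if (doc > afterDoc) {
                    merged.add(doc);
                }
            }
        }
        return new ArrayList<>(merged.subList(0, Math.min(pageSize, merged.size())));
    }

    /** Pages through all hits and returns the doc IDs in the order they were delivered. */
    static List<Integer> pageThroughAll(int[][] slices, int pageSize) {
        List<Integer> seen = new ArrayList<>();
        int after = -1; // no "after" document for the first page
        while (true) {
            List<Integer> hits = page(slices, after, pageSize);
            if (hits.isEmpty()) {
                return seen;
            }
            seen.addAll(hits);
            after = hits.get(hits.size() - 1); // last hit of the page becomes "after"
        }
    }

    public static void main(String[] args) {
        int[][] contiguous = {{0, 1}, {2, 3}};
        int[][] interleaved = {{0, 2}, {1, 3}};
        System.out.println(pageThroughAll(contiguous, 2));  // prints [0, 1, 2, 3]
        System.out.println(pageThroughAll(interleaved, 2)); // prints [0, 2, 3]
    }
}
```

With interleaved slices the first page ends at doc 2, so the second page skips doc 1 even though it was never returned. With contiguous, ordered slices the two orderings agree and no hit is lost.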
[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter
[ https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16006475#comment-16006475 ]

Christoph Kaser commented on LUCENE-7817:
-----------------------------------------

Perfect, thank you! :)

> LRUQueryCache.onQueryCache is always called with null as first parameter
> ------------------------------------------------------------------------
>
> Key: LUCENE-7817
> URL: https://issues.apache.org/jira/browse/LUCENE-7817
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: master (7.0), 6.4.1, 6.5.1
> Reporter: Christoph Kaser
> Fix For: master (7.0), 6.6
>
> According to the javadocs, LRUQueryCache.onQueryCache can be used to track
> usage statistics on cached queries. Unfortunately, due to a bug, the query
> parameter is always passed as null, making the method practically useless.
> This PR fixes the problem:
> https://github.com/apache/lucene-solr/pull/199
[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter
[ https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16004418#comment-16004418 ]

Christoph Kaser commented on LUCENE-7817:
-----------------------------------------

Is there anything else missing that I can add? If possible (and sensible), I would really like to get this into the next Lucene version, because it causes problems in our code that I currently work around by manually patching LRUQueryCache.
[jira] [Commented] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter
[ https://issues.apache.org/jira/browse/LUCENE-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998361#comment-15998361 ] Christoph Kaser commented on LUCENE-7817: - Thanks for the review! I added a test for nullness to TestLRUQueryCache.testFineGrainedStats and pushed it into the PR. > LRUQueryCache.onQueryCache is always called with null as first parameter > > > Key: LUCENE-7817 > URL: https://issues.apache.org/jira/browse/LUCENE-7817 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: master (7.0), 6.4.1, 6.5.1 >Reporter: Christoph Kaser > > According to the javadocs, LRUQueryCache.onQueryCache can be used to track > usage statistics on cached queries. Unfortunately, due to a bug, the query > parameter is always passed as null, making the method practically useless. > This PR fixes the problem: > https://github.com/apache/lucene-solr/pull/199 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7817) LRUQueryCache.onQueryCache is always called with null as first parameter
Christoph Kaser created LUCENE-7817: --- Summary: LRUQueryCache.onQueryCache is always called with null as first parameter Key: LUCENE-7817 URL: https://issues.apache.org/jira/browse/LUCENE-7817 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 6.5.1, 6.4.1, master (7.0) Reporter: Christoph Kaser According to the javadocs, LRUQueryCache.onQueryCache can be used to track usage statistics on cached queries. Unfortunately, due to a bug, the query parameter is always passed as null, making the method practically useless. This PR fixes the problem: https://github.com/apache/lucene-solr/pull/199 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6326) MultiCollector does not handle CollectionTerminatedException correctly
[ https://issues.apache.org/jira/browse/LUCENE-6326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser resolved LUCENE-6326. - Resolution: Duplicate Lucene Fields: (was: New) > MultiCollector does not handle CollectionTerminatedException correctly > -- > > Key: LUCENE-6326 > URL: https://issues.apache.org/jira/browse/LUCENE-6326 > Project: Lucene - Core > Issue Type: Bug > Components: core/search >Affects Versions: 5.0 >Reporter: Christoph Kaser >Priority: Minor > > The javadoc of the *collect*-method of LeafCollector states: > bq. Note: The collection of the current segment can be terminated by throwing > a CollectionTerminatedException. > However, the Multicollector does not catch this exception, so if one of the > wrapped collectors terminates the current segment, it is terminated for every > collector. > The same is true for the *getLeafCollector*-method (even though this is not > documented in the JavaDoc). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong stemming
[ https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14602716#comment-14602716 ] Christoph Kaser commented on LUCENE-6586: - Hi Michael, I tried to write a small test case and realized that there is no input that leads to a wrong token. substCount is only used to decide how large the original input was, because some suffixes are only stripped if the token has a minimum length.
{code}
if ( ( buffer.length() + substCount > 5 ) && buffer.substring( buffer.length() - 2, buffer.length() ).equals( "nd" ) ) {
  buffer.delete( buffer.length() - 2, buffer.length() );
}
{code}
However, every substitution leaves at least one character. For the bug to take effect, there has to be a substitution before the one that sets substCount to 2 (instead of incrementing it by 2). So we have:
- 2 characters that were left by the (at least 2) substitutions
- the suffix "nd"
- substCount, which was set to 2
That sums to 6, which is greater than 5. The other conditions that check on substCount work the same, except they check for greater than 4. Therefore, there is no token that triggers any wrong behaviour. Still, I think the typo should be fixed, because it might be copied to a place where it has an effect.
There is a typo in GermanStemmer that can lead to wrong stemming Key: LUCENE-6586 URL: https://issues.apache.org/jira/browse/LUCENE-6586 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 5.2.1 Reporter: Christoph Kaser Priority: Minor There is a small typo in GermanStemmer that leads to a wrong calculation of the substCount in line 203: {code}substCount =+ 2;{code} should be {code}substCount += 2;{code} I created a Pull Request for this some time ago, but it was apparently overlooked: https://github.com/apache/lucene-solr/pull/141 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
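To illustrate why the typo compiles yet changes the result, here is a minimal sketch. The class and method names (SubstCountTypoDemo, withTypo, fixed) are hypothetical and not part of GermanStemmer; only the two statements under comparison come from the issue.

```java
final class SubstCountTypoDemo {

    // Mimics the typo: "=+ 2" parses as "= (+2)", so the previous count is overwritten.
    static int withTypo(int substCount) {
        substCount =+ 2;
        return substCount;
    }

    // The intended statement: add 2 to the running count.
    static int fixed(int substCount) {
        substCount += 2;
        return substCount;
    }

    public static void main(String[] args) {
        // Assume one substitution was already counted before this statement runs.
        System.out.println("with typo: " + withTypo(1));
        System.out.println("fixed:     " + fixed(1));
    }
}
```

The two variants only differ when substCount is non-zero beforehand, which matches the observation in the comment that an earlier substitution is required for the bug to matter.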
[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597661#comment-14597661 ] Christoph Kaser commented on LUCENE-6588: - Thank you! :) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Fix For: 5.3 Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14596057#comment-14596057 ] Christoph Kaser commented on LUCENE-6588: - Okay, if you prefer I can change the test to use a FilteredQuery instead of deleting child documents ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595751#comment-14595751 ] Christoph Kaser commented on LUCENE-6588: - When I encountered this bug, there was no deleted document in the index - I think acceptDocs was set due to a filter. So the bug is relevant whether or not deleting single children is a supported use case. However, the easiest way to reproduce the bug was by deleting child documents, so that's what I used. ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6588: Attachment: 0003-implements-ToChildBlockJoinQuery.explain.patch This patch implements ToChildBlockJoinQuery.explain(), which helped finding and debugging this issue ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6588: Lucene Fields: New,Patch Available (was: New) Flags: Patch ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6588: External issue URL: https://github.com/apache/lucene-solr/pull/155 ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch, 0003-implements-ToChildBlockJoinQuery.explain.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6588: Attachment: 0001-Test-score-calculation.patch Test demonstrating the bug ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6588: Attachment: 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch Bugfix ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
[ https://issues.apache.org/jira/browse/LUCENE-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593356#comment-14593356 ] Christoph Kaser edited comment on LUCENE-6588 at 6/19/15 11:54 AM: --- Patch for the issue was (Author: christophk): Bugfix ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs - Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser Attachments: 0001-Test-score-calculation.patch, 0002-ToChildBlockJoinQuery-score-calculation-bugfix.patch There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6588) ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs
Christoph Kaser created LUCENE-6588: --- Summary: ToChildBlockJoinQuery does not calculate parent score if the first child is not in acceptDocs Key: LUCENE-6588 URL: https://issues.apache.org/jira/browse/LUCENE-6588 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.2.1 Reporter: Christoph Kaser There is a bug in ToChildBlockJoinQuery that causes the score calculation to be skipped if the first child of a new parent doc is not in acceptDocs. I will attach test showing the failure and a patch to fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong stemming
[ https://issues.apache.org/jira/browse/LUCENE-6586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6586: Summary: There is a typo in GermanStemmer that can lead to wrong stemming (was: There is a typo in GermanStemmer that can lead to wrong trimming) There is a typo in GermanStemmer that can lead to wrong stemming Key: LUCENE-6586 URL: https://issues.apache.org/jira/browse/LUCENE-6586 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 5.2.1 Reporter: Christoph Kaser Priority: Minor There is a small typo in GermanStemmer that leads to a wrong calculation of the substCount in line 203: {code}substCount =+ 2;{code} should be {code}substCount += 2;{code} I created a Pull Request for this some time ago, but it was apparently overlooked: https://github.com/apache/lucene-solr/pull/141 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6586) There is a typo in GermanStemmer that can lead to wrong trimming
Christoph Kaser created LUCENE-6586: --- Summary: There is a typo in GermanStemmer that can lead to wrong trimming Key: LUCENE-6586 URL: https://issues.apache.org/jira/browse/LUCENE-6586 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 5.2.1 Reporter: Christoph Kaser Priority: Minor There is a small typo in GermanStemmer that leads to a wrong calculation of the substCount in line 203: {code}substCount =+ 2;{code} should be {code}substCount += 2;{code} I created a Pull Request for this some time ago, but it was apparently overlooked: https://github.com/apache/lucene-solr/pull/141 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings
[ https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6358: Attachment: LUCENE-6358-test.patch Unit test UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings -- Key: LUCENE-6358 URL: https://issues.apache.org/jira/browse/LUCENE-6358 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor Attachments: LUCENE-6358-test.patch The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings
[ https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6358: Attachment: LUCENE-6358-fix.patch fix UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings -- Key: LUCENE-6358 URL: https://issues.apache.org/jira/browse/LUCENE-6358 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings
[ https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6358: Description: The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) was: The static toLowerCase-method of UnescapedCharSequence does nto account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings - Key: LUCENE-6358 URL: https://issues.apache.org/jira/browse/LUCENE-6358 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings
Christoph Kaser created LUCENE-6358: --- Summary: UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings Key: LUCENE-6358 URL: https://issues.apache.org/jira/browse/LUCENE-6358 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
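The length mismatch described in this report can be reproduced with the JDK alone. The sketch below is not the attached test; the class and method names are hypothetical, and the choice of U+0130 (LATIN CAPITAL LETTER I WITH DOT ABOVE) with Locale.ROOT is an assumption about one input that triggers the effect.

```java
import java.util.Locale;

final class LowerCaseLengthDemo {

    // Returns the length of the lower-cased string, using the root locale.
    static int loweredLengthRoot(String s) {
        return s.toLowerCase(Locale.ROOT).length();
    }

    public static void main(String[] args) {
        String input = "\u0130"; // a single char before lower-casing
        // Outside Turkish/Azeri locales the JDK maps it to 'i' plus a combining dot above.
        System.out.println(input.length() + " char(s) in, "
            + loweredLengthRoot(input) + " char(s) out");
    }
}
```

Any array sized from the input length (such as a per-character "was escaped" flag array) is then too short when indexed by positions in the lower-cased result.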
[jira] [Updated] (LUCENE-6358) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings
[ https://issues.apache.org/jira/browse/LUCENE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-6358: Summary: UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings (was: UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException when for certain input strings) UnescapedCharSequence.toLowerCase throws ArrayIndexOutOfBoundsException for certain input strings - Key: LUCENE-6358 URL: https://issues.apache.org/jira/browse/LUCENE-6358 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor Attachments: LUCENE-6358-fix.patch, LUCENE-6358-test.patch The static toLowerCase-method of UnescapedCharSequence does not account for locales in which the length of the result of String.toLowerCase is not the same as the length of the input string. This causes an ArrayIndexOutOfBoundsException, because wasEscaped and the chars array are not of the same length. (See attached test and patch) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6337) ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly
[ https://issues.apache.org/jira/browse/LUCENE-6337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348534#comment-14348534 ] Christoph Kaser commented on LUCENE-6337: - We use ToParentBlockJoinCollector in production, so for us it would be a shame if it was removed without any replacement ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly -- Key: LUCENE-6337 URL: https://issues.apache.org/jira/browse/LUCENE-6337 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.0 Reporter: Christoph Kaser ToParentBlockJoinIndexSearcher overrides the search-method of IndexSearcher. However, unlike IndexSearcher, it does not catch the CollectionTerminatedException, which would allow a Collector to prematurely terminate the collection of a segment. This is an issue if this searcher is used for a search with a MultiCollector or a collector other than ToParentBlockJoinCollector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6337) ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly
Christoph Kaser created LUCENE-6337: --- Summary: ToParentBlockJoinIndexSearcher does not handle CollectionTerminatedException correctly Key: LUCENE-6337 URL: https://issues.apache.org/jira/browse/LUCENE-6337 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 5.0 Reporter: Christoph Kaser ToParentBlockJoinIndexSearcher overrides the search-method of IndexSearcher. However, unlike IndexSearcher, it does not catch the CollectionTerminatedException, which would allow a Collector to prematurely terminate the collection of a segment. This is an issue if this searcher is used for a search with a MultiCollector or a collector other than ToParentBlockJoinCollector. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6326) MultiCollector does not handle CollectionTerminatedException correctly
Christoph Kaser created LUCENE-6326: --- Summary: MultiCollector does not handle CollectionTerminatedException correctly Key: LUCENE-6326 URL: https://issues.apache.org/jira/browse/LUCENE-6326 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.0 Reporter: Christoph Kaser Priority: Minor The javadoc of the *collect*-method of LeafCollector states: bq. Note: The collection of the current segment can be terminated by throwing a CollectionTerminatedException. However, the Multicollector does not catch this exception, so if one of the wrapped collectors terminates the current segment, it is terminated for every collector. The same is true for the *getLeafCollector*-method (even though this is not documented in the JavaDoc). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
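The behaviour the report asks for can be sketched without Lucene on the classpath. Everything below is a hypothetical stand-in: TerminatedException plays the role of CollectionTerminatedException, DocCollector the role of a LeafCollector, and collectAll the role of a multi-collector's collect loop. It is a sketch of the idea, not the fix that was eventually applied in Lucene.

```java
import java.util.ArrayList;
import java.util.List;

final class MultiCollectSketch {

    /** Stand-in for Lucene's CollectionTerminatedException (hypothetical local class). */
    static final class TerminatedException extends RuntimeException {}

    /** Stand-in for a per-segment collector (hypothetical, simplified). */
    interface DocCollector {
        void collect(int doc);
    }

    /** Accepts at most {@code limit} docs, then signals that it is done with the segment. */
    static final class LimitedCollector implements DocCollector {
        final List<Integer> docs = new ArrayList<>();
        final int limit;

        LimitedCollector(int limit) {
            this.limit = limit;
        }

        @Override
        public void collect(int doc) {
            if (docs.size() >= limit) {
                throw new TerminatedException();
            }
            docs.add(doc);
        }
    }

    /** Feeds docs 0..maxDoc-1 to every collector; a collector that terminates is dropped, the rest go on. */
    static void collectAll(int maxDoc, List<DocCollector> collectors) {
        List<DocCollector> active = new ArrayList<>(collectors);
        for (int doc = 0; doc < maxDoc && !active.isEmpty(); doc++) {
            final int current = doc;
            active.removeIf(c -> {
                try {
                    c.collect(current);
                    return false; // still interested in this segment
                } catch (TerminatedException e) {
                    return true;  // only this collector stops
                }
            });
        }
    }

    /** Runs two collectors with different limits and returns how many docs each one saw. */
    static int[] demo() {
        LimitedCollector small = new LimitedCollector(2);
        LimitedCollector large = new LimitedCollector(5);
        List<DocCollector> all = new ArrayList<>();
        all.add(small);
        all.add(large);
        collectAll(10, all);
        return new int[] { small.docs.size(), large.docs.size() };
    }

    public static void main(String[] args) {
        int[] counts = demo();
        System.out.println("small=" + counts[0] + " large=" + counts[1]);
    }
}
```

Without the try/catch around each wrapped collector, the first exception would propagate out of the loop and end collection for all collectors, which is the problem described above.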
[jira] [Updated] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect
[ https://issues.apache.org/jira/browse/LUCENE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-5805: Description: The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since setChildren calls removeFromParent on any previous child, setChildren has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children) {code} public void removeFromParent() { if (this.parent != null) { List<QueryNode> parentChildren = this.parent.getChildren(); Iterator<QueryNode> it = parentChildren.iterator(); while (it.hasNext()) { if (it.next() == this) { it.remove(); } } this.parent = null; } } {code} was: The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). 
Even worse, since setChildren calls removeFromParent on any previous child, setChildren has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children) {code] public void removeFromParent() { if (this.parent != null) { List<QueryNode> parentChildren = this.parent.getChildren(); Iterator<QueryNode> it = parentChildren.iterator(); while (it.hasNext()) { if (it.next() == this) { it.remove(); } } this.parent = null; } } {code} QueryNodeImpl.removeFromParent does a lot of work without any effect Key: LUCENE-5805 URL: https://issues.apache.org/jira/browse/LUCENE-5805 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 4.7.2, 4.9 Reporter: Christoph Kaser The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since setChildren calls removeFromParent on any previous child, setChildren has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children) {code} public void removeFromParent() { if (this.parent != null) { List<QueryNode> parentChildren = this.parent.getChildren(); Iterator<QueryNode> it = parentChildren.iterator(); while (it.hasNext()) { if (it.next() == this) { it.remove(); } } this.parent = null; } } {code} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect
Christoph Kaser created LUCENE-5805:
Summary: QueryNodeImpl.removeFromParent does a lot of work without any effect Key: LUCENE-5805 URL: https://issues.apache.org/jira/browse/LUCENE-5805 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 4.9, 4.7.2 Reporter: Christoph Kaser
The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since setChildren calls removeFromParent on any previous child, setChildren has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children)
{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();
    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }
    this.parent = null;
  }
}
{code}
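The effect described in this report can be reproduced outside Lucene. The following is a minimal sketch, not the real QueryNodeImpl: the `Node` class, its methods, and the "direct" variant are hypothetical names introduced only for illustration, and the direct variant is one possible remedy rather than the change that was committed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class RemoveFromParentDemo {
  // Hypothetical stand-in for a query node; not the real QueryNodeImpl.
  static class Node {
    private final List<Node> children = new ArrayList<>();
    Node parent;

    void add(Node child) {
      child.parent = this;
      children.add(child);
    }

    // Returns a defensive copy, as the report says getChildren does.
    List<Node> getChildren() {
      return new ArrayList<>(children);
    }

    // Mirrors the loop quoted in the report: it only removes from the copy.
    void removeFromParentViaCopy() {
      if (parent != null) {
        Iterator<Node> it = parent.getChildren().iterator();
        while (it.hasNext()) {
          if (it.next() == this) {
            it.remove();
          }
        }
        parent = null;
      }
    }

    // Assumed alternative: mutate the parent's own list instead of a copy.
    void removeFromParentDirect() {
      if (parent != null) {
        parent.children.removeIf(n -> n == this);
        parent = null;
      }
    }
  }

  public static void main(String[] args) {
    Node root = new Node();
    Node a = new Node();
    Node b = new Node();
    root.add(a);
    root.add(b);
    a.removeFromParentViaCopy();
    System.out.println(root.getChildren().size()); // prints 2: the parent is unchanged
    b.removeFromParentDirect();
    System.out.println(root.getChildren().size()); // prints 1
  }
}
```

Each call to the copy-based variant allocates a list as large as the parent's child list, which is where the quadratic cost mentioned in the report comes from when it is invoked once per child.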
[jira] [Updated] (LUCENE-5805) QueryNodeImpl.removeFromParent does a lot of work without any effect
[ https://issues.apache.org/jira/browse/LUCENE-5805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christoph Kaser updated LUCENE-5805:
Description: The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since _setChildren_ calls _removeFromParent_ on any previous child, _setChildren_ now has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children)
{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();
    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }
    this.parent = null;
  }
}
{code}
was: The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since setChildren calls removeFromParent on any previous child, setChildren has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children)
{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();
    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }
    this.parent = null;
  }
}
{code}
QueryNodeImpl.removeFromParent does a lot of work without any effect
Key: LUCENE-5805 URL: https://issues.apache.org/jira/browse/LUCENE-5805 Project: Lucene - Core Issue Type: Bug Components: modules/queryparser Affects Versions: 4.7.2, 4.9 Reporter: Christoph Kaser
The method _removeFromParent_ of _QueryNodeImpl_, calls _getChildren_ on the parent and removes any occurrence of this from the result. However, since a few releases, _getChildren_ returns a *copy* of the children list, so the code has no effect (except creating a copy of the children list which will then be thrown away). Even worse, since _setChildren_ calls _removeFromParent_ on any previous child, _setChildren_ now has a complexity of O(n^2) and creates a lot of throw-away copies of the children list (for nodes with a lot of children)
{code}
public void removeFromParent() {
  if (this.parent != null) {
    List<QueryNode> parentChildren = this.parent.getChildren();
    Iterator<QueryNode> it = parentChildren.iterator();
    while (it.hasNext()) {
      if (it.next() == this) {
        it.remove();
      }
    }
    this.parent = null;
  }
}
{code}
[jira] [Commented] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files
[ https://issues.apache.org/jira/browse/LUCENE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968239#comment-13968239 ] Christoph Kaser commented on LUCENE-5599: I don't think so. As far as I know, Lucene replication and Solr replication don't share any code at the moment, so this should only affect Lucene. HttpReplicator uses a lot of CPU for large files Key: LUCENE-5599 URL: https://issues.apache.org/jira/browse/LUCENE-5599 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Priority: Minor Attachments: HttpClientBase.java.patch The method responseInputStream of HttpClientBase wraps an InputStream in order to close it when it is done reading. However, the wrapper only overrides the single-byte read() method; every other method is inherited from its parent (java.io.InputStream). Therefore, the more efficient read methods like read(byte[] b) are all implemented by reading one byte after the other. In my test, it took 20 minutes to copy an index of 38 GB. With the provided small patch, this was reduced to less than 10 minutes.
[jira] [Created] (LUCENE-5597) HttpReplication currently does not support a tree topology
Christoph Kaser created LUCENE-5597: Summary: HttpReplication currently does not support a tree topology Key: LUCENE-5597 URL: https://issues.apache.org/jira/browse/LUCENE-5597 Project: Lucene - Core Issue Type: Improvement Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Priority: Minor
At the moment, it is not possible to have a tree topology for replication. The reason is that in order to publish an IndexRevision on a non-root, non-leaf node, one would need to open an IndexWriter on the index. However, the replication modifies the index directory directly, without using an IndexWriter, so the IndexWriter would not see the changes the replication made. IndexRevision uses the IndexWriter for deleting unused files when the revision is released, as well as to obtain the SnapshotDeletionPolicy. In order to implement this, two things are needed:
* A Revision that doesn't use an IndexWriter.
* A Replicator that keeps track of how many refs a file has (basically what IndexFileDeleter does).
[jira] [Created] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files
Christoph Kaser created LUCENE-5599: Summary: HttpReplicator uses a lot of CPU for large files Key: LUCENE-5599 URL: https://issues.apache.org/jira/browse/LUCENE-5599 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Priority: Minor Attachments: HttpClientBase.java.patch The method responseInputStream of HttpClientBase wraps an InputStream in order to close it when it is done reading. However, the wrapper only overrides the single-byte read() method; every other method is inherited from its parent (java.io.InputStream). Therefore, the more efficient read methods like read(byte[] b) are all implemented by reading one byte after the other. In my test, it took 20 minutes to copy an index of 38 GB. With the provided small patch, this was reduced to less than 10 minutes.
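The cost described in this report follows from how java.io.InputStream implements its bulk reads: the default read(byte[], int, int) calls read() once per byte. The sketch below illustrates this with hypothetical classes (`SlowWrapper`, `FastWrapper`, `CountingStream`); it is not the code of HttpClientBase or of the attached patch, and it leaves out the close-on-completion logic of the real wrapper.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class BulkReadDemo {
  // Source stream that counts how often its single-byte read() is invoked.
  static class CountingStream extends ByteArrayInputStream {
    int singleByteReads = 0;
    CountingStream(byte[] data) { super(data); }
    @Override public synchronized int read() {
      singleByteReads++;
      return super.read();
    }
  }

  // Overrides only read(); bulk reads fall back to InputStream's byte-by-byte loop.
  static class SlowWrapper extends InputStream {
    private final InputStream in;
    SlowWrapper(InputStream in) { this.in = in; }
    @Override public int read() throws IOException { return in.read(); }
    @Override public void close() throws IOException { in.close(); }
  }

  // Also forwards the bulk read, so the wrapped stream can copy whole chunks.
  static class FastWrapper extends InputStream {
    private final InputStream in;
    FastWrapper(InputStream in) { this.in = in; }
    @Override public int read() throws IOException { return in.read(); }
    @Override public int read(byte[] b, int off, int len) throws IOException {
      return in.read(b, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
  }

  // Drains 4096 bytes through a 1024-byte buffer and reports the single-byte read count.
  static int singleByteReadsFor(boolean fast) {
    CountingStream source = new CountingStream(new byte[4096]);
    InputStream wrapped = fast ? new FastWrapper(source) : new SlowWrapper(source);
    byte[] buffer = new byte[1024];
    try (InputStream in = wrapped) {
      while (in.read(buffer) != -1) {
        // discard the data; only the call count matters here
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return source.singleByteReads;
  }

  public static void main(String[] args) {
    System.out.println("slow wrapper: " + singleByteReadsFor(false) + " single-byte reads");
    System.out.println("fast wrapper: " + singleByteReadsFor(true) + " single-byte reads");
  }
}
```

With the slow wrapper, every byte of the payload passes through a separate read() call; with the fast wrapper, the count stays at zero because the bulk call is forwarded as a whole.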
[jira] [Updated] (LUCENE-5599) HttpReplicator uses a lot of CPU for large files
[ https://issues.apache.org/jira/browse/LUCENE-5599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-5599: Attachment: HttpClientBase.java.patch HttpReplicator uses a lot of CPU for large files Key: LUCENE-5599 URL: https://issues.apache.org/jira/browse/LUCENE-5599 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Priority: Minor Attachments: HttpClientBase.java.patch The method responseInputStream of HttpClientBase wraps an InputStream in order to close it when it is done reading. However, the wrapper only overrides the single-byte read() method; every other method is inherited from its parent (java.io.InputStream). Therefore, the more efficient read methods like read(byte[] b) are all implemented by reading one byte after the other. In my test, it took 20 minutes to copy an index of 38 GB. With the provided small patch, this was reduced to less than 10 minutes.
[jira] [Created] (LUCENE-5600) HttpReplicator does not properly handle server failures
Christoph Kaser created LUCENE-5600: --- Summary: HttpReplicator does not properly handle server failures Key: LUCENE-5600 URL: https://issues.apache.org/jira/browse/LUCENE-5600 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser When ReplicationClient.updateNow() using an HttpReplicator encounters a server error (like Status Code 500), it throws a runtime exception instead of an IOException. Furthermore, it does not close the HttpClient it used, which leads to an Error if a BasicClientConnectionManager is used -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
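The behaviour this report asks for (a checked IOException on a server error, with the connection released in every case) can be sketched with try-with-resources. Everything in the example is hypothetical: `FakeConnection` and `update` are illustrative names, not the Apache HttpClient API and not the attached patch.

```java
import java.io.Closeable;
import java.io.IOException;

class ServerFailureDemo {
  // Hypothetical connection handle; it stands in for whatever the client must release.
  static class FakeConnection implements Closeable {
    final int statusCode;
    boolean closed = false;
    FakeConnection(int statusCode) { this.statusCode = statusCode; }
    @Override public void close() { closed = true; }
  }

  // Reports a non-200 status as an IOException and always releases the connection.
  static void update(FakeConnection connection) throws IOException {
    try (FakeConnection c = connection) {
      if (c.statusCode != 200) {
        throw new IOException("server returned status " + c.statusCode);
      }
      // a real client would read the response body here
    }
  }

  public static void main(String[] args) {
    FakeConnection failing = new FakeConnection(500);
    try {
      update(failing);
    } catch (IOException e) {
      // prints "caught: server returned status 500, closed=true"
      System.out.println("caught: " + e.getMessage() + ", closed=" + failing.closed);
    }
  }
}
```

Callers that already handle IOException for network problems then cover server-side failures as well, and the connection manager never sees a connection that is still in use.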
[jira] [Updated] (LUCENE-5600) HttpReplicator does not properly handle server failures
[ https://issues.apache.org/jira/browse/LUCENE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-5600: Attachment: HttpReplicatorTest.patch Test HttpReplicator does not properly handle server failures --- Key: LUCENE-5600 URL: https://issues.apache.org/jira/browse/LUCENE-5600 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Attachments: HttpReplicatorTest.patch When ReplicationClient.updateNow() using an HttpReplicator encounters a server error (like Status Code 500), it throws a runtime exception instead of an IOException. Furthermore, it does not close the HttpClient it used, which leads to an Error if a BasicClientConnectionManager is used -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5600) HttpReplicator does not properly handle server failures
[ https://issues.apache.org/jira/browse/LUCENE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-5600: Attachment: HttpClientBase-LUCENE-5600.java.patch Fix HttpReplicator does not properly handle server failures --- Key: LUCENE-5600 URL: https://issues.apache.org/jira/browse/LUCENE-5600 Project: Lucene - Core Issue Type: Bug Components: modules/replicator Affects Versions: 4.7.1 Reporter: Christoph Kaser Attachments: HttpClientBase-LUCENE-5600.java.patch, HttpReplicatorTest.patch When ReplicationClient.updateNow() using an HttpReplicator encounters a server error (like Status Code 500), it throws a runtime exception instead of an IOException. Furthermore, it does not close the HttpClient it used, which leads to an Error if a BasicClientConnectionManager is used -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4076) When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children
[ https://issues.apache.org/jira/browse/LUCENE-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Christoph Kaser updated LUCENE-4076: Affects Version/s: 4.7.1
When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children
Key: LUCENE-4076 URL: https://issues.apache.org/jira/browse/LUCENE-4076 Project: Lucene - Core Issue Type: Bug Components: modules/join Affects Versions: 3.4, 3.5, 3.6, 4.7.1 Reporter: Christoph Kaser
ToParentBlockJoinCollector.getTopGroups does not provide the correct answer when a query with nested ToParentBlockJoinCollectors is performed. Given the following example query:
{code}
Query grandChildQuery = new TermQuery(new Term("color", "red"));
Filter childFilter = new CachingWrapperFilter(new RawTermFilter(new Term("type", "child")), DeletesMode.IGNORE);
ToParentBlockJoinQuery grandchildJoinQuery = new ToParentBlockJoinQuery(grandChildQuery, childFilter, ScoreMode.Max);
BooleanQuery childQuery = new BooleanQuery();
childQuery.add(grandchildJoinQuery, Occur.MUST);
childQuery.add(new TermQuery(new Term("shape", "round")), Occur.MUST);
Filter parentFilter = new CachingWrapperFilter(new RawTermFilter(new Term("type", "parent")), DeletesMode.IGNORE);
ToParentBlockJoinQuery childJoinQuery = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);
parentQuery = new BooleanQuery();
parentQuery.add(childJoinQuery, Occur.MUST);
parentQuery.add(new TermQuery(new Term("name", "test")), Occur.MUST);
ToParentBlockJoinCollector parentCollector = new ToParentBlockJoinCollector(Sort.RELEVANCE, 30, true, true);
searcher.search(parentQuery, null, parentCollector);
{code}
This produces the correct results:
{code}
TopGroups<Integer> childGroups = parentCollector.getTopGroups(childJoinQuery, null, 0, 20, 0, false);
{code}
However, this does not:
{code}
TopGroups<Integer> grandChildGroups = parentCollector.getTopGroups(grandchildJoinQuery, null, 0, 20, 0, false);
{code}
The content of grandChildGroups is broken in the following ways:
* The groupValue is not the document id of the child document (which is the parent of a grandchild document), but the document id of the _previous_ matching parent document.
* There are only as many GroupDocs as there are parent documents (not child documents), and they only contain the children of the last child document (but, as mentioned before, with the wrong groupValue).
[jira] [Created] (LUCENE-4773) QueryParserBase should not throw ParseException in getPrefixQuery when termStr starts with *
Christoph Kaser created LUCENE-4773:
Summary: QueryParserBase should not throw ParseException in getPrefixQuery when termStr starts with * Key: LUCENE-4773 URL: https://issues.apache.org/jira/browse/LUCENE-4773 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 4.1, 4.0 Reporter: Christoph Kaser Priority: Minor
The method getPrefixQuery of org.apache.lucene.queryparser.classic.QueryParserBase checks for leading *-wildcards:
{code:java}
protected Query getPrefixQuery(String field, String termStr) throws ParseException {
  if (!allowLeadingWildcard && termStr.startsWith("*"))
    throw new ParseException("'*' not allowed as first character in PrefixQuery");
  ...
}
{code}
However, the passed termStr is already unescaped in handleBareTokenQuery(...):
{code:java}
q = getPrefixQuery(qfield, discardEscapeChar(term.image.substring(0, term.image.length()-1)));
{code}
Therefore, a search query like this one results in a ParseException, even though the first wildcard is escaped:
{noformat}
title:\*a*
{noformat}
I don't think there is any sense in checking for leading wildcards in getPrefixQuery, as the passed termStr is already used literally, without paying attention to special characters at all.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
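The order of operations behind this report can be shown in a few lines. The sketch below is an illustration only: `unescape` is a simplified, hypothetical stand-in for discardEscapeChar (it ignores Unicode escapes), and `rejectedAsLeadingWildcard` merely mirrors the condition quoted above rather than calling the real parser.

```java
class LeadingWildcardDemo {
  // Simplified stand-in for discardEscapeChar: drops each backslash and keeps the next character.
  static String unescape(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '\\' && i + 1 < s.length()) {
        out.append(s.charAt(++i));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  // Mirrors the reported sequence: strip the trailing '*', unescape, then test the first character.
  static boolean rejectedAsLeadingWildcard(String termImage, boolean allowLeadingWildcard) {
    String termStr = unescape(termImage.substring(0, termImage.length() - 1));
    return !allowLeadingWildcard && termStr.startsWith("*");
  }

  public static void main(String[] args) {
    // The Java literal "\\*a*" is the query text \*a* from the report.
    System.out.println(rejectedAsLeadingWildcard("\\*a*", false)); // prints true
    System.out.println(rejectedAsLeadingWildcard("a*", false));    // prints false
  }
}
```

Because the check runs after unescaping, it cannot tell an escaped literal asterisk from a real leading wildcard, which is why the escaped query is rejected.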
[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
[ https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286389#comment-13286389 ] Christoph Kaser commented on LUCENE-4077: Thank you, now it works perfectly! ToParentBlockJoinCollector provides no way to access computed scores and the maxScore Key: LUCENE-4077 URL: https://issues.apache.org/jira/browse/LUCENE-4077 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Assignee: Michael McCandless Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch The constructor of ToParentBlockJoinCollector allows turning on the tracking of parent scores and the maximum parent score; however, there is no way to access those scores because: * maxScore is a private field, and there is no getter * TopGroups / GroupDocs does not provide access to the scores for the parent documents, only the children -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
[ https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285654#comment-13285654 ] Christoph Kaser commented on LUCENE-4082: Thank you, works perfectly! Implement explain in ToParentBlockJoinQuery$BlockJoinWeight Key: LUCENE-4082 URL: https://issues.apache.org/jira/browse/LUCENE-4082 Project: Lucene - Java Issue Type: Improvement Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Priority: Minor Attachments: LUCENE-4082.patch At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an UnsupportedOperationException. It would be useful if it could instead return the score of the parent document, even if the explanation of how that score was calculated is missing.
[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
[ https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285702#comment-13285702 ]
Christoph Kaser commented on LUCENE-4077:
Hi Mike, shouldn't TopGroups.maxScore contain the maximum parent score? If I am not mistaken, the way it is built now, it contains the maximum child score over all children. This is due to this line in ToParentBlockJoinCollector.getTopGroups():
{code}
maxScore = Math.max(maxScore, topDocs.getMaxScore());
{code}
I think it should read:
{code}
totalMaxScore = Math.max(totalMaxScore, og.score);
{code}
Otherwise, topGroups.maxScore differs from ToParentBlockJoinCollector.getMaxScore().
ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
Key: LUCENE-4077 URL: https://issues.apache.org/jira/browse/LUCENE-4077 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Assignee: Michael McCandless Attachments: LUCENE-4077.patch, LUCENE-4077.patch, LUCENE-4077.patch
The constructor of ToParentBlockJoinCollector allows turning on the tracking of parent scores and the maximum parent score; however, there is no way to access those scores because:
* maxScore is a private field, and there is no getter
* TopGroups / GroupDocs does not provide access to the scores for the parent documents, only the children
[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
[ https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284660#comment-13284660 ]
Christoph Kaser commented on LUCENE-4077:
Hello Mike, thank you for the patch. There is one small problem: ToParentBlockJoinCollector.getMaxScore() always returns _NaN_. This happens because maxScore is initialized as
{code}
private float maxScore = Float.NaN;
{code}
and then updated as
{code}
maxScore = Math.max(score, maxScore);
{code}
which is always _NaN_. I hope I applied the patch to the correct revision and this is not caused by a version conflict.
ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
Key: LUCENE-4077 URL: https://issues.apache.org/jira/browse/LUCENE-4077 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Assignee: Michael McCandless Attachments: LUCENE-4077.patch
The constructor of ToParentBlockJoinCollector allows turning on the tracking of parent scores and the maximum parent score; however, there is no way to access those scores because:
* maxScore is a private field, and there is no getter
* TopGroups / GroupDocs does not provide access to the scores for the parent documents, only the children
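The NaN behaviour mentioned in this comment is a documented property of Math.max: if either argument is NaN, the result is NaN, so a NaN seed is never replaced. The sketch below shows this in isolation; the method names are illustrative, and seeding with negative infinity is only one possible alternative, not necessarily what the later patch did.

```java
class NaNMaxDemo {
  // Same update pattern as quoted in the comment: the NaN seed propagates through every step.
  static float maxSeededWithNaN(float[] scores) {
    float max = Float.NaN;
    for (float score : scores) {
      max = Math.max(score, max);
    }
    return max;
  }

  // Assumed alternative: negative infinity is replaced by the first real score.
  static float maxSeededWithNegativeInfinity(float[] scores) {
    float max = Float.NEGATIVE_INFINITY;
    for (float score : scores) {
      max = Math.max(score, max);
    }
    return max;
  }

  public static void main(String[] args) {
    float[] scores = {0.5f, 2.5f, 1.0f};
    System.out.println(maxSeededWithNaN(scores));              // prints NaN
    System.out.println(maxSeededWithNegativeInfinity(scores)); // prints 2.5
  }
}
```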
[jira] [Created] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
Christoph Kaser created LUCENE-4082: --- Summary: Implement explain in ToParentBlockJoinQuery$BlockJoinWeight Key: LUCENE-4082 URL: https://issues.apache.org/jira/browse/LUCENE-4082 Project: Lucene - Java Issue Type: Improvement Components: modules/join Affects Versions: 3.6, 3.5, 3.4 Reporter: Christoph Kaser At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an UnsupportedOperationException. It would be useful if it could instead return the score of parent document, even if the explanation on how that score was calculated is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4082) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight
[ https://issues.apache.org/jira/browse/LUCENE-4082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Kaser updated LUCENE-4082: Priority: Minor (was: Major) Implement explain in ToParentBlockJoinQuery$BlockJoinWeight --- Key: LUCENE-4082 URL: https://issues.apache.org/jira/browse/LUCENE-4082 Project: Lucene - Java Issue Type: Improvement Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Priority: Minor At the moment, ToParentBlockJoinQuery$BlockJoinWeight.explain throws an UnsupportedOperationException. It would be useful if it could instead return the score of parent document, even if the explanation on how that score was calculated is missing. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
[ https://issues.apache.org/jira/browse/LUCENE-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284780#comment-13284780 ] Christoph Kaser commented on LUCENE-4077: This patch works perfectly for my application. Thank you! ToParentBlockJoinCollector provides no way to access computed scores and the maxScore Key: LUCENE-4077 URL: https://issues.apache.org/jira/browse/LUCENE-4077 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.4, 3.5, 3.6 Reporter: Christoph Kaser Assignee: Michael McCandless Attachments: LUCENE-4077.patch, LUCENE-4077.patch The constructor of ToParentBlockJoinCollector allows turning on the tracking of parent scores and the maximum parent score; however, there is no way to access those scores because: * maxScore is a private field, and there is no getter * TopGroups / GroupDocs does not provide access to the scores for the parent documents, only the children
[jira] [Created] (LUCENE-4076) When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children
Christoph Kaser created LUCENE-4076:
Summary: When doing nested (index-time) joins, ToParentBlockJoinCollector delivers incomplete information on the grand-children Key: LUCENE-4076 URL: https://issues.apache.org/jira/browse/LUCENE-4076 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.6, 3.5, 3.4 Reporter: Christoph Kaser
ToParentBlockJoinCollector.getTopGroups does not provide the correct answer when a query with nested ToParentBlockJoinCollectors is performed. Given the following example query:
{code}
Query grandChildQuery = new TermQuery(new Term("color", "red"));
Filter childFilter = new CachingWrapperFilter(new RawTermFilter(new Term("type", "child")), DeletesMode.IGNORE);
ToParentBlockJoinQuery grandchildJoinQuery = new ToParentBlockJoinQuery(grandChildQuery, childFilter, ScoreMode.Max);
BooleanQuery childQuery = new BooleanQuery();
childQuery.add(grandchildJoinQuery, Occur.MUST);
childQuery.add(new TermQuery(new Term("shape", "round")), Occur.MUST);
Filter parentFilter = new CachingWrapperFilter(new RawTermFilter(new Term("type", "parent")), DeletesMode.IGNORE);
ToParentBlockJoinQuery childJoinQuery = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);
parentQuery = new BooleanQuery();
parentQuery.add(childJoinQuery, Occur.MUST);
parentQuery.add(new TermQuery(new Term("name", "test")), Occur.MUST);
ToParentBlockJoinCollector parentCollector = new ToParentBlockJoinCollector(Sort.RELEVANCE, 30, true, true);
searcher.search(parentQuery, null, parentCollector);
{code}
This produces the correct results:
{code}
TopGroups<Integer> childGroups = parentCollector.getTopGroups(childJoinQuery, null, 0, 20, 0, false);
{code}
However, this does not:
{code}
TopGroups<Integer> grandChildGroups = parentCollector.getTopGroups(grandchildJoinQuery, null, 0, 20, 0, false);
{code}
The content of grandChildGroups is broken in the following ways:
* The groupValue is not the document id of the child document (which is the parent of a grandchild document), but the document id of the _previous_ matching parent document.
* There are only as many GroupDocs as there are parent documents (not child documents), and they only contain the children of the last child document (but, as mentioned before, with the wrong groupValue).
[jira] [Created] (LUCENE-4077) ToParentBlockJoinCollector provides no way to access computed scores and the maxScore
Christoph Kaser created LUCENE-4077: Summary: ToParentBlockJoinCollector provides no way to access computed scores and the maxScore Key: LUCENE-4077 URL: https://issues.apache.org/jira/browse/LUCENE-4077 Project: Lucene - Java Issue Type: Bug Components: modules/join Affects Versions: 3.6, 3.5, 3.4 Reporter: Christoph Kaser The constructor of ToParentBlockJoinCollector allows turning on the tracking of parent scores and the maximum parent score; however, there is no way to access those scores because: * maxScore is a private field, and there is no getter * TopGroups / GroupDocs does not provide access to the scores for the parent documents, only the children