[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
[ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287401#comment-15287401 ]

Joel Bernstein commented on SOLR-9125:
--------------------------------------

What I was thinking was to first run the query and get the cardinality. But this is really not fun as the CollapsingQParserPlugin would have to know the main query and all the filter queries. Doesn't sound like it would be fun to write or maintain.

> CollapseQParserPlugin allocations are index based, not query based
> ------------------------------------------------------------------
>
> Key: SOLR-9125
> URL: https://issues.apache.org/jira/browse/SOLR-9125
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Reporter: Jeff Wartes
> Priority: Minor
> Labels: collapsingQParserPlugin
>
> Among other things, CollapsingQParserPlugin’s OrdScoreCollector allocates
> space per-query for:
> 1 int (doc id) per ordinal
> 1 float (score) per ordinal
> 1 bit (FixedBitSet) per document in the index
>
> So the higher the cardinality of the thing you’re grouping on, and the more
> documents in the index, the more memory gets consumed per query. Since high
> cardinality and large indexes are the use-cases CollapseQParserPlugin was
> designed for, I thought I'd point this out.
> My real issue is that this does not vary based on the number of results in
> the query, either before or after collapsing, so a query that results in one
> doc consumes the same amount of memory as one that returns all of them. All
> of the Collectors suffer from this to some degree, but I think OrdScore is
> the worst offender.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
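To put the allocation list from the issue description into numbers, here is a rough back-of-the-envelope sketch. The class name, method names, and the index figures in main() are hypothetical and not taken from Solr; array object headers are ignored, and the bit set is assumed to be stored in whole 64-bit words, as a long[]-backed bit set would do.

```java
final class CollapseMemoryEstimate {

    // One int (doc id) plus one float (score) per ordinal: 8 bytes each.
    static long ordArraysBytes(long ordinals) {
        return ordinals * (Integer.BYTES + Float.BYTES);
    }

    // One bit per document in the index, rounded up to whole 64-bit words.
    static long bitSetBytes(long maxDoc) {
        return ((maxDoc + 63) / 64) * Long.BYTES;
    }

    // Total per-query bytes; note that the number of matching docs is not an input.
    static long perQueryBytes(long ordinals, long maxDoc) {
        return ordArraysBytes(ordinals) + bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        long maxDoc = 700_000_000L;   // hypothetical index size
        long ordinals = 100_000_000L; // hypothetical cardinality of the collapse field
        System.out.println("ord arrays: " + ordArraysBytes(ordinals) + " bytes");
        System.out.println("bit set:    " + bitSetBytes(maxDoc) + " bytes");
        System.out.println("per query:  " + perQueryBytes(ordinals, maxDoc) + " bytes");
    }
}
```

With these made-up figures the two per-ordinal arrays dominate the bit set by roughly an order of magnitude, and nothing in the calculation depends on how many documents the query matches, which is the point the description makes.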
[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
[ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287339#comment-15287339 ]

Jeff Wartes commented on SOLR-9125:
-----------------------------------

Isn't there a chicken-and-egg situation there? You need the set of matching docs to figure out the HLL.cardinality to specify the initial size of the map you're going to save the set of matching docs in? Or maybe collect() would just throw every doc in the FBS, and finish() would do all the finding group heads and collapsing?
[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
[ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287265#comment-15287265 ]

Joel Bernstein commented on SOLR-9125:
--------------------------------------

One approach that might work for switching to primitive maps would be to first estimate the cardinality of the collapse values in the result set using HyperLogLog, and then size the primitive map accordingly. But my guess is that this approach is going to hurt performance quite a bit.
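As a rough illustration of the estimate-then-size idea, here is a minimal, self-contained HyperLogLog sketch. It is not Solr's or any library's implementation: the class, the SplitMix64-style hash mixer, and the expectedElements() padding helper are all assumptions made for this example. It also says nothing about the extra pass over the matching documents, which is where the performance concern comes from.

```java
final class HllSizingSketch {
    private final int p;            // number of bits used to pick a register
    private final byte[] registers; // 2^p registers, each holding a max rank

    HllSizingSketch(int p) {
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.registers = new byte[1 << p];
    }

    // 64-bit mixer (SplitMix64 finalizer) so sequential ints spread over all bits.
    private static long mix(long z) {
        z += 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Record one collapse value (for example, an ordinal or a hashed field value).
    void add(int value) {
        long h = mix(value);
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Standard HyperLogLog estimate with the small-range linear-counting correction.
    long estimate() {
        int m = registers.length;
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m); // bias constant, valid for m >= 128
        double raw = alpha * m * m / sum;
        if (raw <= 2.5 * m && zeros > 0) {
            return Math.round(m * Math.log((double) m / zeros));
        }
        return Math.round(raw);
    }

    // Hypothetical helper: pad the estimate by a safety margin before sizing a map.
    static int expectedElements(long estimate, double margin) {
        double padded = estimate * (1.0 + margin);
        return (int) Math.min(Integer.MAX_VALUE, Math.ceil(padded));
    }

    public static void main(String[] args) {
        HllSizingSketch hll = new HllSizingSketch(14);
        for (int i = 0; i < 100_000; i++) {
            hll.add(i);
        }
        long est = hll.estimate();
        System.out.println("estimated distinct values: " + est);
        System.out.println("map sizing hint: " + expectedElements(est, 0.05));
    }
}
```

The sizing hint is the number one would pass as the expected element count when constructing a primitive map, so that it does not have to resize while group heads are being collected.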
[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
[ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15287208#comment-15287208 ]

Joel Bernstein commented on SOLR-9125:
--------------------------------------

Yeah, the CollapsingQParserPlugin can use a lot of memory. The original design goal was to increase performance for collapsing on high cardinality fields and large result sets, as opposed to large indexes. It was really designed to support fast collapse queries on large e-commerce catalogs, which are still typically small compared to other data sets.

If we can find a way to maintain the performance and shrink the memory usage, this would be a great thing.
[jira] [Commented] (SOLR-9125) CollapseQParserPlugin allocations are index based, not query based
[ https://issues.apache.org/jira/browse/SOLR-9125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15286940#comment-15286940 ]

Jeff Wartes commented on SOLR-9125:
-----------------------------------

I messed around a little bit, but I don't have a solution for this. I thought I'd file the issue anyway just to shine some light.

I had attempted to use CollapseQParserPlugin on a very large index using a collapse on a field whose cardinality was about 1/7th the doc count... it didn't go well. Worse, the issue didn't come up until pretty late in the game, because at low query rate and/or on smaller indexes, the problem isn't evident. I abandoned the attempt.

Some stuff I tried:
- I thought about replacing the FBS with a DocIdSetBuilder, but DelegatingCollector.finish() gets called twice, and you can't DocIdSetBuilder.build() twice on the same builder. We'd need to save the first build() result and use it to initialize a new builder for the second, but I wasn't convinced I understood the distinction between the two passes.
- I did one quick test where I replaced the "ords" and "scores" arrays with an IntIntScatterMap and an IntFloatScatterMap, thinking those would work better for small result sets. That ended up being worse (from a total allocations standpoint) for the queries I was trying, probably because of the map resizing this required. It might be possible to set initial size values from statistics and help this case that way. It would also be possible to encode the docId/score into a long and just use one IntLongScatterMap, but I didn't try that.
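The last idea, packing a doc id and a score into one long so a single int-to-long map could replace two maps, can be sketched in plain Java as follows. This was not tried in the thread, and the class and method names are made up for illustration; only the bit packing is shown, not the map itself.

```java
final class DocScorePacking {

    // High 32 bits hold the doc id, low 32 bits hold the raw IEEE 754 bits of the score.
    static long pack(int docId, float score) {
        return ((long) docId << 32) | (Float.floatToRawIntBits(score) & 0xFFFFFFFFL);
    }

    // Recover the doc id from the high 32 bits.
    static int docId(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the score from the low 32 bits.
    static float score(long packed) {
        return Float.intBitsToFloat((int) packed);
    }

    public static void main(String[] args) {
        long packed = pack(123_456, 7.25f);
        System.out.println("docId=" + docId(packed) + " score=" + score(packed));
    }
}
```

The mask on the score bits matters: without it, a negative score (sign bit set) would sign-extend into the high word and corrupt the doc id.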