[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
[ https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15087105#comment-15087105 ]

Modassar Ather commented on SOLR-7954:
--------------------------------------

{noformat}q=fl1:net*=fl=50=true={!cardinality=1.0}fl{noformat}

The above query returns a cardinality of around 15 million and takes around 4 minutes. A similar response time is seen with other queries that yield high cardinality. Kindly note that cardinality=1.0 is the desired goal.
In the above example, fl1 is a text field, whereas fl is a docValues-enabled, non-stored, non-indexed field.
Kindly let me know whether such a response time is expected, or whether I am missing something about this feature in my query.

> ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
> ----------------------------------------------------------------------
>
>                 Key: SOLR-7954
>                 URL: https://issues.apache.org/jira/browse/SOLR-7954
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.2.1
>         Environment: SolrCloud 4 node cluster.
> Ubuntu 12.04
> OS Type 64 bit
>            Reporter: Modassar Ather
>            Assignee: Hoss Man
>             Fix For: 5.4, Trunk
>
>         Attachments: SOLR-7954.patch, SOLR-7954.patch, SOLR-7954.patch
>
>
> User reports indicate that using {{stats.field=\{!cardinality=1.0\}foo}} on a
> field that has extremely high cardinality on a single shard (example: 150K
> unique values) can lead to "ArrayIndexOutOfBoundsException: 3" on the shard
> during serialization of the HLL values.
> Using "cardinality=0.9" (or lower) doesn't produce the same symptoms,
> suggesting the problem is specific to large log2m and regwidth values.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
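For readers who want to try the {{stats.field=\{!cardinality=1.0\}foo}} parameter from the issue description, the following is a minimal sketch of how such a query string might be assembled and URL-encoded using only the Java standard library. The class and method names are hypothetical, the field name foo is taken from the issue description, and the q=*:* and rows=0 parameters are assumptions chosen for illustration; this is not the query used by the reporter.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only: builds the query-string part of a
// Solr request that asks the stats component for a cardinality estimate.
class CardinalityQuerySketch {

    static String buildQuery(String field, double accuracy) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");       // assumption: match all documents
        params.put("rows", "0");      // assumption: only the stats are wanted
        params.put("stats", "true");
        params.put("stats.field", "{!cardinality=" + accuracy + "}" + field);

        // URL-encode every key and value so the local-params braces survive transport.
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joined.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("foo", 1.0));
    }
}
```

The resulting string would be appended to the select handler URL of whichever collection is being queried; the host and collection are deliberately left out because they are not given in this thread.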
[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
[ https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715188#comment-14715188 ]

ASF subversion and git services commented on SOLR-7954:
-------------------------------------------------------

Commit 1697977 from hoss...@apache.org in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1697977 ]

SOLR-7954: Fixed an integer overflow bug in the HyperLogLog code used by the 'cardinality' option of stats.field to prevent ArrayIndexOutOfBoundsException in a distributed search when a large precision is selected and a large number of values exist in each shard (merge r1697969)
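To illustrate the general class of bug named in the commit message, here is a minimal sketch of how a size computation over log2m and regwidth can silently overflow 32-bit int arithmetic in Java. This is not the code that was patched in Solr; the class, the method names, and the parameter values (log2m=30, regwidth=8) are assumptions chosen only to make the wrap-around visible.

```java
// Illustration only: NOT the patched Solr code. It shows how multiplying a
// register count by a register width can wrap around in int arithmetic.
class HllSizeOverflowSketch {

    // Total bits for 2^log2m registers of regwidth bits each, computed in int.
    // The multiplication wraps silently when the result exceeds Integer.MAX_VALUE.
    static int totalBitsInt(int log2m, int regwidth) {
        int registers = 1 << log2m;
        return registers * regwidth;
    }

    // The same computation widened to long before multiplying, so it cannot wrap
    // for the parameter ranges used here.
    static long totalBitsLong(int log2m, int regwidth) {
        long registers = 1L << log2m;
        return registers * regwidth;
    }

    // Alternative: stay in int but fail loudly instead of wrapping.
    static int totalBitsChecked(int log2m, int regwidth) {
        return Math.multiplyExact(1 << log2m, regwidth); // throws ArithmeticException on overflow
    }

    public static void main(String[] args) {
        // Small (hypothetical) parameters: both versions agree.
        System.out.println(totalBitsInt(11, 5) + " vs " + totalBitsLong(11, 5));
        // Large (hypothetical) parameters: the int version wraps around.
        System.out.println(totalBitsInt(30, 8) + " vs " + totalBitsLong(30, 8));
    }
}
```

A wrapped value like this, if later used as an array length or index, is one plausible way to end up with an ArrayIndexOutOfBoundsException only at large precision settings, which matches the symptoms described in the issue.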
[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
[ https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714473#comment-14714473 ]

ASF subversion and git services commented on SOLR-7954:
-------------------------------------------------------

Commit 1697969 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1697969 ]

SOLR-7954: Fixed an integer overflow bug in the HyperLogLog code used by the 'cardinality' option of stats.field to prevent ArrayIndexOutOfBoundsException in a distributed search when a large precision is selected and a large number of values exist in each shard
[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
[ https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711783#comment-14711783 ]

Hoss Man commented on SOLR-7954:
--------------------------------

bq. Later I indexed 40 documents on which I could not reproduce it. All the shards had around 10 documents each.

bq. There are 4 shards with no replica on my test environment.

Modassar: as I tried to explain in my earlier comments, the number of shards / documents doesn't really affect the issue -- the root problem has to do with the number of unique _values_ in a single shard which are added to the underlying HyperLogLog data structure and then serialized.

Doing more testing where you tweak the routing or doc counts may find _different_ bugs, but for this specific bug the core problem is reviewing the HLL serialization code related to the various precision options (which are set based on the cardinality local param) and the number of unique (hashed) values in each HLL.
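The point above, that only the unique values matter and not the document count, can be sketched with a toy HyperLogLog-style register array. This is a simplified illustration and not Solr's HLL implementation: the class name, the fixed LOG2M of 10, the splitmix64-style hash mixer, and the helper methods are all assumptions made for the example.

```java
import java.util.Arrays;

// Toy HyperLogLog-style registers, for illustration only (not Solr's HLL code).
// It demonstrates that register state depends on the set of distinct values,
// not on how many times each value (i.e. how many documents) is added.
class HllRegistersSketch {

    static final int LOG2M = 10; // assumption: 2^10 = 1024 registers

    // splitmix64-style finalizer, used here as a stand-in 64-bit hash.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Builds the register array for the given values.
    static byte[] registersFor(long[] values) {
        byte[] regs = new byte[1 << LOG2M];
        for (long v : values) {
            long h = mix64(v);
            int idx = (int) (h >>> (64 - LOG2M)); // top LOG2M bits select a register
            long rest = h << LOG2M;               // remaining bits determine the rank
            int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - LOG2M + 1);
            if (rank > regs[idx]) {
                regs[idx] = (byte) rank;          // each register keeps the maximum rank seen
            }
        }
        return regs;
    }

    // Returns the values 0..distinct-1, each repeated 'copies' times.
    static long[] repeat(int distinct, int copies) {
        long[] out = new long[distinct * copies];
        int k = 0;
        for (int c = 0; c < copies; c++) {
            for (int v = 0; v < distinct; v++) {
                out[k++] = v;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] once = registersFor(repeat(1000, 1));   // 1,000 "documents"
        byte[] many = registersFor(repeat(1000, 50));  // 50,000 "documents", same 1,000 values
        System.out.println(Arrays.equals(once, many)); // identical register state
    }
}
```

Because duplicates never change a register, what gets serialized for a shard is determined by its distinct hashed values and the precision settings, which is why varying only the document count or routing would not be expected to affect this particular bug.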
[jira] [Commented] (SOLR-7954) ArrayIndexOutOfBoundsException from distributed HLL serialization logic when using using stats.field={!cardinality=1.0} in a distributed query
[ https://issues.apache.org/jira/browse/SOLR-7954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14710599#comment-14710599 ]

Modassar Ather commented on SOLR-7954:
--------------------------------------

To add to the summary and description.
I changed the
{noformat}doc.addField(colid, val!+i+!-+ref+i);{noformat}
to
{noformat}doc.addField(colid, val+i+!-+ref+i);{noformat}
The documents got distributed to all the nodes. I indexed 1 million documents and was able to reproduce the issue. All the shards had around 20 documents each.
Later I indexed 40 documents on which I could not reproduce it. All the shards had around 10 documents each.
There are 4 shards with no replica on my test environment.