[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949918#comment-16949918 ]

ASF subversion and git services commented on SOLR-13787:
---------------------------------------------------------

Commit 83c80376fa57f7218b45735dd39316684f68db4c in lucene-solr's branch refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=83c8037 ]

SOLR-13787: Better error logging

> An annotation based system to write v2 only APIs
> ------------------------------------------------
>
>                 Key: SOLR-13787
>                 URL: https://issues.apache.org/jira/browse/SOLR-13787
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>            Priority: Major
>             Fix For: master (9.0), 8.3
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> An example v2 API may look as follows:
> {code:java}
> @EndPoint(method = POST, path = "/cluster/package", permission = PermissionNameProvider.Name.ALL)
> public static class ApiTest {
>   @Command(name = "add")
>   public void add(SolrQueryRequest req, SolrQueryResponse rsp, AddVersion addVersion) {
>   }
>
>   @Command(name = "delete")
>   public void del(SolrQueryRequest req, SolrQueryResponse rsp, List<String> names) {
>   }
> }
>
> public static class AddVersion {
>   @JsonProperty(value = "package", required = true)
>   public String pkg;
>   @JsonProperty(value = "version", required = true)
>   public String version;
>   @JsonProperty(value = "files", required = true)
>   public List<String> files;
> }
> {code}
> This expects you to already have a POJO annotated with Jackson annotations.
>
> The annotations are:
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target({ElementType.TYPE})
> public @interface EndPoint {
>   /** The supported HTTP methods */
>   SolrRequest.METHOD[] method();
>
>   /** Supported paths */
>   String[] path();
>
>   PermissionNameProvider.Name permission();
> }
> {code}
> {code:java}
> @Retention(RetentionPolicy.RUNTIME)
> @Target(ElementType.METHOD)
> public @interface Command {
>   /** If this is not a JSON command, leave it empty.
>    * Keep in mind that you cannot have duplicates:
>    * only one method per name.
>    */
>   String name() default "";
> }
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
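To make the dispatch mechanism above concrete, here is a hypothetical sketch (not Solr's actual implementation; it re-declares a simplified @Command locally instead of using the real annotation) of how a framework can map the top-level key of a JSON command body to the annotated handler method via reflection:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class CommandLookupDemo {
    // Simplified local stand-in for the @Command annotation from the ticket.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Command {
        String name() default "";
    }

    // Mirrors the ApiTest class from the description: the JSON key "delete"
    // maps to a method that happens to be named del().
    static class PackageApi {
        @Command(name = "add")
        public void add() {}

        @Command(name = "delete")
        public void del() {}
    }

    /** Find the handler method whose @Command name matches the JSON key. */
    static Method findCommand(Class<?> api, String commandName) {
        for (Method m : api.getDeclaredMethods()) {
            Command c = m.getAnnotation(Command.class);
            if (c != null && c.name().equals(commandName)) {
                return m;
            }
        }
        return null; // unknown command: the framework would return an error
    }

    public static void main(String[] args) {
        System.out.println(findCommand(PackageApi.class, "add").getName());    // add
        System.out.println(findCommand(PackageApi.class, "delete").getName()); // del
    }
}
```

Note how the lookup is by annotation value, not method name, which is what allows the "delete" command to live on a method called del().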
[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949917#comment-16949917 ]

ASF subversion and git services commented on SOLR-13787:
---------------------------------------------------------

Commit 84126ea0eae452ff3cebbd5eb2b7d94573eb841e in lucene-solr's branch refs/heads/master from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=84126ea ]

SOLR-13787: Better error logging
[jira] [Created] (SOLR-13838) igain query parser generating invalid output
Peter Davie created SOLR-13838:
----------------------------------

             Summary: igain query parser generating invalid output
                 Key: SOLR-13838
                 URL: https://issues.apache.org/jira/browse/SOLR-13838
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: query parsers
    Affects Versions: 8.2
         Environment: The issue is a generic Java defect and therefore will be independent of the operating system or software platform.
            Reporter: Peter Davie
             Fix For: 8.3
         Attachments: IGainTermsQParserPlugin.java.patch

Investigating the output from the "features()" stream source, terms are being returned with NaN for the score_f field:

"docs": [
  {
    "featureSet_s": "business",
    "score_f": "NaN",
    "term_s": "1,011.15",
    "idf_d": "-Infinity",
    "index_i": 1,
    "id": "business_1"
  },
  {
    "featureSet_s": "business",
    "score_f": "NaN",
    "term_s": "10.3m",
    "idf_d": "-Infinity",
    "index_i": 2,
    "id": "business_2"
  },
  {
    "featureSet_s": "business",
    "score_f": "NaN",
    "term_s": "01",
    "idf_d": "-Infinity",
    "index_i": 3,
    "id": "business_3"
  }, ...

Looking into {{org/apache/solr/search/IGainTermsQParserPlugin.java}}, it seems that when a term is not included in the positive or negative documents, the docFreq calculation (docFreq = xc + nc) is 0, which means that subsequent calculations result in NaN (division by 0).

Attached is a patch which skips terms for which docFreq is 0 in the finish() method of IGainTermsQParserPlugin; this resolves the issues with NaN scores in the features() output.
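The division-by-zero path described above can be shown in isolation. The sketch below is a hypothetical simplification (rawScore/guardedScore are not the plugin's real methods, and the actual IGain formula is more involved); it illustrates why docFreq == 0 yields NaN and how a guard like the one in the attached patch avoids it:

```java
class IGainNaNDemo {
    /**
     * Simplified stand-in for the score computation: xc is the term's count
     * in positive docs, nc in negative docs. When a term appears in neither
     * set, docFreq is 0 and the ratio is 0.0/0.0, i.e. NaN.
     */
    static double rawScore(int xc, int nc) {
        int docFreq = xc + nc;
        return (double) xc / docFreq;
    }

    /**
     * Patched behaviour: skip the term entirely (returning null here) when
     * docFreq is 0, as the attached patch does in finish().
     */
    static Double guardedScore(int xc, int nc) {
        int docFreq = xc + nc;
        if (docFreq == 0) {
            return null; // term absent from training set: emit no feature
        }
        return (double) xc / docFreq;
    }

    public static void main(String[] args) {
        System.out.println(rawScore(0, 0));     // NaN
        System.out.println(guardedScore(0, 0)); // null
        System.out.println(guardedScore(3, 1)); // 0.75
    }
}
```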
[jira] [Commented] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949901#comment-16949901 ]

Shalin Shekhar Mangar commented on SOLR-13815:
----------------------------------------------

Probably too late for this comment but...

bq. Still... I don't think zookeeper can update multiple znodes at the same time, so we might still have a very small window where we see something like inactive/construction/construction. I'm not sure what the behavior of the current code would be in that case.

Actually inactive/construction/construction is impossible because the shard state comes from the clusterstate, which is a single znode updated atomically. So the states will either be active/construction/construction or active/recovery/recovery or inactive/active/active. No other state is possible.

> Live split can lose data
> ------------------------
>
>                 Key: SOLR-13815
>                 URL: https://issues.apache.org/jira/browse/SOLR-13815
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Yonik Seeley
>            Priority: Major
>             Fix For: 8.3
>
>         Attachments: fail.191004_053129, fail.191004_093307
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> This issue is to investigate potential data loss during a "live" split (i.e. the split happens while updates are flowing).
> This was discovered during the shared storage work, which was based on a non-release branch_8x sometime before 8.3; hence the first steps are to try and reproduce on the master branch without any shared storage changes.
[jira] [Commented] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949902#comment-16949902 ]

Shalin Shekhar Mangar commented on SOLR-13815:
----------------------------------------------

Thanks for investigating and fixing the problem!
[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949900#comment-16949900 ]

ASF subversion and git services commented on SOLR-13787:
---------------------------------------------------------

Commit 4c67f1645ea4e21d0a2fbaaa084c5433faffc751 in lucene-solr's branch refs/heads/branch_8_3 from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4c67f16 ]

SOLR-13787: Better error logging
[jira] [Commented] (SOLR-13760) Date Math in "start" attribute of routed alias causes exception
[ https://issues.apache.org/jira/browse/SOLR-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949855#comment-16949855 ]

Gus Heck commented on SOLR-13760:
---------------------------------

I should add that the nature of the failure is that the assert does not see a change in ZooKeeper, despite the test having waited on a watch for changes to aliases.json. Clearly, since it passes at other times, this is a test timing issue. The seed does not reproduce.

> Date Math in "start" attribute of routed alias causes exception
> ---------------------------------------------------------------
>
>                 Key: SOLR-13760
>                 URL: https://issues.apache.org/jira/browse/SOLR-13760
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 8.3
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>             Fix For: 8.3
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The start parameter (for Time Routed Aliases and 2-Dimensional Routed Aliases using time components) is meant to accept date math as well as a timestamp. However, it seems that none of the tests actually test this, and my changes for DRA forgot to account for it in one place, so an exception is thrown when adding a document to an alias with such a configuration. Will add a test and a fix.
[jira] [Commented] (SOLR-13760) Date Math in "start" attribute of routed alias causes exception
[ https://issues.apache.org/jira/browse/SOLR-13760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949853#comment-16949853 ]

Gus Heck commented on SOLR-13760:
---------------------------------

Noticed one test failure on fucit.org for the test added in this ticket, which is very irritating since I beasted the test with 40 simultaneous copies for 1000 runs before committing, completely saturating my CPU to the point where the machine was unusable for 2-3 hrs, and yet didn't have one failure. Will keep an eye on it.
[jira] [Commented] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949755#comment-16949755 ]

ASF subversion and git services commented on SOLR-13815:
---------------------------------------------------------

Commit 503fe7e9a9d5e80890fa7fe63c4fd56a161d0619 in lucene-solr's branch refs/heads/branch_8_3 from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=503fe7e ]

SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards (#920)

* SOLR-13815: add simple live split test to help debugging possible issue

* SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards
[jira] [Commented] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949752#comment-16949752 ]

ASF subversion and git services commented on SOLR-13815:
---------------------------------------------------------

Commit cc62b9fac2302b8db627490efb88482ff6bbde54 in lucene-solr's branch refs/heads/branch_8x from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cc62b9f ]

SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards (#920)

* SOLR-13815: add simple live split test to help debugging possible issue

* SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards
[jira] [Commented] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949720#comment-16949720 ]

ASF subversion and git services commented on SOLR-13815:
---------------------------------------------------------

Commit a057b0d159f669d28565f48c3ee2bee76ab3d821 in lucene-solr's branch refs/heads/master from Yonik Seeley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a057b0d ]

SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards (#920)

* SOLR-13815: add simple live split test to help debugging possible issue

* SOLR-13815: fix live split data loss due to cluster state change between checking current shard state and getting list of subShards
[GitHub] [lucene-solr] yonik merged pull request #920: SOLR-13815: add simple live split test to help debugging possible issue
yonik merged pull request #920: SOLR-13815: add simple live split test to help debugging possible issue
URL: https://github.com/apache/lucene-solr/pull/920

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13827) Fail on Unknown operation in Request Parameters API
[ https://issues.apache.org/jira/browse/SOLR-13827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16949715#comment-16949715 ]

Noble Paul commented on SOLR-13827:
-----------------------------------

I guess we should just fix this one issue and do the rewrite using annotations later.

> Fail on Unknown operation in Request Parameters API
> ---------------------------------------------------
>
>                 Key: SOLR-13827
>                 URL: https://issues.apache.org/jira/browse/SOLR-13827
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: config-api
>            Reporter: Munendra S N
>            Assignee: Munendra S N
>            Priority: Minor
>
> The Request Parameters API supports set, update and delete operations. For any other operation, the API should fail and return an error.
> Currently, for an unknown operation the API returns a 200 status.
> The config/overlay API fails on unknown operations.
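The fix is essentially a matter of validating the operation name against the known set before applying it. A minimal sketch (the validator class and method here are hypothetical, not the actual Solr code) of the set/update/delete check:

```java
import java.util.Set;

class ParamsOpValidator {
    // The operations the Request Parameters API supports, per the ticket.
    static final Set<String> KNOWN_OPS = Set.of("set", "update", "delete");

    /**
     * Returns null for a known operation, or an error message that the API
     * could surface with a non-200 status instead of silently ignoring the op.
     */
    static String validate(String op) {
        if (!KNOWN_OPS.contains(op)) {
            return "Unknown operation: " + op;
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(validate("set"));   // null
        System.out.println(validate("unset")); // Unknown operation: unset
    }
}
```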
[jira] [Commented] (SOLR-13793) HTTPSolrCall makes cascading calls even when all replicas are down for a collection
[ https://issues.apache.org/jira/browse/SOLR-13793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949702#comment-16949702 ] Kesharee Nandan Vishwakarma commented on SOLR-13793: Sounds good, let me know if you need any changes or a separate patch. > HTTPSolrCall makes cascading calls even when all replicas are down for a > collection > --- > > Key: SOLR-13793 > URL: https://issues.apache.org/jira/browse/SOLR-13793 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 6.6, master (9.0) >Reporter: Kesharee Nandan Vishwakarma >Assignee: Ishan Chattopadhyaya >Priority: Major > Attachments: SOLR-13793.patch > > Time Spent: 10m > Remaining Estimate: 0h > > REMOTEQUERY action in HTTPSolrCall ends up making too many cascading > remoteQuery calls when all the replicas of a collection are in the down > state. > This results in an increase in thread count, unresponsive Solr nodes, and > eventually nodes (the ones hosting this collection) dropping out of live nodes. > *Example scenario*: Consider a cluster with 3 nodes (solr1, solrw1, > solr-overseer1). A collection is present on solr1 and solrw1, but both replicas > are in the down state. When a search request is made to solr-overseer1, since a > replica is not present locally, a remote query is made to solr1 (inactive slices/coreUrls are > also considered); solr1 also doesn't see an active replica > present locally, so it forwards to solrw1, and solrw1 again forwards the request to > solr1. This goes on until both solr1 and solrw1 become unresponsive. Logs for this are attached. 
> This is happening because we are considering [inactive > slices|https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L913 > ], [inactive coreUrl| > https://github.com/apache/lucene-solr/blob/68fa249034ba8b273955f20097700dc2fbb7a800/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L929] > while forwarding requests to nodes. > *Steps to reproduce*: > # Bring down all replicas of a collection but ensure nodes containing them > are up > # Make any search call to any of solr nodes for this collection. > > *Possible fixes*: > # Ensure we select only active slices/coreUrls before making remote queries > # Put a limit on cascading calls probably limit to number of replicas > > {noformat} > solrw1_1 | > solrw1_1 | 2019-09-24 09:35:14.458 ERROR (qtp762152757-8772) [ ] > o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: Error trying > to proxy request for url: http://solr1:8983/solr/kg3/select > solrw1_1 |at > org.apache.solr.servlet.HttpSolrCall.remoteQuery(HttpSolrCall.java:660) > solrw1_1 |at > org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:514) > solrw1_1 |at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361) > solrw1_1 |at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305) > solrw1_1 |at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691) > solrw1_1 |at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582) > solrw1_1 |at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > solrw1_1 |at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > solrw1_1 |at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > solrw1_1 |at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > solrw1_1 |at > 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > solrw1_1 |at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > solrw1_1 |at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > solrw1_1 |at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > solrw1_1 |at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > solrw1_1 |at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119) > solrw1_1 |at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > solrw1_1 |at > org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335) > solrw1_1 |at >
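The first proposed fix (selecting only active slices/coreUrls before making remote queries) can be sketched as follows. The Replica and State types here are simplified stand-ins for Solr's cluster-state classes, not the actual HttpSolrCall code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: only ACTIVE replicas are considered when building
// the list of remote core URLs, so a request is never proxied to a node
// whose replica is down. Types are simplified stand-ins, not Solr's API.
public class ActiveCoreUrlFilter {
    enum State { ACTIVE, DOWN, RECOVERING }

    record Replica(String coreUrl, State state) {}

    static List<String> activeCoreUrls(List<Replica> replicas) {
        List<String> urls = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.state() == State.ACTIVE) { // skip down/recovering replicas
                urls.add(r.coreUrl());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Scenario from the report: every replica of the collection is down.
        List<Replica> replicas = List.of(
            new Replica("http://solr1:8983/solr/kg3", State.DOWN),
            new Replica("http://solrw1:8983/solr/kg3", State.DOWN));
        // With no active coreUrl, no remote query is attempted at all,
        // instead of bouncing the request between solr1 and solrw1.
        System.out.println(activeCoreUrls(replicas).isEmpty() ? "no-proxy" : "proxy");
    }
}
```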
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r334107604 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1691,4 +1954,180 @@ public void testBulkScorerLocking() throws Exception { t.start(); t.join(); } + + public void testRejectedExecution() throws IOException { +ExecutorService service = new TestIndexSearcher.RejectingMockExecutor(); +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true); + +searcher.setQueryCache(queryCache); +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// To ensure that failing ExecutorService still allows query to run +// successfully + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +reader.close(); +w.close(); +dir.close(); +service.shutdown(); + } + + public void testClosedReaderExecution() throws IOException { +CountDownLatch latch = new CountDownLatch(1); +ExecutorService service = new BlockedMockExecutor(latch); + +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +for (int i = 0; i < 100; i++) { + Document doc = new Document(); + StringField f = new StringField("color", "blue", Store.NO); + doc.add(f); + w.addDocument(doc); + f.setStringValue("red"); + w.addDocument(doc); + f.setStringValue("green"); + w.addDocument(doc); + + if (i % 10 
== 0) { +w.commit(); + } +} + +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service) { + @Override + protected LeafSlice[] slices(List leaves) { +ArrayList slices = new ArrayList<>(); +for (LeafReaderContext ctx : leaves) { + slices.add(new LeafSlice(Arrays.asList(ctx))); +} +return slices.toArray(new LeafSlice[0]); + } +}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true); + +searcher.setQueryCache(queryCache); +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// To ensure that failing ExecutorService still allows query to run +// successfully + +ExecutorService tempService = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +tempService.submit(new Runnable() { + @Override + public void run() { +try { + Thread.sleep(100); + reader.close(); +} catch (Exception e) { + throw new RuntimeException(e.getMessage()); +} + +latch.countDown(); + + } +}); + +searcher.search(new ConstantScoreQuery(red), 1); + +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); Review comment: Hmm, yeah, it is kind of strange, since the reader definitely gets closed before LRUQueryCache tries to cache the value -- but the SegmentReader still seems to be open when the caching is attempted. (I attached a debugger and jumped around). Do we need to go over all LeafReaderContext instances in the associated searcher and manually close them for this to work the way we expect? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed
[ https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949681#comment-16949681 ] Chris M. Hostetter commented on SOLR-13778: --- I just realized we're seeing a slightly _different_ SSLException from Uwe's java13 windows VMs... {noformat} [junit4]> Throwable #1: org.apache.solr.client.solrj.SolrServerException: IOException occurred when talking to server at: https://127.0.0.1:55121/solr [junit4]>at __randomizedtesting.SeedInfo.seed([E2C1EFE3F69FB5C6:35E9A23BE77FFC28]:0) [junit4]>at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:679) [junit4]>at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:265) [junit4]>at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:248) [junit4]>at org.apache.solr.client.solrj.impl.LBSolrClient.doRequest(LBSolrClient.java:368) [junit4]>at org.apache.solr.client.solrj.impl.LBSolrClient.request(LBSolrClient.java:296) [junit4]>at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1128) [junit4]>at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.requestWithRetryOnStaleState(BaseCloudSolrClient.java:897) [junit4]>at org.apache.solr.client.solrj.impl.BaseCloudSolrClient.request(BaseCloudSolrClient.java:829) [junit4]>at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:211) [junit4]>at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:228) [junit4]>at org.apache.solr.cloud.MiniSolrCloudCluster.deleteAllCollections(MiniSolrCloudCluster.java:549) [junit4]>at org.apache.solr.cloud.TestCloudSearcherWarming.tearDown(TestCloudSearcherWarming.java:79) [junit4]>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [junit4]>at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [junit4]>at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [junit4]>at java.base/java.lang.reflect.Method.invoke(Method.java:567) [junit4]>at java.base/java.lang.Thread.run(Thread.java:830) [junit4]> Caused by: javax.net.ssl.SSLException: An established connection was aborted by the software in your host machine [junit4]>at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127) [junit4]>at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:324) [junit4]>at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:267) [junit4]>at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:262) [junit4]>at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1652) [junit4]>at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1038) [junit4]>at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) [junit4]>at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153) [junit4]>at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282) [junit4]>at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138) [junit4]>at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56) [junit4]>at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259) [junit4]>at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163) [junit4]>at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165) [junit4]>at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273) [junit4]>at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125) [junit4]>at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272) [junit4]>at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185) [junit4]>at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89) [junit4]>at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110) [junit4]>at
[GitHub] [lucene-solr] atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
atris commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r334101248 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); + +// Second read should hit +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); Review comment: The second search is there to test that once the value is loaded asynchronously -- it exists and does not trigger another load (hence the lack of a wait there). Removed the extra search. thanks
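The latch idiom in the quoted test, releasing a CountDownLatch from the cache's insertion callback so the test can wait deterministically for the asynchronous load rather than sleeping, generalizes to any async-caching test. A stripped-down sketch of just that synchronization pattern:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Generic version of the synchronization trick in the test above: a latch
// released from the cache's insertion callback (onDocIdSetCache in the real
// test) lets the caller wait deterministically for an asynchronous load.
public class AsyncLoadLatch {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch loaded = new CountDownLatch(1);
        ExecutorService pool = Executors.newSingleThreadExecutor();

        // Stand-in for the asynchronous caching work; the real test counts
        // down inside the overridden onDocIdSetCache callback.
        pool.submit(loaded::countDown);

        loaded.await();               // blocks until the async load completes
        System.out.println("cached"); // safe to assert on cache contents now
        pool.shutdown();
    }
}
```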
[GitHub] [lucene-solr] atris commented on issue #923: LUCENE-8988: Introduce Global Feature Based Early Termination For Sorted Fields
atris commented on issue #923: LUCENE-8988: Introduce Global Feature Based Early Termination For Sorted Fields URL: https://github.com/apache/lucene-solr/pull/923#issuecomment-541154942 Any thoughts on this one? Seems useful enough?
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949654#comment-16949654 ] Adrien Grand commented on LUCENE-8920: -- Right, this is what I had in mind: trying to reproduce the issue with values that look more real. > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which make gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? > Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address?
[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949647#comment-16949647 ] Michael Sokolov edited comment on LUCENE-8920 at 10/11/19 5:14 PM: --- For posterity, this is the worst case test that spreads out terms {{ for (int i = 0; i < 100; ++i) { byte[] b = new byte[5]; random().nextBytes(b); for (int j = 0; j < b.length; ++j) { b[j] &= 0xfc; // make this byte a multiple of 4 } entries.add(new BytesRef(b)); } buildFST(entries).ramBytesUsed(); }} was (Author: sokolov): {{For posterity, this is the worst case test that spreads out terms}} for (int i = 0; i < 100; ++i) { byte[] b = new byte[5]; random().nextBytes(b); for (int j = 0; j < b.length; ++j) { b[j] &= 0xfc; // make this byte a multiple of 4 } entries.add(new BytesRef(b)); } buildFST(entries).ramBytesUsed(); > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which make gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949647#comment-16949647 ] Michael Sokolov edited comment on LUCENE-8920 at 10/11/19 5:13 PM: --- {{For posterity, this is the worst case test that spreads out terms}} for (int i = 0; i < 100; ++i) { byte[] b = new byte[5]; random().nextBytes(b); for (int j = 0; j < b.length; ++j) { b[j] &= 0xfc; // make this byte a multiple of 4 } entries.add(new BytesRef(b)); } buildFST(entries).ramBytesUsed(); was (Author: sokolov): {{For posterity, this is the worst case test that spreads out terms}} for (int i = 0; i < 100; ++i) { byte[] b = new byte[5]; random().nextBytes(b); for (int j = 0; j < b.length; ++j) { b[j] &= 0xfc; // make this byte a multiple of 4 } entries.add(new BytesRef(b)); } buildFST(entries).ramBytesUsed(); > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which make gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address?
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949647#comment-16949647 ] Michael Sokolov commented on LUCENE-8920: - {{For posterity, this is the worst case test that spreads out terms}} for (int i = 0; i < 100; ++i) { byte[] b = new byte[5]; random().nextBytes(b); for (int j = 0; j < b.length; ++j) { b[j] &= 0xfc; // make this byte a multiple of 4 } entries.add(new BytesRef(b)); } buildFST(entries).ramBytesUsed(); > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which make gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? > Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address?
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949635#comment-16949635 ] Michael Sokolov commented on LUCENE-8920: - I think you had previously created a test case for this, [~jpountz], that demonstrated larger memory usage than we wanted. I was referring to the fact that it used a somewhat artificial data distribution; the main issue that seems to remain comes from some regression tests at ES that may(?) have a more realistic distribution of terms. I'm just not convinced that we need to handle every adversarial case? > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which make gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? > Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address? 
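The first idea quoted in the issue description (deciding while building whether direct addressing pays off) boils down to comparing the label range a node would allocate against its actual arc count. A toy sketch of such a density check follows; the threshold and the names are illustrative only, not the heuristic Lucene actually adopted:

```java
// Toy sketch of a build-time density check for direct addressing. A node
// with numArcs arcs whose labels span labelRange slots would allocate
// labelRange entries; if that expansion is too large (the adversarial
// byte-multiple-of-4 case above), fall back to the gap-free encoding.
// useDirectAddressing() and maxExpansion are hypothetical names.
public class DirectAddressingHeuristic {
    static boolean useDirectAddressing(int numArcs, int labelRange, double maxExpansion) {
        // labelRange slots would be allocated; only numArcs hold real arcs.
        return (double) labelRange / numArcs <= maxExpansion;
    }

    public static void main(String[] args) {
        // Dense node: 60 arcs spread over 64 labels -> direct addressing.
        System.out.println(useDirectAddressing(60, 64, 1.5));
        // Sparse adversarial node: 4 arcs spread over 252 labels -> fall back.
        System.out.println(useDirectAddressing(4, 252, 1.5));
    }
}
```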
[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations
[ https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949630#comment-16949630 ] Ben Manes commented on SOLR-13817: -- You might also want to review whether atomic computations (loading through the cache) would provide a performance benefit. This is supported by Caffeine (built on {{computeIfAbsent}}) and avoids performing costly redundant work. It probably isn't worth the effort to implement it in the other caches if they are eventually removed. For example {{RptWithGeometrySpatialField}} and {{BlockJoinParentQParser}} show classic patterns of a racy get-compute-put idiom: {code} SolrCache parentCache = request.getSearcher().getCache(CACHE_NAME); // lazily retrieve from solr cache Filter filter = null; if (parentCache != null) { filter = (Filter) parentCache.get(parentList); } BitDocIdSetFilterWrapper result; if (filter instanceof BitDocIdSetFilterWrapper) { result = (BitDocIdSetFilterWrapper) filter; } else { result = new BitDocIdSetFilterWrapper(createParentFilter(parentList)); if (parentCache != null) { parentCache.put(parentList, result); } } return result; {code} If multiple threads require the same key then they will each observe a cache miss, perform an expensive call (or else why cache it?), and insert their results. By using a {{computeIfAbsent}}-style call, this will be performed by one thread under a striped lock (hash-bin lock) and the others will wait patiently for the result. If the value was already present then, in Caffeine's case, it will be a lock-free read, so there is no locking overhead. This avoids cache stampedes and can have a performance impact under load. > Deprecate legacy SolrCache implementations > -- > > Key: SOLR-13817 > URL: https://issues.apache.org/jira/browse/SOLR-13817 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. 
Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > Now that SOLR-8241 has been committed I propose to deprecate other cache > implementations in 8x and remove them altogether from 9.0, in order to reduce > confusion and maintenance costs.
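The racy get-compute-put idiom quoted above can be contrasted with an atomic load through the JDK's `ConcurrentHashMap.computeIfAbsent` (Caffeine's `Cache.get(key, mappingFunction)` behaves analogously). A small sketch, with a counter standing in for the expensive `createParentFilter` work:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the atomic-computation pattern described above. Only one thread
// runs the expensive computation per key; concurrent callers for the same
// key block until the value is available. Names are illustrative.
public class AtomicCacheLoad {
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();
    private static int computations = 0;

    static Object expensiveFilter(String key) {
        computations++; // stands in for createParentFilter(...)
        return new Object();
    }

    static Object getFilter(String key) {
        // Atomic: no window between the miss check and the put, unlike the
        // racy get/compute/put idiom quoted in the comment.
        return CACHE.computeIfAbsent(key, AtomicCacheLoad::expensiveFilter);
    }

    public static void main(String[] args) {
        Object a = getFilter("parentList");
        Object b = getFilter("parentList");
        System.out.println(a == b);       // same cached instance
        System.out.println(computations); // computed only once
    }
}
```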
[jira] [Commented] (SOLR-13835) HttpSolrCall produces incorrect extra AuditEvent on AuthorizationResponse.PROMPT
[ https://issues.apache.org/jira/browse/SOLR-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949629#comment-16949629 ] Chris M. Hostetter commented on SOLR-13835: --- Jan: Maybe i'm missing something, but IIUC in the context of how simple those blocks were when the code was initially added, it was reasonable for the first block to fall through to the second: Back when the code was introduced in SOLR-7757: * If authResp.status == PROMPT: do some logging specific to the authResp, and add some HTTP response headers specified by the auth plugin * If authResp.status != OK: sendError(authResp.status) ** ie: it didn't matter if the authResp.status was PROMPT, or FORBIDDEN, or anything else ... it wasn't ok so send an error with ...it's only as a result of changes introduced since then (with the addition of audit logging to each of the conditionals) that we now have a bug in the form of multiple Audit Events when authResp is PROMPT. IIUC: from the perspective of the external client the behavior is still entirely correct either way, it's only if/how an AuditLogger plugin is used and what it expects that seems to be at risk. (particularly since the AuthorizationPlugin API seems open enough (ie: there is no fixed enum of authResponse.statusCode values) that a custom plugin could return a lot of diff non-200/202 error codes that the AuditLogger would all report as "UNAUTHORIZED") > HttpSolrCall produces incorrect extra AuditEvent on > AuthorizationResponse.PROMPT > > > Key: SOLR-13835 > URL: https://issues.apache.org/jira/browse/SOLR-13835 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Authentication, Authorization >Reporter: Chris M. Hostetter >Priority: Major > > spinning this out of SOLR-13741... > {quote} > Wrt the REJECTED + UNAUTHORIZED events I see the same as you, and I believe > there is a code bug, not a test bug. 
In HttpSolrCall#471 in the > {{authorize()}} call, if authResponse == PROMPT, it will actually match both > blocks and emit two audit events: > [https://github.com/apache/lucene-solr/blob/26ede632e6259eb9d16861a3c0f782c9c8999762/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L475:L493] > > {code:java} > if (authResponse.statusCode == AuthorizationResponse.PROMPT.statusCode) {...} > if (!(authResponse.statusCode == HttpStatus.SC_ACCEPTED) && > !(authResponse.statusCode == HttpStatus.SC_OK)) {...} > {code} > When code==401, it is also true that code!=200. Intuitively there should be > both a sendError and a return before line #484 in the first if block? > {quote} > This causes any and all {{REJECTED}} AuditEvent messages to be accompanied by > a corresponding {{UNAUTHORIZED}} AuditEvent. > It's not yet clear if, from the perspective of the external client, there are > any other bugs in behavior (TBD) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
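The fix the quoted comment hints at — making the PROMPT branch terminal so the generic not-OK branch can no longer also fire — can be sketched with plain status codes. This is a minimal standalone simulation; the class, method, and event-string names are illustrative, not Solr's actual API:

```java
import java.util.ArrayList;
import java.util.List;

public class AuthorizeFlow {
    static final int PROMPT = 401, OK = 200, ACCEPTED = 202;

    // Simulates the corrected control flow: the PROMPT branch records its
    // event and returns early, so the generic "not OK" branch below it can
    // no longer fire for the same request.
    static List<String> audit(int statusCode) {
        List<String> events = new ArrayList<>();
        if (statusCode == PROMPT) {
            events.add("REJECTED");
            return events; // the early return the comment suggests is missing
        }
        if (statusCode != ACCEPTED && statusCode != OK) {
            events.add("UNAUTHORIZED");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(audit(PROMPT)); // [REJECTED] — exactly one event now
        System.out.println(audit(403));    // [UNAUTHORIZED]
        System.out.println(audit(OK));     // []
    }
}
```

Without the early return, a 401 satisfies both conditions (401 != 200 and 401 != 202), which is exactly the double-event behavior reported above.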
[GitHub] [lucene-solr] chatman opened a new pull request #942: SOLR-13834: ZkController#getSolrCloudManager() now uses the same ZkStateReader
chatman opened a new pull request #942: SOLR-13834: ZkController#getSolrCloudManager() now uses the same ZkStateReader URL: https://github.com/apache/lucene-solr/pull/942 Details in the JIRA. All tests pass. (FYI, without the changes to AddShardCmd and SplitShardCmd, the CollectionsTooManyReplicasTest was failing.) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (SOLR-13834) ZkController#getSolrCloudManager() creates a new instance of ZkStateReader
[ https://issues.apache.org/jira/browse/SOLR-13834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949619#comment-16949619 ] ASF subversion and git services commented on SOLR-13834: Commit 1a45b35baf765b4ab13bf2edf7fc664af7d6d6c4 in lucene-solr's branch refs/heads/jira/SOLR-13834 from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1a45b35 ] SOLR-13834: ZkController#getSolrCloudManager() now uses the same ZkStateReader instance instead of instantiating a new one ZkController#getSolrCloudManager() created a new instance of ZkStateReader, thereby causing a mismatch in the visibility of the cluster state and, as a result, undesired race conditions. > ZkController#getSolrCloudManager() creates a new instance of ZkStateReader > -- > > Key: SOLR-13834 > URL: https://issues.apache.org/jira/browse/SOLR-13834 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Ishan Chattopadhyaya >Priority: Major > > It should be reusing the existing ZkStateReader instance. Multiple > ZkStateReader instances have different visibility to the ZK state and cause > race conditions
[jira] [Commented] (SOLR-13472) HTTP requests to a node that does not hold a core of the collection are unauthorized
[ https://issues.apache.org/jira/browse/SOLR-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949593#comment-16949593 ] ASF subversion and git services commented on SOLR-13472: Commit b4242a1bfb418e8b1f1cedf4cf9f97e20e4cd866 in lucene-solr's branch refs/heads/branch_7_7 from Ishan Chattopadhyaya [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4242a1 ] SOLR-13472: Forwarded requests should skip authorization on receiving nodes > HTTP requests to a node that does not hold a core of the collection are > unauthorized > > > Key: SOLR-13472 > URL: https://issues.apache.org/jira/browse/SOLR-13472 > Project: Solr > Issue Type: Bug > Components: Authorization >Affects Versions: 7.7.1, 8.0 >Reporter: adfel >Assignee: Ishan Chattopadhyaya >Priority: Minor > Labels: security > Fix For: 8.2 > > Attachments: SOLR-13472.patch, SOLR-13472.patch > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When creating a collection in SolrCloud, the collection is available for queries > and updates through all Solr nodes, in particular nodes that do not hold > one of the collection's cores. This is expected behaviour that works when using > the SolrJ client or HTTP requests. > When enabling authorization rules, it seems that this behaviour is broken for > HTTP requests: > - requests to a node that holds part of the collection (a core) obey > authorization rules as expected. > - other nodes respond with code 403 - unauthorized request. > SolrJ still works as expected. > Tested both with BasicAuthPlugin and KerberosPlugin authentication plugins. > +Steps to reproduce:+ > 1. Create a cloud made of 2 nodes (node_1, node_2). > 2.
Configure authentication and authorization by uploading the following > security.json file to zookeeper: > > {code:java} > { > "authentication": { >"blockUnknown": true, >"class": "solr.BasicAuthPlugin", >"credentials": { > "solr": "'solr' user password_hash", > "indexer_app": "'indexer_app' password_hash", > "read_user": "'read_user' password_hash" >} > }, > "authorization": { >"class": "solr.RuleBasedAuthorizationPlugin", >"permissions": [ > { >"name": "read", >"role": "*" > }, > { >"name": "update", >"role": [ > "indexer", > "admin" >] > }, > { >"name": "all", >"role": "admin" > } >], >"user-role": { > "solr": "admin", > "indexer_app": "indexer" >} > } > }{code} > > 3. Create a 'test' collection with one shard on *node_1*. > -- > The following requests are expected to succeed but return 403 status > (unauthorized request): > {code:java} > curl -u read_user:read_user "http://node_2/solr/test/select?q=*:*" > curl -u indexer_app:indexer_app "http://node_2/solr/test/select?q=*:*" > curl -u indexer_app:indexer_app "http://node_2/solr/test/update?commit=true" > {code} > > Authenticated '_solr_' user requests work as expected. My guess is due to > the special '_all_' role.
[jira] [Commented] (LUCENE-8928) BKDWriter could make splitting decisions based on the actual range of values
[ https://issues.apache.org/jira/browse/LUCENE-8928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949591#comment-16949591 ] ASF subversion and git services commented on LUCENE-8928: - Commit a9c77504023b3f1e0b81dbe52537fa19f4586200 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=a9c7750 ] LUCENE-8928: Compute exact bounds every N splits (#926) When building a kd-tree for dimensions n > 2, compute exact bounds for an inner node every N splits to improve the quality of the tree. N is defined by SPLITS_BEFORE_EXACT_BOUNDS which is set to 4. > BKDWriter could make splitting decisions based on the actual range of values > > > Key: LUCENE-8928 > URL: https://issues.apache.org/jira/browse/LUCENE-8928 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Priority: Minor > Time Spent: 20m > Remaining Estimate: 0h > > Currently BKDWriter assumes that splitting on one dimension has no effect on > values in other dimensions. While this may be ok for geo points, this is > usually not true for ranges (or geo shapes, which are ranges too). Maybe we > could get better indexing by re-computing the range of values on each > dimension before making the choice of the split dimension?
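The commit's core idea — periodically recomputing exact per-dimension bounds instead of trusting bounds inherited from parent splits — reduces to a min/max pass over the node's points. A minimal sketch under simplifying assumptions (Lucene's BKDWriter actually operates on packed byte arrays, not long[][]; names here are illustrative):

```java
import java.util.Arrays;

public class BkdBounds {
    // Recompute the exact min/max for every dimension over a slice of points.
    // Per the commit message, for dims > 2 this is done once every
    // SPLITS_BEFORE_EXACT_BOUNDS (4) splits so the split-dimension choice
    // reflects the actual value ranges rather than stale inherited bounds.
    static long[][] exactBounds(long[][] points, int numDims) {
        long[] min = new long[numDims], max = new long[numDims];
        Arrays.fill(min, Long.MAX_VALUE);
        Arrays.fill(max, Long.MIN_VALUE);
        for (long[] p : points) {
            for (int d = 0; d < numDims; d++) {
                min[d] = Math.min(min[d], p[d]);
                max[d] = Math.max(max[d], p[d]);
            }
        }
        return new long[][] {min, max};
    }

    public static void main(String[] args) {
        long[][] pts = {{1, 9}, {4, 2}, {7, 5}};
        long[][] b = exactBounds(pts, 2);
        // min = [1, 2], max = [7, 9]
        System.out.println(Arrays.toString(b[0]) + " " + Arrays.toString(b[1]));
    }
}
```

The trade-off is the extra O(points × dims) scan per recomputation, which is why it is amortized over every N splits rather than done at each one.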
[GitHub] [lucene-solr] iverase merged pull request #926: LUCENE-8928: Compute exact bounds every N splits
iverase merged pull request #926: LUCENE-8928: Compute exact bounds every N splits URL: https://github.com/apache/lucene-solr/pull/926
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949575#comment-16949575 ] Adrien Grand commented on LUCENE-8920: -- Ah, sorry, it was not clear to me this was blocking you. I should be able to make a standalone test that reproduces the memory usage increase. > Reduce size of FSTs due to use of direct-addressing encoding > - > > Key: LUCENE-8920 > URL: https://issues.apache.org/jira/browse/LUCENE-8920 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Blocker > Fix For: 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Some data can lead to worst-case ~4x RAM usage due to this optimization. > Several ideas were suggested to combat this on the mailing list: > bq. I think we can improve the situation here by tracking, per-FST instance, > the size increase we're seeing while building (or perhaps do a preliminary > pass before building) in order to decide whether to apply the encoding. > bq. we could also make the encoding a bit more efficient. For instance I > noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) > which makes gaps very costly. Associating each label with a dense id and > having an intermediate lookup, ie. lookup label -> id and then id->arc offset > instead of doing label->arc directly could save a lot of space in some cases? > Also it seems that we are repeating the label in the arc metadata when > array-with-gaps is used, even though it shouldn't be necessary since the > label is implicit from the address?
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949568#comment-16949568 ] Michael Sokolov commented on LUCENE-8920: - Fine by me. I find it too difficult to iterate on a more refined solution given limited access to the benchmarking tools we are using for evaluation.
[jira] [Updated] (SOLR-13815) Live split can lose data
[ https://issues.apache.org/jira/browse/SOLR-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-13815: Fix Version/s: 8.3 > Live split can lose data > > > Key: SOLR-13815 > URL: https://issues.apache.org/jira/browse/SOLR-13815 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Yonik Seeley >Priority: Major > Fix For: 8.3 > > Attachments: fail.191004_053129, fail.191004_093307 > > Time Spent: 10m > Remaining Estimate: 0h > > This issue is to investigate potential data loss during a "live" split (i.e. > a split that happens while updates are flowing) > This was discovered during the shared storage work which was based on a > non-release branch_8x sometime before 8.3, hence the first steps are to try > and reproduce on the master branch without any shared storage changes.
[jira] [Commented] (SOLR-13105) A visual guide to Solr Math Expressions and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-13105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949539#comment-16949539 ] ASF subversion and git services commented on SOLR-13105: Commit 4af0b9f46256b7ce1ce203aee6fe891f5693657f in lucene-solr's branch refs/heads/SOLR-13105-visual from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4af0b9f ] SOLR-13105: Improve ML docs 21 > A visual guide to Solr Math Expressions and Streaming Expressions > - > > Key: SOLR-13105 > URL: https://issues.apache.org/jira/browse/SOLR-13105 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Major > Attachments: Screen Shot 2019-01-14 at 10.56.32 AM.png, Screen Shot > 2019-02-21 at 2.14.43 PM.png, Screen Shot 2019-03-03 at 2.28.35 PM.png, > Screen Shot 2019-03-04 at 7.47.57 PM.png, Screen Shot 2019-03-13 at 10.47.47 > AM.png, Screen Shot 2019-03-30 at 6.17.04 PM.png > > > Visualization is now a fundamental element of Solr Streaming Expressions and > Math Expressions. This ticket will create a visual guide to Solr Math > Expressions and Solr Streaming Expressions that includes *Apache Zeppelin* > visualization examples. > It will also cover using the JDBC expression to *analyze* and *visualize* > results from any JDBC compliant data source. > Intro from the guide: > {code:java} > Streaming Expressions exposes the capabilities of Solr Cloud as composable > functions. These functions provide a system for searching, transforming, > analyzing and visualizing data stored in Solr Cloud collections. > At a high level there are four main capabilities that will be explored in the > documentation: > * Searching, sampling and aggregating results from Solr. > * Transforming result sets after they are retrieved from Solr. > * Analyzing and modeling result sets using probability and statistics and > machine learning libraries. 
> * Visualizing result sets, aggregations and statistical models of the data. > {code} > > A few sample visualizations are attached to the ticket.
[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding
[ https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949498#comment-16949498 ] Adrien Grand commented on LUCENE-8920: -- Changing the constant would work for me; I just wonder whether it would be easier to revert, in order to have fewer version numbers of the FST class to deal with in the future. Maybe another way we could fix the worst-case memory usage while keeping the improved runtime would be to have the factor depend on how deep we are in the FST, since this change is more useful on frequently accessed nodes, which are likely the nodes closer to the root? I wouldn't want to hold the release too long because of this change, so I'm suggesting reverting from all branches on Monday, and we can work on some of the options mentioned above to keep the worst-case scenario more contained. Any objections?
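One of the ideas quoted in the issue description — applying direct addressing only when the measured size increase stays within a bound — can be sketched as a simple load-factor check. All names and the constant below are illustrative assumptions, not Lucene's actual implementation:

```java
public class DirectAddressingHeuristic {
    // Hypothetical oversizing bound: how much larger the direct-addressed
    // table may be, relative to a tightly packed arc list, before we fall
    // back to the packed encoding. The value 1.66 is purely illustrative.
    static final float MAX_OVERSIZING_FACTOR = 1.66f;

    /**
     * Decide whether a node's arcs should use direct addressing.
     * labelRange (maxLabel - minLabel + 1) is the slot count a
     * direct-addressed table needs; numArcs is what a packed list stores.
     */
    static boolean useDirectAddressing(int numArcs, int labelRange) {
        return labelRange <= numArcs * MAX_OVERSIZING_FACTOR;
    }

    public static void main(String[] args) {
        // Dense labels ('a'..'e', all present): the table has no gaps.
        System.out.println(useDirectAddressing(5, 5));   // true
        // Sparse labels (5 arcs spread across a range of 100): too many gaps.
        System.out.println(useDirectAddressing(5, 100)); // false
    }
}
```

This captures why "some data" hits the ~4x worst case: highly sparse label ranges pay for every gap slot, so gating the encoding on density bounds the overhead.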
[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949455#comment-16949455 ] ASF subversion and git services commented on SOLR-13787: Commit 5b6561eadb522150c8ea2954d60077ac445ad1d7 in lucene-solr's branch refs/heads/master from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5b6561e ] SOLR-13787: Support for Payload as 3rd param > An annotation based system to write v2 only APIs > > > Key: SOLR-13787 > URL: https://issues.apache.org/jira/browse/SOLR-13787 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Fix For: master (9.0), 8.3 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > An example v2 API may look as follows: > {code:java} > @V2EndPoint(method = POST, path = "/cluster/package", permission = > PermissionNameProvider.Name.ALL) > public static class ApiTest { > @Command(name = "add") > public void add(SolrQueryRequest req, SolrQueryResponse rsp, AddVersion > addVersion) { > } > @Command(name = "delete") > public void del(SolrQueryRequest req, SolrQueryResponse rsp, List > names) { > } > } > public static class AddVersion { > @JsonProperty(value = "package", required = true) > public String pkg; > @JsonProperty(value = "version", required = true) > public String version; > @JsonProperty(value = "files", required = true) > public List files; > } > {code} > This expects you to already have a POJO annotated with Jackson annotations > > The annotations are: > > {code:java} > @Retention(RetentionPolicy.RUNTIME) > @Target({ElementType.TYPE}) > public @interface EndPoint { > /**The supported HTTP methods*/ > SolrRequest.METHOD[] method(); > /**supported paths*/ > String[] path(); > PermissionNameProvider.Name permission(); > } > {code} > {code:java} > @Retention(RetentionPolicy.RUNTIME) > @Target(ElementType.METHOD) > public @interface Command { >/**If
this is not a JSON command, leave it empty. >* Keep in mind that you cannot have duplicates. >* Only one method per name. >*/ > String name() default ""; > } > {code}
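A framework like the one described presumably discovers {{@Command}}-annotated methods reflectively and dispatches by name. A self-contained sketch of that lookup, using a toy annotation rather than Solr's actual classes, including the no-duplicates rule the javadoc mentions:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class CommandScan {
    // Toy stand-in for the @Command annotation described in the issue.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Command { String name() default ""; }

    // Toy stand-in for the ApiTest endpoint class (real methods would take
    // SolrQueryRequest/SolrQueryResponse parameters).
    static class ApiTest {
        @Command(name = "add") public void add() {}
        @Command(name = "delete") public void del() {}
    }

    // Build a name -> method dispatch table; enforce "only one method per name".
    static Map<String, Method> commands(Class<?> c) {
        Map<String, Method> map = new HashMap<>();
        for (Method m : c.getDeclaredMethods()) {
            Command cmd = m.getAnnotation(Command.class);
            if (cmd != null && map.put(cmd.name(), m) != null) {
                throw new IllegalStateException("duplicate command: " + cmd.name());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(commands(ApiTest.class).keySet());
    }
}
```

RetentionPolicy.RUNTIME (as in the quoted annotations) is what makes this reflective discovery possible at all; SOURCE- or CLASS-retained annotations would be invisible here.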
[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949453#comment-16949453 ] ASF subversion and git services commented on SOLR-13787: Commit dcb7abfc0ee3e9ac8827bf7b0128f1249fb7fc7e in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=dcb7abf ] SOLR-13787: Added support for PayLoad as 3rd param
[jira] [Commented] (SOLR-13787) An annotation based system to write v2 only APIs
[ https://issues.apache.org/jira/browse/SOLR-13787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949454#comment-16949454 ] ASF subversion and git services commented on SOLR-13787: Commit 71e9564e0d520449b6eeb52a6f67ede91ff091a7 in lucene-solr's branch refs/heads/branch_8x from Noble Paul [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=71e9564 ] SOLR-13787: Support for Payload as 3rd param
[jira] [Resolved] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch
[ https://issues.apache.org/jira/browse/SOLR-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein resolved SOLR-13829. --- Fix Version/s: 8.3 Resolution: Resolved > RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing > mismatch > - > > Key: SOLR-13829 > URL: https://issues.apache.org/jira/browse/SOLR-13829 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Trey Grainger >Priority: Major > Fix For: 8.3 > > Attachments: SOLR-13829.patch > > Time Spent: 10m > Remaining Estimate: 0h > > In trying to use the "sort" streaming evaluator on float field (pfloat), I am > getting casting errors back based upon which values are calculated based upon > underlying values in a field. > Example: > *Docs:* (paste each into "Documents" pane in Solr Admin UI as type:"json") > > {code:java} > {"id": "1", "name":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]} > {"id": "2", "name":"cheese > pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]}{code} > > *Streaming Expression:* > > {code:java} > sort(select(search(food_collection, q="*:*", fl="id,vector_fs", sort="id > asc"), cosineSimilarity(vector_fs, array(5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as > sim, id), by="sim desc"){code} > > *Response:* > > {code:java} > { > "result-set": { > "docs": [ > { > "EXCEPTION": "class java.lang.Double cannot be cast to class > java.lang.Long (java.lang.Double and java.lang.Long are in module java.base > of loader 'bootstrap')", > "EOF": true, > "RESPONSE_TIME": 13 > } > ] > } > }{code} > > > This is because in org.apache.solr.client.solrj.io.eval.RecursiveEvaluator, > there is a line which examines a numeric (BigDecimal) value and - regardless > of the type of the field the value originated from - converts it to a Long if > it looks like a whole number. 
This is the code in question from that class: > {code:java} > protected Object normalizeOutputType(Object value) { > if(null == value){ > return null; > } else if (value instanceof VectorFunction) { > return value; > } else if(value instanceof BigDecimal){ > BigDecimal bd = (BigDecimal)value; > if(bd.signum() == 0 || bd.scale() <= 0 || > bd.stripTrailingZeros().scale() <= 0){ > try{ > return bd.longValueExact(); > } > catch(ArithmeticException e){ > // value was too big for a long, so use a double which can handle > scientific notation > } > } > > return bd.doubleValue(); > } > ... [other type conversions] > {code} > Because of the *return bd.longValueExact()*; line, the calculated value for > "sim" in doc 1 is "Long(1)", whereas the calculated value for "sim" for doc > 2 is "Double(0.88938313)". These are coming back as incompatible data types, > even though the source data is all of the same type and should be comparable. > Thus when the *sort* evaluator streaming expression (and probably others) > runs on these calculated values and the list should contain ["0.88938313", > "1.0"], an exception is thrown because it's trying to compare > incompatible data types [Double("0.99"), Long(1)]. > This bug is occurring on master currently, but has probably existed in the > codebase since at least August 2017.
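The coercion can be reproduced outside Solr. This standalone sketch mirrors the BigDecimal branch of the quoted {{normalizeOutputType}} and shows the mixed Long/Double results that make the later comparison fail (only the relevant branch is reproduced; the surrounding Solr class is omitted):

```java
import java.math.BigDecimal;

public class NormalizeDemo {
    // Mirrors the quoted coercion: a BigDecimal that "looks whole"
    // (zero, non-positive scale, or no fractional digits after stripping
    // trailing zeros) becomes a Long; everything else becomes a Double.
    static Object normalize(BigDecimal bd) {
        if (bd.signum() == 0 || bd.scale() <= 0
                || bd.stripTrailingZeros().scale() <= 0) {
            try {
                return bd.longValueExact();
            } catch (ArithmeticException e) {
                // too big for a long; fall through to double
            }
        }
        return bd.doubleValue();
    }

    public static void main(String[] args) {
        Object a = normalize(new BigDecimal("1.0"));        // Long, value 1
        Object b = normalize(new BigDecimal("0.88938313")); // Double
        System.out.println(a.getClass().getSimpleName() + " vs "
                + b.getClass().getSimpleName());
        // Long.compareTo takes a Long, so comparing these two values during a
        // sort throws the ClassCastException quoted in the issue.
    }
}
```

A cosine similarity of exactly 1.0 is whole-looking, so it alone gets coerced to Long while its neighbors stay Double, which is why the failure depends on the data.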
[jira] [Commented] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch
[ https://issues.apache.org/jira/browse/SOLR-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949427#comment-16949427 ] ASF subversion and git services commented on SOLR-13829: Commit 30feba4045967a95820af670d4e8a9b02e57b536 in lucene-solr's branch refs/heads/branch_8_3 from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=30feba4 ] SOLR-13829: Update CHANGES.txt
[jira] [Commented] (SOLR-13829) RecursiveEvaluator casts Continuous numbers to Discrete Numbers, causing mismatch
[ https://issues.apache.org/jira/browse/SOLR-13829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949422#comment-16949422 ] ASF subversion and git services commented on SOLR-13829: Commit bed9e7c47432777ff09fa8d03d435ad0e59b518a in lucene-solr's branch refs/heads/master from Joel Bernstein [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bed9e7c ] SOLR-13829: Update CHANGES.txt
[jira] [Commented] (SOLR-13828) Improve ExecutePlanAction error handling
[ https://issues.apache.org/jira/browse/SOLR-13828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949414#comment-16949414 ] ASF subversion and git services commented on SOLR-13828: Commit 9f9e19c2a647cb24e3ae3ec951a84112cb70ae0e in lucene-solr's branch refs/heads/branch_7_7 from Andrzej Bialecki [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9f9e19c ] SOLR-13828: Improve ExecutePlanAction error handling. > Improve ExecutePlanAction error handling > > > Key: SOLR-13828 > URL: https://issues.apache.org/jira/browse/SOLR-13828 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: AutoScaling >Affects Versions: 7.7.2, 8.2, 8.3 >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > > There's a bug in {{ExecutePlanAction}}: in some > situations it can create duplicate asyncId-s for events with multiple > operations. Unit tests probably didn't catch this because operations took > less time than the default task timeout (120 sec); the problem > arises when the task timeout is reached while the task is still > running. > Also, error handling in ExecutePlanAction should be improved to correctly > throw exceptions when an operation fails to complete - currently it's possible for > an operation to fail yet for ExecutePlanAction to report success. > This also raises the question of the task timeout - currently it's not > configurable, but it should be. It can be configured in the action properties.
[jira] [Updated] (SOLR-13837) AuditLogger must handle V2 requests better
[ https://issues.apache.org/jira/browse/SOLR-13837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-13837: --- Description: Spinoff from SOLR-13741 Turns out that Audit logger does not log the body of V2 Admin API requests and needs a general improvement in how V2 requests are handled, i.e: * We do not audit log the BODY of the request (which is where the action is) * We do not detect what collections the request is for (so the AuditEvent#collections array is null) * The resource path is internal format {{/v2/c}} instead of {{/api/c}} (should we convert the prefix in the AuditEvent?) was: Spinoff from SOLR-13741 Turns out that Audit logger does not log the body of V2 Admin API requests and needs a general improvement in how V2 requests are handled.
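One of the bullets above asks whether the internal {{/v2/c}} prefix should be converted to the public {{/api/c}} form in the AuditEvent. A minimal sketch of such a conversion (hypothetical helper and names, not existing Solr code) could look like:

```java
public class ResourcePathSketch {
    /** Hypothetical helper: map the internal /v2 resource prefix to the
     *  public /api form before storing the path in an AuditEvent.
     *  V1-style paths are passed through unchanged. */
    static String toPublicResource(String internalPath) {
        if (internalPath.equals("/v2")) {
            return "/api";
        }
        if (internalPath.startsWith("/v2/")) {
            // "/v2/c" -> "/api" + "/c"
            return "/api" + internalPath.substring(3);
        }
        return internalPath;
    }

    public static void main(String[] args) {
        System.out.println(toPublicResource("/v2/c"));           // /api/c
        System.out.println(toPublicResource("/admin/info/key")); // /admin/info/key
    }
}
```

Whether this mapping belongs in the AuditEvent itself or in the code that constructs it is exactly the open question raised in the issue.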
[jira] [Commented] (SOLR-13741) possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest
[ https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949366#comment-16949366 ] Jan Høydahl commented on SOLR-13741: Ok, uploaded yet another patch with a new test for V2 API. Discovered that the path is {{/v2/c}} and not {{/api/c}} as expected, so modified the ADMIN detection based on that. Also for V2 request we are lacking in several ways: * We do not audit log the BODY of the request (which is where the action is) * We do not detect what collections the request is for (so the AuditEvent#collections array is null) * The resource path is internal format {{/v2/c}} instead of {{/api/c}} (should we convert the prefix in the AuditEvent?) I spun V2 improvements off into SOLR-13837 to not delay this effort > possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest > -- > > Key: SOLR-13741 > URL: https://issues.apache.org/jira/browse/SOLR-13741 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-13741.patch, SOLR-13741.patch, SOLR-13741.patch, > SOLR-13741.patch, SOLR-13741.patch > > > A while back i saw a weird non-reproducible failure from > AuditLoggerIntegrationTest. 
When I started reading through that code, 2 > things jumped out at me: > # the way the 'delay' option works is brittle, and makes assumptions about > CPU scheduling that aren't necessarily going to be true (and also suffers > from the problem that Thread.sleep isn't guaranteed to sleep as long as you > ask it to) > # the way the existing {{waitForAuditEventCallbacks(number)}} logic works by > checking the size of a (List) {{buffer}} of received events in a sleep/poll > loop, until it contains at least N items -- but the code that adds items to > that buffer in the async Callback thread runs _before_ the code that updates > other state variables (like the global {{count}} and the patch-specific > {{resourceCounts}}), meaning that a test waiting on 3 events could "see" 3 > events added to the buffer, but calling {{assertEquals(3, > receiver.getTotalCount())}} could subsequently fail because that variable > hadn't been updated yet. > #2 was the source of the failures I was seeing, and while a quick fix for > that specific problem would be to update all other state _before_ adding the > event to the buffer, I set out to try and make more general improvements to > the test: > * eliminate the dependency on sleep loops by {{await}}-ing on concurrent data > structures > * harden the assertions made about the expected events received (updating > some test methods that currently just assert the number of events received) > * add new assertions that _only_ the expected events are received. > In the process of doing this, I've found several oddities/discrepancies > between the things the test currently claims/asserts and what *actually* happens > under more rigorous scrutiny/assertions. > I'll attach a patch shortly that has my (in progress) updates and includes > copious nocommits about things that seem suspect. The summary of these concerns > is: > * SolrException status codes that do not match what the existing test says > they should (but doesn't assert) > * extra AuditEvents occurring that the existing test does not expect > * AuditEvents for incorrect credentials that do not at all match the expected > AuditEvent in the existing test -- which the current test seems to miss in > its assertions because it's picking up some extra events triggered by > previous requests earlier in the test that just happen to also match the > assertions. > ...it's not clear to me if the test logic is correct and these are "code > bugs" or if the test is faulty.
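The buffer-vs-count race in #2 and the proposed await-based fix can be sketched as follows (hypothetical names, not the actual test code). The first part simulates the problematic interleaving deterministically; the second replaces the sleep/poll loop with a BlockingQueue that only publishes an event as its last action:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class AuditWaitSketch {
    // Race-prone style: the callback publishes to the buffer BEFORE updating
    // the count, so a poller keyed off buffer.size() can read a stale count.
    static final List<String> buffer = new ArrayList<>();
    static int totalCount = 0;

    // Hardened style: a single BlockingQueue is the only synchronization
    // point; the waiter takes exactly N events instead of sleep-polling.
    static final BlockingQueue<String> events = new LinkedBlockingQueue<>();

    static List<String> waitForAuditEvents(int n, long timeoutMs) throws InterruptedException {
        List<String> got = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            String e = events.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if (e == null) throw new AssertionError("timed out after " + got.size() + " events");
            got.add(e);
        }
        return got;
    }

    public static void main(String[] args) throws Exception {
        // Simulate the interleaving in one thread: between the buffer add and
        // the count update, an assertEquals(1, totalCount) would fail.
        buffer.add("event-1");
        boolean staleCountVisible = (buffer.size() == 1 && totalCount == 0);
        totalCount++;
        System.out.println("stale count observable: " + staleCountVisible);

        // With the queue, publishing the event is the LAST thing the producer
        // does, so a waiter that got N events knows all related state is set.
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= 3; i++) events.offer("event-" + i);
        });
        producer.start();
        List<String> got = waitForAuditEvents(3, 5000);
        producer.join();
        System.out.println("received: " + got.size());
    }
}
```

This is the "update all other state _before_ adding the event to the buffer" fix generalized: the concurrent structure carries the happens-before edge, so no sleep loop is needed.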
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333896983 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor) + && in.getQuery() instanceof IndexOrDocValuesQuery) { Review comment: This PR is mainly for IndexOrDocValuesQuery now. As discussed earlier, the reason IndexOrDocValuesQuery slows down is that a large amount of data is read during the caching action, while only a small amount of data is read from doc values when not caching. I haven't found any other query type that reads much more data for caching than it really needs. @jpountz Looking forward to more discussion if you think this PR should apply to all query types. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333896831 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor) Review comment: We have tested different scenarios to observe the query latency with/without caching in an online ES cluster. Here is the result:

| queryPattern | latencyWithoutCaching | latencyWithCaching | leadCost | rangeQueryCost | skipCacheFactor |
| -- | :---: | :---: | :---: | :---: | :---: |
| ip:xxx AND time:[t-1h, t] | 10ms | 36ms (+260%) | 20528 | 878979 | 42 |
| ip:xxx AND time:[t-4h, t] | 10ms | 100ms (+900%) | 20528 | 4365870 | 212 |
| ip:xxx AND time:[t-8h, t] | 11ms | 200ms (+1700%) | 20528 | 8724483 | 425 |
| ip:xxx AND time:[t-12h, t] | 12ms | 300ms (+2400%) | 20528 | 13083096 | 637 |
| ip:xxx AND time:[t-24h, t] | 16ms | 500ms (+3000%) | 20528 | 26158936 | 1274 |
| ip:xxx AND time:[t-48h, t] | 30ms | 1200ms (+3900%) | 20528 | 52310616 | 2548 |

As the table shows, query latency without caching is low and is related to the final result set. Query latency with caching is much higher and is mainly related to _rangeQueryCost_. 
According to the above test, we set the default value of _skipCacheFactor_ to 250, which makes the query slower by no more than 10 times. In addition to _skipCacheFactor_, which is similar to _maxCostFactor_ in LUCENE-8027, we add a new parameter _skipCacheCost_. The main reasons are: - to control the time used for caching, as the caching time is related to the cost of the range query. - to skip caching very large range queries, which would consume too much memory and evict cache entries frequently. What do you think? Looking forward to your ideas. @jpountz
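The skip condition from the diff can be exercised in isolation. The sketch below (hypothetical class and default values, not the actual patch; the real check additionally requires the query to be an IndexOrDocValuesQuery) plugs in the leadCost and rangeQueryCost numbers from the table:

```java
public class SkipCacheSketch {
    // Assumed values for illustration: skipCacheFactor ~250 keeps the caching
    // slowdown within roughly one order of magnitude per the table above;
    // skipCacheCost bounds the absolute work a single cache fill may take.
    static final long SKIP_CACHE_COST = 100_000_000L; // assumed, not from the patch
    static final long SKIP_CACHE_FACTOR = 250L;

    /** True when building the cached DocIdSet would likely cost far more than
     *  answering the query from doc values for this lead. */
    static boolean shouldSkipCaching(long cost, long leadCost) {
        return cost > SKIP_CACHE_COST || cost > leadCost * SKIP_CACHE_FACTOR;
    }

    public static void main(String[] args) {
        long leadCost = 20_528; // the ip:xxx term, from the table above

        // t-1h range: 878979 / 20528 ~ 43 < 250, so caching is allowed
        System.out.println(shouldSkipCaching(878_979, leadCost));   // false

        // t-8h range: 8724483 / 20528 ~ 425 > 250, so caching is skipped
        System.out.println(shouldSkipCaching(8_724_483, leadCost)); // true
    }
}
```

With factor 250, the t-1h and t-4h rows would still be cached while t-8h and wider ranges fall back to the uncached doc-values path, matching the latency goal stated above.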
[jira] [Updated] (SOLR-13741) possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest
[ https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-13741: --- Attachment: SOLR-13741.patch
[jira] [Updated] (SOLR-13741) possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest
[ https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-13741: --- Attachment: SOLR-13741.patch
[jira] [Commented] (SOLR-13741) possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest
[ https://issues.apache.org/jira/browse/SOLR-13741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949321#comment-16949321 ] Jan Høydahl commented on SOLR-13741: {quote}why did the comment for a "wrong password" claim it was going to get a 403 exception + audit log ? {quote} It should expect 401 for a wrong password; this was probably all confused by SOLR-13835 in the initial test. {quote}Are {{/admin/info/key}} events expected when auth is enabled ? ...is it ok for the test to explicitly ignore these events {quote} I can't recall dealing specially with this path. So muting during tests sounds like the right thing to do. Guess you could argue that it could be muted by default by the framework, since it is a public always-open path? {quote}why the _actual_ audit log received in the "wrong password" situation is so different (and sparse) compared to other audit log events ? // - the resource is *JUST* '/solr' // - note that "resource" for every other expected event in this test class doesn't even // *START* with (or include) the "/solr" portion of the URL // - event 'resource' values are typically "/admin/etc..." // - the requestType is 'UNKNOWN' // - as opposed to the ADMIN that the existing test expects (and seems like should be correct){quote} I will attach a new patch with some of this fixed: * Parsing "resource" from {{httpRequest.getPathInfo()}} instead of {{httpRequest.getContextPath()}}, which is always /solr. * Detecting {{/admin/..}} as an admin path in {{AuditEvent.findRequestType}} now that the resource is changed, giving requestType=ADMIN * However, principal is not filled since BasicAuth failed, which I believe is correct. But the HTTP headers are there for inspection... It would be nice to have the user field in AuditEvent also in this case, but that would mean that AuthPlugins would need to set it on MDC or something. It would be wrong to set principal on the request since that always means an authenticated user, not? 
{quote}// - this event has no solrParams at all // - even though the httpQueryString shows it's from the CREATE test2 req{quote} This event is generated based on {{HttpServletRequest}}, so we have no solrParams at this stage. In the new patch I have initialized the solrParams map from the httpRequest for a more consistent AuditEvent experience. Hoss, this test is now so much better than what I managed to whip up the first time, thanks a ton for digging! > possible AuditLogger bugs uncovered while hardening AuditLoggerIntegrationTest > -- > > Key: SOLR-13741 > URL: https://issues.apache.org/jira/browse/SOLR-13741 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Chris M. Hostetter >Assignee: Chris M. Hostetter >Priority: Major > Attachments: SOLR-13741.patch, SOLR-13741.patch
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333896448 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); Review comment: I think we should assert that the hit count incremented, in addition to searching again?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333892366 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,6 +734,21 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { + boolean cacheSynchronously = executor == null; + + // If asynchronous caching is requested, perform the same and return + // the uncached iterator + if (cacheSynchronously == false) { +boolean asyncCachingSucceeded; +asyncCachingSucceeded = cacheAsynchronously(context, cacheHelper); Review comment: merge declaration and assignment?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333896563 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); + +// Second read should hit +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); +searcher.search(new ConstantScoreQuery(green), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(red, green), queryCache.cachedQueries()); + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Arrays.asList(green, red), queryCache.cachedQueries()); Review comment: Check that the hit count incremented?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333894810 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); + +// Second read should hit +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); Review comment: shouldn't we be able to assert on this directly after the call to `latch[0].await();` returns?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333900618 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1691,4 +1954,180 @@ public void testBulkScorerLocking() throws Exception { t.start(); t.join(); } + + public void testRejectedExecution() throws IOException { +ExecutorService service = new TestIndexSearcher.RejectingMockExecutor(); +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true); + +searcher.setQueryCache(queryCache); +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// To ensure that failing ExecutorService still allows query to run +// successfully + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +reader.close(); +w.close(); +dir.close(); +service.shutdown(); + } + + public void testClosedReaderExecution() throws IOException { +CountDownLatch latch = new CountDownLatch(1); +ExecutorService service = new BlockedMockExecutor(latch); + +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +for (int i = 0; i < 100; i++) { + Document doc = new Document(); + StringField f = new StringField("color", "blue", Store.NO); + doc.add(f); + w.addDocument(doc); + f.setStringValue("red"); + w.addDocument(doc); + f.setStringValue("green"); + w.addDocument(doc); + + if (i % 
10 == 0) { +w.commit(); + } +} + +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service) { + @Override + protected LeafSlice[] slices(List leaves) { +ArrayList slices = new ArrayList<>(); +for (LeafReaderContext ctx : leaves) { + slices.add(new LeafSlice(Arrays.asList(ctx))); +} +return slices.toArray(new LeafSlice[0]); Review comment: nit: with recent versions of Java I like `slices.toArray(LeafSlice[]::new);` better
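The nit above refers to the array-generator overload of `Collection.toArray` added in Java 11. A minimal sketch of the difference, using `String` in place of the Lucene-internal `LeafSlice`:

```java
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {
    public static void main(String[] args) {
        List<String> slices = List.of("a", "b", "c");
        // Classic idiom: a zero-length array acts as a type witness.
        String[] viaZeroLength = slices.toArray(new String[0]);
        // Since Java 11: pass an array constructor reference instead,
        // avoiding the throwaway array and stating the intent directly.
        String[] viaGenerator = slices.toArray(String[]::new);
        System.out.println(Arrays.equals(viaZeroLength, viaGenerator)); // prints "true"
    }
}
```

Both forms return an array of the requested runtime type; the generator form is purely a readability preference, which is why the comment flags it as a nit.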
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333892145 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -449,12 +452,8 @@ void assertConsistent() { } @Override - public Weight doCache(Weight weight, QueryCachingPolicy policy) { -while (weight instanceof CachingWrapperWeight) { - weight = ((CachingWrapperWeight) weight).in; -} - -return new CachingWrapperWeight(weight, policy); + public Weight doCache(final Weight weight, QueryCachingPolicy policy, Executor executor) { +return new CachingWrapperWeight(weight, policy, executor); Review comment: should we keep the unwrapping?
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333898028 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); + +// Second read should hit +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); +searcher.search(new ConstantScoreQuery(green), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(red, green), queryCache.cachedQueries()); + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Arrays.asList(green, red), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); + +searcher.search(new ConstantScoreQuery(blue), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(red, blue), queryCache.cachedQueries()); + +searcher.search(new ConstantScoreQuery(blue), 1); +assertEquals(Arrays.asList(red, blue), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); + +searcher.search(new ConstantScoreQuery(green), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(blue, green), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Arrays.asList(blue, green), queryCache.cachedQueries()); Review comment: maybe move the call to service.shutdown() above this line and also call `awaitTermination` to make sure that any ongoing cache operations are done, so that the assertion doesn't succeed only because we are lucky with timing?
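The shutdown-then-assert ordering suggested above can be sketched as a standalone illustration (this is the general `ExecutorService` pattern, not the test itself):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class AwaitTerminationDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(2);
        AtomicBoolean cached = new AtomicBoolean(false);
        // Stand-in for an asynchronous cache load kicked off by a search.
        service.submit(() -> cached.set(true));
        // Reject new tasks, then block until in-flight tasks complete, so a
        // later assertion cannot pass merely because of lucky timing.
        service.shutdown();
        boolean terminated = service.awaitTermination(10, TimeUnit.SECONDS);
        // Only assert on cache state after termination is confirmed.
        System.out.println(terminated && cached.get()); // prints "true"
    }
}
```

`shutdown()` alone only stops new submissions; it is the `awaitTermination` call that guarantees any in-flight caching work has finished before the assertion runs.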
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333898690 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -244,6 +275,213 @@ public void testLRUEviction() throws Exception { dir.close(); } + public void testLRUConcurrentLoadAndEviction() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch[] latch = {new CountDownLatch(1)}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch[0].countDown(); + } +}; + +final Query blue = new TermQuery(new Term("color", "blue")); +final Query red = new TermQuery(new Term("color", "red")); +final Query green = new TermQuery(new Term("color", "green")); + +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCache(queryCache); +// the filter is not cached on any segment: no changes +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(green), 1); +assertEquals(Collections.emptyList(), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// First read should miss +searcher.search(new ConstantScoreQuery(red), 1); + + +// Let the cache load be 
completed +latch[0].await(); +searcher.search(new ConstantScoreQuery(red), 1); + +// Second read should hit +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); +searcher.search(new ConstantScoreQuery(green), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(red, green), queryCache.cachedQueries()); + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Arrays.asList(green, red), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); + +searcher.search(new ConstantScoreQuery(blue), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(red, blue), queryCache.cachedQueries()); + +searcher.search(new ConstantScoreQuery(blue), 1); +assertEquals(Arrays.asList(red, blue), queryCache.cachedQueries()); + +latch[0] = new CountDownLatch(1); + +searcher.search(new ConstantScoreQuery(green), 1); + +// Let the cache load be completed +latch[0].await(); +assertEquals(Arrays.asList(blue, green), queryCache.cachedQueries()); + +searcher.setQueryCachingPolicy(NEVER_CACHE); +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Arrays.asList(blue, green), queryCache.cachedQueries()); + +reader.close(); +w.close(); +dir.close(); +service.shutdown(); + } + + public void testLRUConcurrentLoadsOfSameQuery() throws Exception { +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); +ExecutorService service = new ThreadPoolExecutor(4, 4, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +ExecutorService 
stressService = new ThreadPoolExecutor(15, 15, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache2")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final CountDownLatch latch = new CountDownLatch(1); + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true) { + @Override + protected void onDocIdSetCache(Object readerCoreKey, long ramBytesUsed) { +super.onDocIdSetCache(readerCoreKey, ramBytesUsed); +latch.countDown(); + } +}; + +final Query green
[GitHub] [lucene-solr] jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache
jpountz commented on a change in pull request #916: LUCENE-8213: Asynchronous Caching in LRUQueryCache URL: https://github.com/apache/lucene-solr/pull/916#discussion_r333907193 ## File path: lucene/core/src/test/org/apache/lucene/search/TestLRUQueryCache.java ## @@ -1691,4 +1954,180 @@ public void testBulkScorerLocking() throws Exception { t.start(); t.join(); } + + public void testRejectedExecution() throws IOException { +ExecutorService service = new TestIndexSearcher.RejectingMockExecutor(); +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +Document doc = new Document(); +StringField f = new StringField("color", "blue", Store.NO); +doc.add(f); +w.addDocument(doc); +f.setStringValue("red"); +w.addDocument(doc); +f.setStringValue("green"); +w.addDocument(doc); +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service); + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true); + +searcher.setQueryCache(queryCache); +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// To ensure that failing ExecutorService still allows query to run +// successfully + +searcher.search(new ConstantScoreQuery(red), 1); +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); + +reader.close(); +w.close(); +dir.close(); +service.shutdown(); + } + + public void testClosedReaderExecution() throws IOException { +CountDownLatch latch = new CountDownLatch(1); +ExecutorService service = new BlockedMockExecutor(latch); + +Directory dir = newDirectory(); +final RandomIndexWriter w = new RandomIndexWriter(random(), dir); + +for (int i = 0; i < 100; i++) { + Document doc = new Document(); + StringField f = new StringField("color", "blue", Store.NO); + doc.add(f); + w.addDocument(doc); + f.setStringValue("red"); + w.addDocument(doc); + f.setStringValue("green"); + w.addDocument(doc); + + if (i % 
10 == 0) { +w.commit(); + } +} + +final DirectoryReader reader = w.getReader(); + +final Query red = new TermQuery(new Term("color", "red")); + +IndexSearcher searcher = new IndexSearcher(reader, service) { + @Override + protected LeafSlice[] slices(List leaves) { +ArrayList slices = new ArrayList<>(); +for (LeafReaderContext ctx : leaves) { + slices.add(new LeafSlice(Arrays.asList(ctx))); +} +return slices.toArray(new LeafSlice[0]); + } +}; + +final LRUQueryCache queryCache = new LRUQueryCache(2, 10, context -> true); + +searcher.setQueryCache(queryCache); +searcher.setQueryCachingPolicy(ALWAYS_CACHE); + +// To ensure that failing ExecutorService still allows query to run +// successfully + +ExecutorService tempService = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, +new LinkedBlockingQueue(), +new NamedThreadFactory("TestLRUQueryCache")); + +tempService.submit(new Runnable() { + @Override + public void run() { +try { + Thread.sleep(100); + reader.close(); +} catch (Exception e) { + throw new RuntimeException(e.getMessage()); +} + +latch.countDown(); + + } +}); + +searcher.search(new ConstantScoreQuery(red), 1); + +assertEquals(Collections.singletonList(red), queryCache.cachedQueries()); Review comment: This assertion is actually proving that the test is not working? We would expect that nothing gets cached, since the reader is already closed by the time the executor needs to cache the query?
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333896983 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor) + && in.getQuery() instanceof IndexOrDocValuesQuery) { Review comment: This PR is mainly for IndexOrDocValuesQuery now. As discussed earlier, the reason why IndexOrDocValuesQuery slows down is that a large amount of data will be read during the caching action, while only a small amount of data will be read from doc values when not caching. I haven't found any other type of query that reads much more data for caching than it really needs. @jpountz Looking forward to more discussion on whether you think this PR should apply to all query types.
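For reference, the skip condition quoted in the diff above can be isolated as a small predicate. Parameter names mirror the PR's `skipCacheCost` and `skipCacheFactor`; the sample values below are illustrative, loosely based on the benchmark numbers in this thread:

```java
public class SkipCacheHeuristic {
    // Sketch of the condition in the diff: skip caching when building the
    // cached DocIdSet (cost) is absolutely expensive, or far more expensive
    // than the lead iterator that would otherwise drive the query.
    static boolean skipCaching(long cost, long leadCost,
                               long skipCacheCost, long skipCacheFactor) {
        return cost > skipCacheCost || cost > leadCost * skipCacheFactor;
    }

    public static void main(String[] args) {
        // Filter cost is comparable to the lead cost: cache as usual.
        System.out.println(skipCaching(1_000, 500, 100_000_000, 250)); // prints "false"
        // Range query ~2548x costlier than the lead iterator: skip caching.
        System.out.println(skipCaching(52_310_616, 20_528, 100_000_000, 250)); // prints "true"
    }
}
```

The predicate is cheap (two comparisons on already-computed costs), so it can be evaluated per segment inside `ScorerSupplier.get` without measurable overhead.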
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333896831 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor) Review comment: We have tested different scenarios to observe the query latency with/without caching in an online ES cluster. Here is the result:

| queryPattern | latencyWithoutCaching | latencyWithCaching | leadCost | rangeQueryCost | skipCacheFactor |
| -- | :---: | :---: | :---: | :---: | :---: |
| ip:xxx AND time:[t-1h, t] | 10ms | 36ms(+260%) | 20528 | 878979 | 42 |
| ip:xxx AND time:[t-4h, t] | 10ms | 100ms(+900%) | 20528 | 4365870 | 212 |
| ip:xxx AND time:[t-8h, t] | 11ms | 200ms(+1700%) | 20528 | 8724483 | 425 |
| ip:xxx AND time:[t-12h, t] | 12ms | 300ms(+2400%) | 20528 | 13083096 | 637 |
| ip:xxx AND time:[t-24h, t] | 16ms | 500ms(+3000%) | 20528 | 26158936 | 1274 |
| ip:xxx AND time:[t-48h, t] | 30ms | 1200ms(+3900%) | 20528 | 52310616 | 2548 |

As the table shows, query latency without caching is low and related to the size of the final result set. Query latency with caching is much higher and mainly related to _rangeQueryCost_. According to the above test, we set the default value of _skipCacheFactor_ to 250, which makes the query slower by no more than 10 times. In addition to _skipCacheFactor_, which is similar to _maxCostFactor_ in LUCENE-8027, we add a new parameter _skipCacheCost_. The main reasons are:
- to bound the time spent on caching, since the caching time is related to the cost of the range query.
- to skip caching very large range queries, which would consume too much memory and evict the cache frequently.

What do you think? Looking forward to your ideas. @jpountz
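A quick sanity check on the benchmark table: each reported _skipCacheFactor_ matches the integer ratio _rangeQueryCost / leadCost_, i.e. how many times costlier building the cache entry is than driving the query from the lead iterator. (This reading of the column is inferred from the numbers, not stated explicitly in the thread.)

```java
public class SkipCacheFactorCheck {
    public static void main(String[] args) {
        long leadCost = 20_528; // the ip:xxx term's cost, constant across rows
        long[] rangeQueryCost = {878_979, 4_365_870, 8_724_483,
                                 13_083_096, 26_158_936, 52_310_616};
        long[] reportedFactor = {42, 212, 425, 637, 1_274, 2_548};
        for (int i = 0; i < rangeQueryCost.length; i++) {
            // Integer division reproduces every factor in the table exactly.
            System.out.println(rangeQueryCost[i] / leadCost == reportedFactor[i]);
        }
    }
}
```

Under this reading, the proposed default of 250 means caching is skipped once materializing the range's DocIdSet would cost more than 250 times the work of the non-cached query path, which in the table corresponds roughly to the cutoff between the 4h and 8h windows.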
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333815178 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if (cost > skipCacheCost && cost > leadCost * skipCacheFactor + && in.getQuery() instanceof IndexOrDocValuesQuery) { Review comment: This PR is mainly for IndexOrDocValuesQuery now. As discussed earlier, the reason why IndexOrDocValuesQuery slows down is that a large amount of data will be read during the caching action, while only a small amount of data will be read from doc values when not caching. I haven't found any other type of query that reads much more data for caching than it really needs. @jpountz Looking forward to more discussion on whether you think this PR should apply to all query types.
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333851594 ## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { +putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); +return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { +@Override +public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if (cost > skipCacheCost && cost > leadCost * skipCacheFactor Review comment: We have tested different scenarios to observe the query latency with/without cacheing in an online ES cluster. Here is the result: | queryPattern | latencyWithoutCaching | latencyWithCaching | leadCost | rangeQueryCost | skipCacheFactor | | -- | :---: | :---: | :---: | :---: | :---: | | ip:xxx AND time:[t-1h, t] | 10ms | 36ms(+260%) | 20528 | 878979 | 42 | | ip:xxx AND time:[t-4h, t] | 10ms | 100ms(+900%) | 20528 | 4365870 | 212 | | ip:xxx AND time:[t-8h, t] | 11ms | 200ms(+1700%) | 20528 | 8724483 | 425 | | ip:xxx AND time:[t-12h, t] | 12ms | 300ms(+2400%) | 20528 | 13083096 | 637 | | ip:xxx AND time:[t-24h, t] | 16ms | 500ms(+3000%) | 20528 | 26158936 | 1274 | | ip:xxx AND time:[t-48h, t] | 30ms | 1200ms(3900%) | 20528 | 52310616 | 2548 | As the table shows, query latency without caching is low and it's related with the final result set. Query latency with caching is much high and it's mainly related with _rangeQueryCost_. 
According to the above test, we set the default value of _skipCacheFactor_ to 250, which makes the query slower by no more than 10 times. In addition to _skipCacheFactor_, which is similar to _maxCostFactor_ in LUCENE-8027, we added a new parameter _skipCacheCost_. The main reasons are:
- control the time spent on caching, since the caching time is related to the cost of the range query.
- skip caching very large range queries, which would consume too much memory and cause frequent cache evictions.

What do you think? Looking forward to your ideas. @jpountz

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
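The skip condition in the hunk above can be exercised in isolation. The sketch below is a standalone model, not the patch itself: `SKIP_CACHE_COST` is a hypothetical default chosen for illustration (the comment only gives `skipCacheFactor = 250`), and the costs in `main` come from the t-48h and t-1h rows of the table.

```java
// Standalone sketch of the cache-skip heuristic discussed above.
// In the actual patch these are configurable fields of LRUQueryCache.
public class SkipCacheHeuristic {
    static final long SKIP_CACHE_COST = 1_000_000L; // hypothetical default, an assumption
    static final long SKIP_CACHE_FACTOR = 250L;     // default proposed in the review

    /**
     * Caching is skipped when the candidate clause is both absolutely
     * expensive (cost above SKIP_CACHE_COST) and much more expensive than
     * the lead iterator driving the conjunction.
     */
    static boolean shouldSkipCaching(long cost, long leadCost) {
        return cost > SKIP_CACHE_COST && cost > leadCost * SKIP_CACHE_FACTOR;
    }

    public static void main(String[] args) {
        // ip:xxx AND time:[t-48h, t]: rangeQueryCost=52310616, leadCost=20528
        System.out.println(shouldSkipCaching(52_310_616L, 20_528L)); // true: skip caching
        // The cheap t-1h range (878979) stays cacheable under these defaults.
        System.out.println(shouldSkipCaching(878_979L, 20_528L));    // false
    }
}
```

Both conditions are needed: the factor alone would skip small-but-skewed clauses that are cheap to cache, while the absolute floor alone would keep caching huge ranges next to moderately large lead clauses.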
[GitHub] [lucene-solr] jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries
jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r333815178

## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java

```
@@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOException
   if (docIdSet == null) {
     if (policy.shouldCache(in.getQuery())) {
-      docIdSet = cache(context);
-      putIfAbsent(in.getQuery(), docIdSet, cacheHelper);
+      final ScorerSupplier supplier = in.scorerSupplier(context);
+      if (supplier == null) {
+        putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper);
+        return null;
+      }
+
+      final long cost = supplier.cost();
+      return new ScorerSupplier() {
+        @Override
+        public Scorer get(long leadCost) throws IOException {
+          // skip cache operation which would slow query down too much
+          if (cost > skipCacheCost && cost > leadCost * skipCacheFactor
+              && in.getQuery() instanceof IndexOrDocValuesQuery) {
```

Review comment: This PR mainly targets IndexOrDocValuesQuery for now. As discussed earlier, the reason IndexOrDocValuesQuery slows down is that a large amount of data is read during the caching action, while only a small amount of data is read from doc values when not caching. I have not found any other query type that reads much more data for caching than it really needs. @jpountz Looking forward to more discussion on whether this PR should apply to all query types.
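For context on why caching defeats this query type: an IndexOrDocValuesQuery picks, per segment, between running its index-structure clause and verifying candidates from a cheaper lead clause against doc values. The sketch below is a toy model of that choice, not Lucene's implementation, and the `/ 8` threshold is an arbitrary illustrative value.

```java
// Toy model (NOT Lucene's code) of the per-segment execution choice an
// IndexOrDocValuesQuery makes. Caching forces the expensive index-side
// evaluation, which is what the skip-cache heuristic avoids.
public class IndexOrDocValuesChoice {
    enum Execution { INDEX, DOC_VALUES }

    // If the lead clause yields far fewer candidates than the index query
    // would match, checking each candidate against doc values is cheaper
    // than evaluating the whole range on the points index. The factor of
    // 8 here is an assumption for illustration only.
    static Execution choose(long indexQueryCost, long leadCost) {
        return leadCost < indexQueryCost / 8 ? Execution.DOC_VALUES : Execution.INDEX;
    }

    public static void main(String[] args) {
        // ip:xxx (20528 docs) leading a 52M-doc time range: verify via doc values.
        System.out.println(choose(52_310_616L, 20_528L)); // DOC_VALUES
        // A lead clause bigger than the range itself: run the index query.
        System.out.println(choose(1_000L, 2_000L)); // INDEX
    }
}
```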
[jira] [Commented] (LUCENE-9004) Approximate nearest vector search
[ https://issues.apache.org/jira/browse/LUCENE-9004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949250#comment-16949250 ] Adrien Grand commented on LUCENE-9004: -- Pretty cool! I don't know HNSW so I can't comment on that part, but it made me wonder about a couple of things:
* +1 to per-segment structure and rebuilding graphs when merging.
* You hacked doc-value formats for this POC, but I guess your end idea would be a dedicated file format (in the Lucene API sense) to support this, e.g. VectorFileFormat, like we have PostingsFormat or PointsFormat?
* You added a TODO about supporting ints and floats; I worry this would complicate things too much. Supporting only float has a great advantage: you can compute distances with doubles and never have to worry about overflows or underflows. This would be much more challenging if we supported doubles. Regarding ints, codecs could optimize for the case when no dimension has a fractional part (bfloat16 is another type that we might want to optimize for).
* You said there is "no Query implementation", but I suspect getting one will be challenging with the current Query API, which requires ordered iterators of doc IDs and accepts arbitrary filters. So if you were to intersect with a selective filter, you wouldn't be able to know up-front how many nearest neighbors you'd need to filter. Something like LongDistanceFeatureQuery or LatLonPointDistanceFeatureQuery, which further filters documents as more documents get collected, would be nice, but this sounds very challenging with high numbers of dimensions?

> Approximate nearest vector search
> Key: LUCENE-9004
> URL: https://issues.apache.org/jira/browse/LUCENE-9004
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael Sokolov
> Priority: Major
>
> "Semantic" search based on machine-learned vector "embeddings" representing terms, queries and documents is becoming a must-have feature for a modern search engine.
SOLR-12890 is exploring various approaches to this, including > providing vector-based scoring functions. This is a spinoff issue from that. > The idea here is to explore approximate nearest-neighbor search. Researchers > have found an approach based on navigating a graph that partially encodes the > nearest neighbor relation at multiple scales can provide accuracy > 95% (as > compared to exact nearest neighbor calculations) at a reasonable cost. This > issue will explore implementing HNSW (hierarchical navigable small-world) > graphs for the purpose of approximate nearest vector search (often referred > to as KNN or k-nearest-neighbor search). > At a high level the way this algorithm works is this. First assume you have a > graph that has a partial encoding of the nearest neighbor relation, with some > short and some long-distance links. If this graph is built in the right way > (has the hierarchical navigable small world property), then you can > efficiently traverse it to find nearest neighbors (approximately) in log N > time where N is the number of nodes in the graph. I believe this idea was > pioneered in [1]. The great insight in that paper is that if you use the > graph search algorithm to find the K nearest neighbors of a new document > while indexing, and then link those neighbors (undirectedly, ie both ways) to > the new document, then the graph that emerges will have the desired > properties. > The implementation I propose for Lucene is as follows. We need two new data > structures to encode the vectors and the graph. We can encode vectors using a > light wrapper around {{BinaryDocValues}} (we also want to encode the vector > dimension and have efficient conversion from bytes to floats). For the graph > we can use {{SortedNumericDocValues}} where the values we encode are the > docids of the related documents. 
> Encoding the inter-document relations using docids directly will make it relatively fast to traverse the graph, since we won't need to look up through an id-field indirection. This choice limits us to building a graph per segment, since it would be impractical to maintain a global graph for the whole index in the face of segment merges. However, graph-per-segment is very natural at search time - we can traverse each segment's graph independently and merge results as we do today for term-based search.
> At index time, however, merging graphs is somewhat challenging. While indexing we build a graph incrementally, performing searches to construct links among neighbors. When merging segments we must construct a new graph containing elements of all the merged segments. Ideally we would somehow preserve the work done when building the initial graphs, but at least as a start I'd propose we construct a new graph
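The traversal the issue describes - navigate neighbor links toward the query, in roughly log N steps - can be sketched as a greedy walk on a single graph layer. This is an illustrative simplification of HNSW (no hierarchy, no beam of candidates), not the POC's code; the node ids play the role of docids stored in `SortedNumericDocValues`.

```java
import java.util.Arrays;
import java.util.List;

// Greedy nearest-neighbor walk at the heart of HNSW-style search:
// start at an entry node and repeatedly hop to the neighbor closest to
// the query vector until no neighbor improves on the current node.
public class GreedyGraphSearch {
    // Squared Euclidean distance; only the ordering matters here.
    static double dist(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    /** graph.get(n) lists the node ids linked to node n (undirected links). */
    static int search(List<int[]> graph, float[][] vectors, float[] query, int entry) {
        int cur = entry;
        double best = dist(vectors[cur], query);
        boolean improved = true;
        while (improved) {
            improved = false;
            for (int nb : graph.get(cur)) {
                double d = dist(vectors[nb], query);
                if (d < best) { best = d; cur = nb; improved = true; }
            }
        }
        return cur;
    }

    public static void main(String[] args) {
        float[][] vectors = { {0f}, {1f}, {2f}, {3f} };
        // a chain 0-1-2-3 as a tiny stand-in for a navigable graph
        List<int[]> graph = Arrays.asList(
            new int[]{1}, new int[]{0, 2}, new int[]{1, 3}, new int[]{2});
        System.out.println(search(graph, vectors, new float[]{2.8f}, 0)); // 3
    }
}
```

A real implementation keeps a priority queue of candidates rather than a single current node, which is what makes the approximation robust; the greedy walk above can get stuck in local minima on graphs without the small-world property.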
[jira] [Commented] (SOLR-13835) HttpSolrCall produces incorrect extra AuditEvent on AuthorizationResponse.PROMPT
[ https://issues.apache.org/jira/browse/SOLR-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949225#comment-16949225 ] Jan Høydahl commented on SOLR-13835: The first if block was introduced back in 2015 as part of SOLR-7757. [~noble.paul] why does the if not return? It will *always* fall through and trigger the next if block!

> HttpSolrCall produces incorrect extra AuditEvent on AuthorizationResponse.PROMPT
> Key: SOLR-13835
> URL: https://issues.apache.org/jira/browse/SOLR-13835
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Authentication, Authorization
> Reporter: Chris M. Hostetter
> Priority: Major
>
> spinning this out of SOLR-13741...
> {quote}
> Wrt the REJECTED + UNAUTHORIZED events I see the same as you, and I believe there is a code bug, not a test bug. In HttpSolrCall#471 in the {{authorize()}} call, if authResponse == PROMPT, it will actually match both blocks and emit two audit events:
> [https://github.com/apache/lucene-solr/blob/26ede632e6259eb9d16861a3c0f782c9c8999762/solr/core/src/java/org/apache/solr/servlet/HttpSolrCall.java#L475:L493]
> {code:java}
> if (authResponse.statusCode == AuthorizationResponse.PROMPT.statusCode) {...}
> if (!(authResponse.statusCode == HttpStatus.SC_ACCEPTED) && !(authResponse.statusCode == HttpStatus.SC_OK)) {...}
> {code}
> When code==401, it is also true that code!=200. Intuitively there should be both a sendError and a return before line #484 in the first if block?
> {quote}
> This causes any and all {{REJECTED}} AuditEvent messages to be accompanied by a corresponding {{UNAUTHORIZED}} AuditEvent.
> It's not yet clear if, from the perspective of the external client, there are any other bugs in behavior (TBD)

-- This message was sent by Atlassian Jira (v8.3.4#803005)
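The fall-through can be shown with a minimal model of the two conditions. The status codes below are real HTTP values; everything else is a stand-in for HttpSolrCall's logic, not the actual class.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the control flow questioned above: with no return in
// the first block, a 401 (PROMPT) response satisfies both conditions, so
// two audit events are emitted for a single request.
public class DoubleAuditDemo {
    static final int PROMPT = 401, SC_OK = 200, SC_ACCEPTED = 202;

    static List<String> audit(int statusCode) {
        List<String> events = new ArrayList<>();
        if (statusCode == PROMPT) {
            events.add("REJECTED");
            // missing: sendError + return, so execution falls through
        }
        if (statusCode != SC_ACCEPTED && statusCode != SC_OK) {
            events.add("UNAUTHORIZED");
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(audit(401)); // [REJECTED, UNAUTHORIZED] - both blocks fire
        System.out.println(audit(403)); // [UNAUTHORIZED] - only the second block
        System.out.println(audit(200)); // [] - neither block
    }
}
```

Adding a return at the end of the first block makes `audit(401)` emit only the REJECTED event, which is the behavior the report suggests was intended.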
[jira] [Assigned] (SOLR-13835) HttpSolrCall produces incorrect extra AuditEvent on AuthorizationResponse.PROMPT
[ https://issues.apache.org/jira/browse/SOLR-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-13835: -- Assignee: (was: Jan Høydahl)

> HttpSolrCall produces incorrect extra AuditEvent on AuthorizationResponse.PROMPT
[GitHub] [lucene-solr] treygrainger opened a new pull request #941: SOLR-13836: Add 'streaming_expression' QParser
treygrainger opened a new pull request #941: SOLR-13836: Add 'streaming_expression' QParser URL: https://github.com/apache/lucene-solr/pull/941

# Description
It is currently possible to hit the search handler from within a streaming expression ("search(...)"), but it is not currently possible to invoke a streaming expression from within a regular search in the search handler. In some cases, it would be useful to leverage the power of streaming expressions to generate a result set and then join that result set with a normal set of search results. This likely won't be particularly efficient for high-cardinality streaming expression results, but it would be a pretty powerful feature that could enable a bunch of use cases that aren't possible today within a normal search. See https://issues.apache.org/jira/browse/SOLR-13836 for usage information.

# Solution
The current solution adds a StreamingExpressionQParserPlugin, which executes a streaming expression and joins the tuples returned on an id field with the main docset. The field name from the streaming expression tuples can be overridden ("f" param), as well as the method of joining ("method" param).
# Usage
*Docs:*
```
curl -X POST -H "Content-Type: application/json" http://localhost:8983/solr/food_collection/update?commit=true --data-binary '
[
  {"id": "1", "name_s":"donut","vector_fs":[5.0,0.0,1.0,5.0,0.0,4.0,5.0,1.0]},
  {"id": "2", "name_s":"apple juice","vector_fs":[1.0,5.0,0.0,0.0,0.0,4.0,4.0,3.0]},
  {"id": "3", "name_s":"cappuccino","vector_fs":[0.0,5.0,3.0,0.0,4.0,1.0,2.0,3.0]},
  {"id": "4", "name_s":"cheese pizza","vector_fs":[5.0,0.0,4.0,4.0,0.0,1.0,5.0,2.0]},
  {"id": "5", "name_s":"green tea","vector_fs":[0.0,5.0,0.0,0.0,2.0,1.0,1.0,5.0]},
  {"id": "6", "name_s":"latte","vector_fs":[0.0,5.0,4.0,0.0,4.0,1.0,3.0,3.0]},
  {"id": "7", "name_s":"soda","vector_fs":[0.0,5.0,0.0,0.0,3.0,5.0,5.0,0.0]},
  {"id": "8", "name_s":"cheese bread sticks","vector_fs":[5.0,0.0,4.0,5.0,0.0,1.0,4.0,2.0]},
  {"id": "9", "name_s":"water","vector_fs":[0.0,5.0,0.0,0.0,0.0,0.0,0.0,5.0]},
  {"id": "10", "name_s":"cinnamon bread sticks","vector_fs":[5.0,0.0,1.0,5.0,0.0,3.0,4.0,2.0]}
]'
```
*Query:*
```
http://localhost:8983/solr/food/select?q=*:*&fq={!streaming_expression}top(select(search(food,%20q=%22*:*%22,%20fl=%22id,vector_fs%22,%20sort=%22id%20asc%22),%20cosineSimilarity(vector_fs,%20array(5.1,0.0,1.0,5.0,0.0,4.0,5.0,1.0))%20as%20cos,%20id),%20n=5,%20sort=%22cos%20desc%22)&fl=id,name_s
```
*Response:*
```
{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":7,
    "params":{
      "q":"*:*",
      "fl":"id,name_s",
      "fq":"{!streaming_expression}top(select(search(food, q=\"*:*\", fl=\"id,vector_fs\", sort=\"id asc\"), cosineSimilarity(vector_fs, array(5.2,0.0,1.0,5.0,0.0,4.0,5.0,1.0)) as cos, id), n=5, sort=\"cos desc\")"}},
  "response":{"numFound":5,"start":0,"docs":[
      {"name_s":"donut", "id":"1"},
      {"name_s":"apple juice", "id":"2"},
      {"name_s":"cheese pizza", "id":"4"},
      {"name_s":"cheese bread sticks", "id":"8"},
      {"name_s":"cinnamon bread sticks", "id":"10"}]
  }}
```
# Tests
No tests written yet. First draft.
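The `cosineSimilarity` evaluated by the streaming expression above can be reproduced outside Solr. The sketch below is an independent reimplementation of the standard cosine-similarity formula, not Solr's code, using the query vector from the example and two of the indexed documents.

```java
// Cosine similarity between the example query vector and two of the
// documents from the *Docs:* block: dot(a,b) / (|a| * |b|).
public class CosineSimilarityDemo {
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] query = {5.1, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0}; // from the example query
        double[] donut = {5.0, 0.0, 1.0, 5.0, 0.0, 4.0, 5.0, 1.0}; // id 1
        double[] water = {0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.0}; // id 9
        // donut scores near 1.0 and tops the ranking, matching the first
        // document in the example response; water scores near 0.
        System.out.println(cosineSimilarity(query, donut) > cosineSimilarity(query, water)); // true
    }
}
```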
# Checklist
Please review the following and check all that apply:
- [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [x] I have created a Jira issue and added the issue ID to my pull request title.
- [x] I am authorized to contribute this code to the ASF and have removed any code I do not have a license to distribute.
- [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [x] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).