[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16575995#comment-16575995 ]

ASF subversion and git services commented on SOLR-12616:

Commit e9f3a3ce1d482bd90ba8aca6e8cb7fe6c86756eb in lucene-solr's branch refs/heads/jira/http2 from [~varunthacker]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e9f3a3c ]

SOLR-12616: Optimize Export writer upto 4 sort fields to get better performance. This was removed in SOLR-11598 but brought back in the same version

> Track down performance slowdowns with ExportWriter
> --------------------------------------------------
>
> Key: SOLR-12616
> URL: https://issues.apache.org/jira/browse/SOLR-12616
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Varun Thacker
> Priority: Major
> Fix For: master (8.0), 7.5
> Attachments: DefaultCode-1.png, DefaultCode-2.png, SOLR-12616.patch, SOLR-12616.patch, SingleSortValue-1.png, SingleSortValue-2.png
>
> Just to be clear for users glancing through this Jira: the performance slowdown is currently on an unreleased version of Solr, so no released versions are affected by this.
> While doing some benchmarking for SOLR-12572, I compared the export writer's performance against Solr 7.4, and some slowdowns seem to have been introduced. Most likely this is because of SOLR-11598.
> In a 1 shard, 1 replica collection with 25M docs, we issue the following query:
> {code:java}
> /export?q=*:*&sort=id desc&fl=id{code}
> Solr 7.4 took 8:10, 8:20 and 8:22 in the 3 runs that I did.
> Master took 10:46.
> Amrit's done some more benchmarking so he can fill in with some more numbers here.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573824#comment-16573824 ]

ASF subversion and git services commented on SOLR-12616:

Commit 13b9e28f9dbb0d117d8758c37d8df7d4c17a9edc in lucene-solr's branch refs/heads/branch_7x from [~varunthacker]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=13b9e28f ]

SOLR-12616: Optimize Export writer upto 4 sort fields to get better performance. This was removed in SOLR-11598 but brought back in the same version

(cherry picked from commit e9f3a3c)
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573820#comment-16573820 ]

ASF subversion and git services commented on SOLR-12616:

Commit e9f3a3ce1d482bd90ba8aca6e8cb7fe6c86756eb in lucene-solr's branch refs/heads/master from [~varunthacker]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e9f3a3c ]

SOLR-12616: Optimize Export writer upto 4 sort fields to get better performance. This was removed in SOLR-11598 but brought back in the same version
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16573611#comment-16573611 ]

Varun Thacker commented on SOLR-12616:

Patch which adds back the SingleValueSortDoc / DoubleValueSortDoc / TripleValueSortDoc / QuadValueSortDoc classes. After running some tests, the export speed is back to what it was originally.
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571751#comment-16571751 ]

Amrit Sarkar commented on SOLR-12616:

Thanks Varun for such a detailed analysis and feedback on the issue. I see that SOLR-11598 resulted in the slowness. Benchmarks on my end also confirm an approx. 10-18% slowdown in the overall processing of export results.

I dug deeper and found the actual function which is slow, but I have no idea of the reason. Let me share the analysis first:

For query Q1 (single sort), the run with {{SingleValueSortDoc}} introduced again takes 4 mins, while the vanilla master branch code takes 4:45 mins. I attached a sampler and am attaching screenshots for the respective export query executions.

If you look at the screenshots {{SingleSortValue-2}} and {{DefaultCode-2}}, the only significant difference (around 33 secs) between the processing times of the respective executions is in {{setCurrentValue(docId)}}, which we haven't touched.

SingleSortValue: setCurrentValue(docId): *148 secs*
DefaultCode: setCurrentValue(docId): *181 secs*

I have analyzed the code closely enough to conclude that we are not making extra / unnecessary calls to {{setCurrentValue}}. We know the exact line which is causing the slowness: *ExportWriter:235*
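As a rough check of the profiling numbers in Amrit's comment, the sketch below works out how much of the wall-clock gap the {{setCurrentValue(docId)}} difference accounts for. The class and method names are hypothetical, and the "4 mins" figure is taken at face value as 240 seconds even though it may be rounded.

```java
/** Hypothetical helper for reading the profiler numbers quoted above; not part of Solr. */
final class ProfileGap {

    /** Extra time of the slow run relative to the fast run, as a percentage of the fast run. */
    static double slowdownPercent(int fastSeconds, int slowSeconds) {
        return 100.0 * (slowSeconds - fastSeconds) / fastSeconds;
    }

    /** Percentage of the total gap between two runs that one method's gap accounts for. */
    static double shareOfGapPercent(int totalFast, int totalSlow, int methodFast, int methodSlow) {
        return 100.0 * (methodSlow - methodFast) / (totalSlow - totalFast);
    }

    public static void main(String[] args) {
        int totalFast = 4 * 60;       // "4 mins" with SingleValueSortDoc (assumed exact)
        int totalSlow = 4 * 60 + 45;  // "4:45 mins" on vanilla master
        int methodFast = 148;         // setCurrentValue(docId), SingleSortValue screenshot
        int methodSlow = 181;         // setCurrentValue(docId), DefaultCode screenshot

        System.out.printf("overall slowdown: %.1f%%%n",
                slowdownPercent(totalFast, totalSlow));
        System.out.printf("share of gap in setCurrentValue: %.1f%%%n",
                shareOfGapPercent(totalFast, totalSlow, methodFast, methodSlow));
    }
}
```

Under these assumptions, about 33 of the 45 seconds of difference sit in {{setCurrentValue(docId)}}, which is consistent with it being the only significant difference in the screenshots.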
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571042#comment-16571042 ]

Varun Thacker commented on SOLR-12616:

I can't seem to track down the difference between SortDoc and SingleValueSortDoc, or why SingleValueSortDoc is so much faster.

I tried another round of experiments where I assumed SortDoc will only have one sort field and modified the following functions to mimic SingleValueSortDoc. The only difference is that sortValues is still an array of length one rather than a single variable. The speed difference still exists.
{code:java}
public void setValues(SortDoc sortDoc) {
  this.docId = sortDoc.docId;
  this.ord = sortDoc.ord;
  this.docBase = sortDoc.docBase;
  // Single sort field assumed: copy only the first sort value.
  sortValues[0].setCurrentValue(sortDoc.sortValues[0]);
}

public boolean lessThan(Object o) {
  if (docId == -1) {
    return true;
  }
  SortDoc sd = (SortDoc) o;
  int comp = sortValues[0].compareTo(sd.sortValues[0]);
  if (comp == -1) {
    return true;
  } else if (comp == 1) {
    return false;
  } else {
    // Tie on the sort value: fall back to the global doc id.
    return docId + docBase > sd.docId + sd.docBase;
  }
}
{code}
To bring back the old performance, one approach we could take is to keep the specialized classes for up to 4 sort fields by doing this in the export writer:
{code:java}
if (sortValues.length == 1) {
  return new SingleValueSortDoc(sortValues[0]);
} else if (sortValues.length == 2) {
  return new DoubleValueSortDoc(sortValues[0], sortValues[1]);
}
// ... for 3 and 4 sort fields ...
else {
  return new SortDoc(sortValues);
}
{code}
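The dispatch idea in the comment above can be tried in isolation with a self-contained sketch. Everything here is a simplified, hypothetical stand-in (the Toy* classes, plain long sort values, no docBase or ord); it only illustrates choosing a specialized class by the number of sort fields and does not explain or reproduce the speed difference seen in Solr.

```java
/** Hypothetical, simplified stand-in for a sort document; not the real Solr SortDoc. */
abstract class ToySortDoc {
    int docId = -1;

    abstract boolean lessThan(ToySortDoc other);
}

/** Generic variant: any number of sort values, held in an array. */
final class ToyArraySortDoc extends ToySortDoc {
    final long[] values;

    ToyArraySortDoc(long[] values) {
        this.values = values;
    }

    @Override
    boolean lessThan(ToySortDoc other) {
        if (docId == -1) {
            return true; // an unset doc always sorts first
        }
        long[] otherValues = ((ToyArraySortDoc) other).values;
        for (int i = 0; i < values.length; i++) {
            int comp = Long.compare(values[i], otherValues[i]);
            if (comp != 0) {
                return comp < 0;
            }
        }
        return docId > other.docId; // tie-break on doc id
    }
}

/** Specialized variant: exactly one sort value, held in a plain field. */
final class ToySingleSortDoc extends ToySortDoc {
    final long value;

    ToySingleSortDoc(long value) {
        this.value = value;
    }

    @Override
    boolean lessThan(ToySortDoc other) {
        if (docId == -1) {
            return true;
        }
        int comp = Long.compare(value, ((ToySingleSortDoc) other).value);
        return comp != 0 ? comp < 0 : docId > other.docId;
    }
}

/** Picks the specialized class when possible, mirroring the if/else chain above. */
final class ToySortDocFactory {
    static ToySortDoc create(long... values) {
        if (values.length == 1) {
            return new ToySingleSortDoc(values[0]);
        }
        return new ToyArraySortDoc(values);
    }
}
```

Callers only see the ToySortDoc type, so the specialization stays an internal detail of the factory; documents compared with each other are assumed to come from the same factory call shape (same number of sort values).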
[jira] [Commented] (SOLR-12616) Track down performance slowdowns with ExportWriter
[ https://issues.apache.org/jira/browse/SOLR-12616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16569031#comment-16569031 ]

Varun Thacker commented on SOLR-12616:

Patch which tests {{SingleValueSortDoc}} vs SortDoc.

Indexed 25M docs onto a 1 shard x 1 replica collection.

Query: {{/export?q=*:*&fl=id&sort=id desc}}

With {{-Dtest.export.writer.optimized=true}}: 7m13, 7m23
Without {{-Dtest.export.writer.optimized=true}}: 10m27, 10m31

I haven't yet started looking into which difference between SortDoc and SingleValueSortDoc causes such a gap in speed.
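To put the timings from the comment above on one scale, here is a minimal sketch that averages the runs and expresses the gap as a percentage. The ExportTimings class and its methods are hypothetical helpers, and the "7m13" style values are rewritten as "7:13" purely for parsing.

```java
import java.util.Arrays;

/** Hypothetical helper for comparing the reported export timings; not part of Solr. */
final class ExportTimings {

    /** Parses a "m:ss" duration such as "7:13" into seconds. */
    static int toSeconds(String minSec) {
        String[] parts = minSec.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    /** Mean duration in seconds over one or more "m:ss" runs. */
    static double meanSeconds(String... runs) {
        return Arrays.stream(runs)
                .mapToInt(ExportTimings::toSeconds)
                .average()
                .getAsDouble();
    }

    /** Extra time of the slow variant relative to the fast one, as a percentage of the fast one. */
    static double slowdownPercent(double fastSeconds, double slowSeconds) {
        return (slowSeconds - fastSeconds) / fastSeconds * 100.0;
    }

    public static void main(String[] args) {
        double optimized = meanSeconds("7:13", "7:23");  // with the test flag set
        double generic = meanSeconds("10:27", "10:31");  // without the test flag
        System.out.printf("optimized=%.0fs generic=%.0fs slowdown=%.1f%%%n",
                optimized, generic, slowdownPercent(optimized, generic));
    }
}
```

With only two runs per variant this is a rough comparison rather than a benchmark result; the same helper can be applied to other run times reported on the issue.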