[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877096#comment-16877096 ] ASF subversion and git services commented on LUCENE-8757: - Commit 5e109fb0a76070c0ccb8e36f0c9db2f297a246c5 in lucene-solr's branch refs/heads/SOLR-13105-visual from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e109fb ] LUCENE-8757: Move changes entry. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876689#comment-16876689 ] Adrien Grand commented on LUCENE-8757: -- This change has been reverted from 8.x because it required changes to TopDocs#merge that would necessarily be breaking for our users. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0) > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876685#comment-16876685 ] ASF subversion and git services commented on LUCENE-8757: - Commit 8448a33ed8ffc978497a0d21e8042a61bf0ddd04 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8448a33 ] LUCENE-8757: Revert on 8.x. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876688#comment-16876688 ] ASF subversion and git services commented on LUCENE-8757: - Commit 5e109fb0a76070c0ccb8e36f0c9db2f297a246c5 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e109fb ] LUCENE-8757: Move changes entry. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860181#comment-16860181 ] Atri Sharma commented on LUCENE-8757: - [~hossman] This is a known issue (https://issues.apache.org/jira/browse/LUCENE-8829). The problem is not with this JIRA but with an existing issue in TopDocs.merge. We are discussing a fix on LUCENE-8829. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860173#comment-16860173 ] Hoss Man commented on LUCENE-8757: -- another similar failure...
http://fucit.org/solr-jenkins-reports/job-data/thetaphi/Lucene-Solr-8.x-Linux/677/
https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Linux/677/
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps -Dtests.seed=A993DB1E59F6C2DC -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=rn -Dtests.timezone=Africa/El_Aaiun -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] FAILURE 0.06s J2 | TestRegexpRandom2.testRegexps <<<
[junit4]> Throwable #1: junit.framework.AssertionFailedError: Hit 9 docnumbers don't match
[junit4]> Hits length1=25 length2=25
[junit4]> hit=0: doc3=1.0 shardIndex=0, doc3=1.0 shardIndex=0
[junit4]> hit=1: doc18=1.0 shardIndex=0, doc18=1.0 shardIndex=0
[junit4]> hit=2: doc40=1.0 shardIndex=0, doc40=1.0 shardIndex=0
[junit4]> hit=3: doc81=1.0 shardIndex=0, doc81=1.0 shardIndex=0
[junit4]> hit=4: doc85=1.0 shardIndex=0, doc85=1.0 shardIndex=0
[junit4]> hit=5: doc150=1.0 shardIndex=0, doc150=1.0 shardIndex=0
[junit4]> hit=6: doc159=1.0 shardIndex=0, doc159=1.0 shardIndex=0
[junit4]> hit=7: doc165=1.0 shardIndex=0, doc165=1.0 shardIndex=0
[junit4]> hit=8: doc175=1.0 shardIndex=0, doc175=1.0 shardIndex=0
[junit4]> hit=9: doc208=1.0 shardIndex=0, doc180=1.0 shardIndex=0
[junit4]> hit=10: doc256=1.0 shardIndex=0, doc181=1.0 shardIndex=0
[junit4]> hit=11: doc270=1.0 shardIndex=0, doc208=1.0 shardIndex=0
[junit4]> hit=12: doc295=1.0 shardIndex=0, doc256=1.0 shardIndex=0
[junit4]> hit=13: doc300=1.0 shardIndex=0, doc270=1.0 shardIndex=0
[junit4]> hit=14: doc331=1.0 shardIndex=0, doc295=1.0 shardIndex=0
[junit4]> hit=15: doc347=1.0 shardIndex=0, doc300=1.0 shardIndex=0
[junit4]> hit=16: doc357=1.0 shardIndex=0, doc331=1.0 shardIndex=0
[junit4]> hit=17: doc363=1.0 shardIndex=0, doc347=1.0 shardIndex=0
[junit4]> hit=18: doc366=1.0 shardIndex=0, doc357=1.0 shardIndex=0
[junit4]> hit=19: doc385=1.0 shardIndex=0, doc363=1.0 shardIndex=0
[junit4]> hit=20: doc459=1.0 shardIndex=0, doc366=1.0 shardIndex=0
[junit4]> hit=21: doc464=1.0 shardIndex=0, doc385=1.0 shardIndex=0
[junit4]> hit=22: doc467=1.0 shardIndex=0, doc459=1.0 shardIndex=0
[junit4]> hit=23: doc489=1.0 shardIndex=0, doc464=1.0 shardIndex=0
[junit4]> hit=24: doc514=1.0 shardIndex=0, doc467=1.0 shardIndex=0
[junit4]> for query:field://
[junit4]> at __randomizedtesting.SeedInfo.seed([A993DB1E59F6C2DC:48CF9A0F875C9554]:0)
[junit4]> at junit.framework.Assert.fail(Assert.java:57)
[junit4]> at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:205)
[junit4]> at org.apache.lucene.search.TestRegexpRandom2.assertSame(TestRegexpRandom2.java:178)
[junit4]> at org.apache.lucene.search.TestRegexpRandom2.testRegexps(TestRegexpRandom2.java:164)
[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4]> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4]> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
[junit4]> at java.base/java.lang.Thread.run(Thread.java:835)
[junit4] 2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=5, maxDocsPerChunk=8, blockSize=1), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=5, blockSize=1)), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@1c32fc08), locale=rn, timezone=Africa/El_Aaiun
{noformat}
...note that in both of these builds, TestFieldCacheRewriteMethod (which subclasses TestRegexpRandom2) also failed w/similar errors and the same master seeds... suggesting it's probably related to one of the persistent options chosen by the random master seed (codec, similarity, locale, etc) /cc [~jpountz]
> Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860166#comment-16860166 ] Hoss Man commented on LUCENE-8757: -- git bisect has identified cfd9de894d1f0f1b9e368994b972a81f449c as the cause of the reproducible test failures that have occurred in Jenkins jobs on branch_8x...
http://fucit.org/solr-jenkins-reports/job-data/thetaphi/Lucene-Solr-8.x-Linux/672/
https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Linux/672/
{noformat}
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps -Dtests.seed=E712F4979E38CFD8 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=fr-HT -Dtests.timezone=America/New_York -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
[junit4] FAILURE 0.07s J1 | TestRegexpRandom2.testRegexps <<<
[junit4]> Throwable #1: junit.framework.AssertionFailedError: Hit 5 docnumbers don't match
[junit4]> Hits length1=25 length2=25
[junit4]> hit=0: doc24=1.0, doc24=1.0
[junit4]> hit=1: doc33=1.0, doc33=1.0
[junit4]> hit=2: doc34=1.0, doc34=1.0
[junit4]> hit=3: doc38=1.0, doc38=1.0
[junit4]> hit=4: doc43=1.0, doc43=1.0
[junit4]> hit=5: doc119=1.0, doc183=1.0
[junit4]> hit=6: doc142=1.0, doc188=1.0
[junit4]> hit=7: doc146=1.0, doc193=1.0
[junit4]> hit=8: doc152=1.0, doc228=1.0
[junit4]> hit=9: doc159=1.0, doc244=1.0
[junit4]> hit=10: doc163=1.0, doc250=1.0
[junit4]> hit=11: doc169=1.0, doc282=1.0
[junit4]> hit=12: doc173=1.0, doc284=1.0
[junit4]> hit=13: doc183=1.0, doc291=1.0
[junit4]> hit=14: doc188=1.0, doc300=1.0
[junit4]> hit=15: doc193=1.0, doc320=1.0
[junit4]> hit=16: doc228=1.0, doc347=1.0
[junit4]> hit=17: doc244=1.0, doc349=1.0
[junit4]> hit=18: doc250=1.0, doc405=1.0
[junit4]> hit=19: doc282=1.0, doc418=1.0
[junit4]> hit=20: doc284=1.0, doc431=1.0
[junit4]> hit=21: doc291=1.0, doc443=1.0
[junit4]> hit=22: doc300=1.0, doc490=1.0
[junit4]> hit=23: doc320=1.0, doc513=1.0
[junit4]> hit=24: doc347=1.0, doc555=1.0
[junit4]> for query:/[)*-|핯Ѿ]*+/
[junit4]> at __randomizedtesting.SeedInfo.seed([E712F4979E38CFD8:64EB58640929850]:0)
[junit4]> at junit.framework.Assert.fail(Assert.java:57)
[junit4]> at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:205)
[junit4]> at org.apache.lucene.search.TestRegexpRandom2.assertSame(TestRegexpRandom2.java:178)
[junit4]> at org.apache.lucene.search.TestRegexpRandom2.testRegexps(TestRegexpRandom2.java:164)
[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[junit4]> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit4]> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
[junit4]> at java.base/java.lang.Thread.run(Thread.java:835)
{noformat}
> Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
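Editorial illustration: in the failure above every hit scores 1.0, so the two top-25 lists can differ only in how ties are ordered. The sketch below is a minimal, self-contained example of that effect and is not Lucene code: the class TieBreakSketch, its Hit type, and both comparators are hypothetical names invented here. It does not reproduce TopDocs.merge, and the actual root cause is the one tracked in LUCENE-8829.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class TieBreakSketch {
    static final class Hit {
        final int doc;
        final float score;
        final int sliceIndex; // which slice (thread) produced the hit

        Hit(int doc, float score, int sliceIndex) {
            this.doc = doc;
            this.score = score;
            this.sliceIndex = sliceIndex;
        }
    }

    // Ties resolved purely by doc ID, as a single sequential pass would see them.
    static final Comparator<Hit> BY_DOC = Comparator.comparingInt(h -> h.doc);

    // Ties resolved by slice first, then by doc ID inside the slice.
    static final Comparator<Hit> BY_SLICE_THEN_DOC =
        Comparator.<Hit>comparingInt(h -> h.sliceIndex).thenComparingInt(h -> h.doc);

    // Doc IDs of the k best hits: highest score first, ties resolved by tieBreak.
    static List<Integer> topDocs(List<Hit> hits, int k, Comparator<Hit> tieBreak) {
        List<Hit> sorted = new ArrayList<>(hits);
        Comparator<Hit> byScoreDesc = (a, b) -> Float.compare(b.score, a.score);
        sorted.sort(byScoreDesc.thenComparing(tieBreak));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : sorted.subList(0, Math.min(k, sorted.size()))) {
            docs.add(h.doc);
        }
        return docs;
    }

    // Assumed layout: slice 0 holds higher doc IDs than slice 1, which can
    // happen when slices are not built in index order.
    static List<Hit> sampleHits() {
        return Arrays.asList(
            new Hit(10, 1.0f, 0), new Hit(11, 1.0f, 0),
            new Hit(3, 1.0f, 1), new Hit(4, 1.0f, 1));
    }
}
```

With the sample hits, asking for the top 2 yields docs 3 and 4 under BY_DOC but docs 10 and 11 under BY_SLICE_THEN_DOC, even though all four scores are equal.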
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846494#comment-16846494 ] ASF subversion and git services commented on LUCENE-8757: - Commit 97046c70545ae3b7835f153cc7f59c21e45a4883 in lucene-solr's branch refs/heads/jira/SOLR-13484 from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97046c7 ] LUCENE-8757: Fix test bug. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845726#comment-16845726 ] Atri Sharma commented on LUCENE-8757: - [~jpountz] Thanks for pushing! > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845599#comment-16845599 ] ASF subversion and git services commented on LUCENE-8757: - Commit 97046c70545ae3b7835f153cc7f59c21e45a4883 in lucene-solr's branch refs/heads/master from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97046c7 ] LUCENE-8757: Fix test bug. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845598#comment-16845598 ] ASF subversion and git services commented on LUCENE-8757: - Commit dbfa8454e21bef1eb4ecabef7de6d801cadc2df8 in lucene-solr's branch refs/heads/branch_8x from Adrien Grand [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=dbfa845 ] LUCENE-8757: Fix test bug. > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845120#comment-16845120 ] ASF subversion and git services commented on LUCENE-8757: - Commit cfd9de894d1f0f1b9e368994b972a81f449c in lucene-solr's branch refs/heads/branch_8x from Atri Sharma [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cfd9de8 ] LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm The current slicing algorithm assigns a thread per segment, which can be detrimental to performance in case the distribution has a large number of small segments. The patch introduces a slicing algorithm which coalesces smaller segments to a single thread, thus reducing the impact of context switching by limiting the number of threads Signed-off-by: Adrien Grand > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.2 > > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845113#comment-16845113 ] ASF subversion and git services commented on LUCENE-8757: - Commit 87e936f1bb76b89acdf8d0c3071bb43349c0e00c in lucene-solr's branch refs/heads/master from Atri Sharma [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87e936f ] LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm The current slicing algorithm assigns a thread per segment, which can be detrimental to performance in case the distribution has a large number of small segments. The patch introduces a slicing algorithm which coalesces smaller segments to a single thread, thus reducing the impact of context switching by limiting the number of threads Signed-off-by: Adrien Grand > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
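Editorial illustration: the commit message above describes coalescing small segments so that they share a thread. The following is a minimal sketch of one way such grouping could work, assuming segments are described only by their document counts. The class SliceSketch and the two threshold parameters (maxDocsPerSlice, maxSegmentsPerSlice) are hypothetical; the thread does not state the thresholds or grouping rule used by the actual patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not the code from the LUCENE-8757 patch.
class SliceSketch {
    // Groups segment sizes into slices. A segment larger than maxDocsPerSlice
    // gets a slice of its own; smaller segments are coalesced until the group
    // exceeds maxDocsPerSlice documents or reaches maxSegmentsPerSlice segments.
    static List<List<Integer>> slices(List<Integer> segmentMaxDocs,
                                      int maxDocsPerSlice,
                                      int maxSegmentsPerSlice) {
        List<Integer> sorted = new ArrayList<>(segmentMaxDocs);
        sorted.sort(Comparator.reverseOrder()); // largest segments first

        List<List<Integer>> result = new ArrayList<>();
        List<Integer> group = null; // group currently being filled, if any
        long docSum = 0;            // documents in the current group

        for (int maxDoc : sorted) {
            if (maxDoc > maxDocsPerSlice) {
                // Large segment: dedicated slice.
                List<Integer> single = new ArrayList<>();
                single.add(maxDoc);
                result.add(single);
                continue;
            }
            if (group == null) {
                group = new ArrayList<>();
                result.add(group);
                docSum = 0;
            }
            group.add(maxDoc);
            docSum += maxDoc;
            if (group.size() >= maxSegmentsPerSlice || docSum > maxDocsPerSlice) {
                group = null; // close the group; the next small segment starts a new one
            }
        }
        return result;
    }
}
```

For example, sizes 300000, 10, 20 and 30 with thresholds of 250000 documents and 5 segments give two slices instead of four. Because the sketch sorts by size, a real implementation would presumably also have to make sure each slice visits its segments in doc ID order, which is the concern raised in the AssertingCollector discussion elsewhere in this thread.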
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844849#comment-16844849 ] Atri Sharma commented on LUCENE-8757: - [^LUCENE-8757.patch] > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. > > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844848#comment-16844848 ] Atri Sharma commented on LUCENE-8757: - [~jpountz] Essentially, the idea is to keep the previous leaf's maxDoc in AssertingCollector's own state rather than inside each per-leaf collector, right? If I understood you correctly, the attached patch should fix this. I verified that the test added in the previous iteration specifically for out-of-order doc IDs catches this issue, but I agree that AssertingCollector should have the right assertions in place. {quote}Looking at the AssertingCollector again, it has a check that doc IDs are collected in doc ID order, so I wonder why this assertion didn't trip with the earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we just got lucky? {quote} Do you think similar assertions/checks would make sense in IndexSearcher too? If AssertingCollector missed this issue, maybe we should make IndexSearcher's validation of its input arguments more robust as well. WDYT? > Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. 
> > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844595#comment-16844595 ] Adrien Grand commented on LUCENE-8757: -- [~atris] I think it is still not correct since the values of the docBase/maxDoc can only be seen by the current leaf collector while we need this check across all leaf collectors that are created from the same collector. Looking at the AssertingCollector again, it has a check that doc IDs are collected in doc ID order, so I wonder why this assertion didn't trip with the earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we just got lucky? Nevertheless I think it's worth adding another assertion that leaves are collected in the right order and that their doc ID spaces don't intersect as described above, e.g. we could record a {{previousLeafMaxDoc}} at the same level as {{maxDoc}} in AssertingCollector, and then in {{getLeafCollector}} do something like
{code}
assert context.docBase >= previousLeafMaxDoc; // generally equal, but might be greater if some leaves are skipped
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}
> Better Segment To Thread Mapping Algorithm > -- > > Key: LUCENE-8757 > URL: https://issues.apache.org/jira/browse/LUCENE-8757 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Atri Sharma >Assignee: Simon Willnauer >Priority: Major > Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, > LUCENE-8757.patch, LUCENE-8757.patch > > > The current segments to threads allocation algorithm always allocates one > thread per segment. This is detrimental to performance in case of skew in > segment sizes since small segments also get their dedicated thread. This can > lead to performance degradation due to context switching overheads. 
> > A better algorithm which is cognizant of size skew would have better > performance for realistic scenarios -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
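Editorial illustration: the cross-leaf check suggested in the comment above can be tried out without Lucene. This is a minimal sketch in which LeafOrderCheck is a hypothetical stand-in for the state that would live in AssertingCollector, and each leaf is reduced to a (docBase, maxDoc) pair. It throws an exception instead of using `assert` so that it behaves the same whether or not JVM assertions are enabled.

```java
// Hypothetical stand-in for collector-level state; not Lucene's AssertingCollector.
class LeafOrderCheck {
    // End (exclusive) of the doc ID range of the last leaf seen so far.
    private int previousLeafMaxDoc = 0;

    // Call once per leaf, in the order the leaves are handed to the collector.
    void onLeaf(int docBase, int maxDoc) {
        // Generally equal, but docBase may be greater if some leaves are skipped.
        if (docBase < previousLeafMaxDoc) {
            throw new IllegalStateException(
                "leaf with docBase=" + docBase
                    + " starts before the end of the previous leaf ("
                    + previousLeafMaxDoc + ")");
        }
        previousLeafMaxDoc = docBase + maxDoc;
    }

    // Convenience wrapper: true if every {docBase, maxDoc} pair passes the check.
    static boolean inOrder(int[][] leaves) {
        LeafOrderCheck check = new LeafOrderCheck();
        try {
            for (int[] leaf : leaves) {
                check.onLeaf(leaf[0], leaf[1]);
            }
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Because the state lives in one object shared across leaves, the check rejects both leaves visited in the wrong order and leaves whose doc ID ranges overlap, while still accepting a gap left by a skipped leaf.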
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844470#comment-16844470 ] Atri Sharma commented on LUCENE-8757: - [^LUCENE-8757.patch] [~jpountz] Updated the assert, thanks
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844198#comment-16844198 ] Adrien Grand commented on LUCENE-8757: -- Thanks [~atris]. I think there is a bug in AssertingCollector as previousDocBase is always 1? By the way, we don't only need to ensure that previousDocBase <= docBase, but even that previousDocBase + previousMaxDoc <= docBase?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844048#comment-16844048 ] Atri Sharma commented on LUCENE-8757: - Added both a test and the assertion in AssertingCollector. [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844013#comment-16844013 ] Adrien Grand commented on LUCENE-8757: -- I think we could add an assertion for this in AssertingCollector.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844005#comment-16844005 ] Michael McCandless commented on LUCENE-8757: {quote}Your last patch sorts in reverse order of docBase, it should sort by the natural order? {quote} Hmm can we add a test case or an assertion somewhere that would fail if this happens again in the future?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843886#comment-16843886 ] Atri Sharma commented on LUCENE-8757: - Yeah, I noted that after posting the patch. Attached is an updated version with that fixed and the redundant sort removed. Thanks [~jpountz] for pointing it out. [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843876#comment-16843876 ] Adrien Grand commented on LUCENE-8757: -- [~atris] Your last patch sorts in reverse order of docBase, it should sort by the natural order?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843807#comment-16843807 ] Atri Sharma commented on LUCENE-8757: - [~simonw] Attached is an updated patch [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843726#comment-16843726 ] Simon Willnauer commented on LUCENE-8757: - [~atris] can we, instead of asserting the order, just sort the slice in the LeafSlice ctor? This should prevent any issues down the road and it's cheap enough IMO.
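The suggestion above, sorting inside the constructor rather than asserting the order, might look roughly like the following. This is only a sketch under stated assumptions: `LeafSketch` and `SliceSketch` are hypothetical stand-ins for `LeafReaderContext` and `LeafSlice`, and only the `docBase` field is modeled.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical stand-in for a leaf context; only docBase matters for ordering. */
final class LeafSketch {
    final int docBase;
    final int maxDoc;

    LeafSketch(int docBase, int maxDoc) {
        this.docBase = docBase;
        this.maxDoc = maxDoc;
    }
}

/** Hypothetical slice that sorts its leaves by docBase when it is constructed. */
final class SliceSketch {
    final LeafSketch[] leaves;

    SliceSketch(LeafSketch... leaves) {
        // Copy first so the caller's array is left untouched.
        LeafSketch[] sorted = leaves.clone();
        // Natural (ascending) docBase order, so doc IDs are later collected in order.
        Arrays.sort(sorted, Comparator.comparingInt((LeafSketch leaf) -> leaf.docBase));
        this.leaves = sorted;
    }
}
```

Sorting a handful of leaves per slice is cheap, and it means callers cannot accidentally build a slice whose leaves would be collected out of doc ID order.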
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840683#comment-16840683 ] Atri Sharma commented on LUCENE-8757: - Hi [~jpountz], I was going through the IndexSearcher code and see that IndexSearcher performs no checks to ensure that LeafReaderContexts are ordered by docID. Should we add those checks while constructing a new instance? I think that can be orthogonal to this patch, since this patch orders leaf slices by docIDs anyway. WDYT?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838419#comment-16838419 ] Atri Sharma commented on LUCENE-8757: - [~jpountz] Thanks, TopDocs#merge is what really opened my eyes to this invariant. Attached is an updated patch. Please let me know if it looks fine. [^LUCENE-8757.patch]
Re: [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
I think this should be done inside IndexSearcher. It’s a general problem, no? > On 13. May 2019, at 10:25, Adrien Grand (JIRA) wrote: > > Yes. Top-docs collectors are expected to tie-break by doc ID in case > documents compare equal. Things like TopDocs#merge compare doc IDs explicitly > for that purpose, but Collector#collect implementations just rely on the fact > that documents are collected in order to ignore documents that compare equal > to the current k-th best hit. So we need to sort segments within a slice by > docBase in order to get the same top hits regardless of how slices have been > constructed.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838363#comment-16838363 ] Adrien Grand commented on LUCENE-8757: -- Yes. Top-docs collectors are expected to tie-break by doc ID in case documents compare equal. Things like TopDocs#merge compare doc IDs explicitly for that purpose, but Collector#collect implementations just rely on the fact that documents are collected in order to ignore documents that compare equal to the current k-th best hit. So we need to sort segments within a slice by docBase in order to get the same top hits regardless of how slices have been constructed.
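To make the tie-breaking argument above concrete, here is a deliberately simplified top-1 "collector" written without any Lucene types. It is an illustrative assumption, not Lucene's actual collector code: `Top1Sketch` and `collect` are hypothetical names, and hits are passed as parallel arrays of doc IDs and scores in the order they would be collected.

```java
/** Hypothetical top-1 collector: a hit replaces the current best only on a strictly greater score. */
final class Top1Sketch {
    /** Returns the doc ID that is kept after visiting the hits in the given order, or -1 if there are none. */
    static int collect(int[] docIds, float[] scores) {
        int bestDoc = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < docIds.length; i++) {
            // Hits that merely tie with the current best are ignored. This yields the
            // lowest doc ID among ties only if hits arrive in increasing doc ID order.
            if (scores[i] > bestScore) {
                bestScore = scores[i];
                bestDoc = docIds[i];
            }
        }
        return bestDoc;
    }
}
```

With two hits of equal score, visiting doc 0 before doc 5 keeps doc 0, while visiting doc 5 first keeps doc 5. The same index can therefore return different top hits depending on segment order unless the leaves within a slice are sorted by docBase.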
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838240#comment-16838240 ] Atri Sharma commented on LUCENE-8757: - [~jpountz] Do you mean ordering segments within a slice by docBase?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838131#comment-16838131 ] Adrien Grand commented on LUCENE-8757: -- I think we need to sort by docBase before constructing the slices, otherwise we might collect doc IDs out-of-order. By the way we should probably make the LeafSlice constructor check that leaves come in order?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837615#comment-16837615 ] Simon Willnauer commented on LUCENE-8757: - LGTM I will try to commit this in the coming days
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837056#comment-16837056 ] Atri Sharma commented on LUCENE-8757: - Added the segments cap back with additional random testing. [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837003#comment-16837003 ] Simon Willnauer commented on LUCENE-8757: - {quote} I think there is an important justification for the 2nd criterion (number of segments in each work unit / slice), which is if you have an index with some large segments, and then with a long tail of small segments (easily happens if your machine has substantial CPU concurrency and you use multiple threads), since there is a fixed cost for visiting each segment, if you put too many small segments into one work unit, those fixed costs multiply and that one work unit can become too slow even though it's not actually going to visit too many documents. I think we should keep it? {quote} Fair enough, let's add it back.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836281#comment-16836281 ] Atri Sharma commented on LUCENE-8757: - [~simonw] Please let me know if you have any further concerns. Happy to address
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835508#comment-16835508 ] Atri Sharma commented on LUCENE-8757: - bq. Are the work units tackled in order for each query? I.e. is the queue a FIFO queue? If so, the sorting can be useful since IndexSearcher would work first on the hardest/slowest work units, the "long poles" for the concurrent search? Yes, the leafslices are tackled in order in IndexSearcher i.e. threads are created for work units in the same order in which slices() created the work units. So with a sort, what you said will be applicable i.e. the larger work units get scheduled first.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835502#comment-16835502 ] Michael McCandless commented on LUCENE-8757: Are the work units tackled in order for each query? I.e. is the queue a FIFO queue? If so, the sorting can be useful since {{IndexSearcher}} would work first on the hardest/slowest work units, the "long poles" for the concurrent search?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835498#comment-16835498 ] Michael McCandless commented on LUCENE-8757:

Whoa, fast iterations over here!

I think there is an important justification for the 2nd criterion (number of segments in each work unit / slice). Suppose you have an index with some large segments and a long tail of small segments (this easily happens if your machine has substantial CPU concurrency and you use multiple threads). Since there is a fixed cost for visiting each segment, if you put too many small segments into one work unit, those fixed costs multiply and that one work unit can become too slow even though it's not actually going to visit too many documents. I think we should keep it?

Re: the choice of the constants – I ran some performance tests quite a while ago on our production data/queries and a machine with sizable concurrency ({{i3.16xlarge}}) and found those two constants to be a sweet spot at the time. But let's also remember: this is simply a default segment -> work units assignment, and expert users can always continue to override. Good defaults are important ;)
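The two criteria discussed above (a document budget per slice and a cap on segments per slice) can be sketched roughly as follows. This is only an illustration, not the attached patch: segments are represented by plain `maxDoc` integers instead of `LeafReaderContext`, the class name `SliceSketch` is hypothetical, and the limits passed in are whatever the caller chooses (the thread mentions 250_000 and 5 as one production setting).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the slicing logic; each Integer is a segment's maxDoc.
public class SliceSketch {

  static List<List<Integer>> slices(List<Integer> maxDocs, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    // Work on a sorted copy (largest segment first) so the caller's list is untouched.
    List<Integer> sorted = new ArrayList<>(maxDocs);
    sorted.sort(Collections.reverseOrder());

    List<List<Integer>> result = new ArrayList<>();
    List<Integer> group = null; // slice currently being filled, if any
    long docSum = 0;

    for (int maxDoc : sorted) {
      if (maxDoc > maxDocsPerSlice) {
        // A segment over the document budget gets a slice of its own.
        result.add(Collections.singletonList(maxDoc));
        continue;
      }
      if (group == null) {
        group = new ArrayList<>();
        result.add(group);
        docSum = 0;
      }
      group.add(maxDoc);
      docSum += maxDoc;
      // Close the slice once either limit is reached: the segment cap bounds
      // the accumulated per-segment fixed cost, the doc sum bounds the work.
      if (group.size() >= maxSegmentsPerSlice || docSum > maxDocsPerSlice) {
        group = null;
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> sizes = new ArrayList<>();
    sizes.add(300_000);
    for (int i = 0; i < 12; i++) {
      sizes.add(1_000);
    }
    System.out.println(slices(sizes, 250_000, 5).size());
    System.out.println(slices(sizes, 250_000, Integer.MAX_VALUE).size());
  }
}
```

With the segment cap in place, the twelve small segments in the example are spread over several slices rather than all landing in one work unit, which is the effect the comment argues for.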
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835491#comment-16835491 ] Atri Sharma commented on LUCENE-8757:

[~simonw] The reason the sort was added was to have a consistency guarantee from the slicing algorithm, i.e. two queries with the exact same distribution of segments should get the same number of slices, irrespective of the order in which the segments are traversed by the method.

Consider a distribution of 8 segments where 6 segments have 10,000 documents each, and two segments have 130,000 documents each. Take the following order of traversal of segments (each value represents the maxDoc of the segment):

{10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000}

The slicing algorithm will create one slice consisting of all segments (since the last segment's addition is what causes the maxDocs limit to be breached). If the segments were sorted, the order would be:

{130_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 10_000}

This would lead to two slices being created. Thoughts?

bq. also want to suggest to beef up testing a bit

Thanks, added the test. Will raise another iteration once the above discussion concludes.
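The order dependence described above can be reproduced with a small stand-alone sketch. It assumes the single-parameter variant of the algorithm (add the segment to the open slice, then close the slice once the running sum exceeds the limit) and a limit of 250_000 documents, the split point discussed elsewhere in this thread. The class and method names are hypothetical, and plain integers stand in for `LeafReaderContext`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo: counts the slices produced for a given traversal order.
public class SliceOrderDemo {

  static int countSlices(List<Integer> maxDocs, int maxDocsPerSlice) {
    int slices = 0;
    long docSum = 0;
    boolean open = false; // is there a slice still accepting segments?
    for (int maxDoc : maxDocs) {
      if (!open) {
        slices++;
        open = true;
        docSum = 0;
      }
      docSum += maxDoc;
      if (docSum > maxDocsPerSlice) {
        open = false; // limit breached: the next segment starts a new slice
      }
    }
    return slices;
  }

  public static void main(String[] args) {
    List<Integer> unsorted =
        Arrays.asList(10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000);
    List<Integer> sorted = new ArrayList<>(unsorted);
    sorted.sort(Collections.reverseOrder());

    System.out.println(countSlices(unsorted, 250_000)); // prints 1
    System.out.println(countSlices(sorted, 250_000));   // prints 2
  }
}
```

In the unsorted order the running sum only passes 250_000 on the final segment, so everything ends up in one slice; in the sorted order the two large segments close the first slice and the six small ones form a second.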
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835481#comment-16835481 ] Simon Willnauer commented on LUCENE-8757:

Thanks for the additional iteration. Now that we have simplified this, can we remove the sorting? I don't necessarily see how the sort makes things simpler. If we see a segment > threshold we can just add it as a group? I thought you did that already, hence my comment about the assertion. WDYT?

I also want to suggest beefing up testing a bit with a randomized version of this, like this:

{code}
diff --git a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
index 7c63a817adb..76ccca64ee7 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
@@ -1933,6 +1933,14 @@ public abstract class LuceneTestCase extends Assert {
         ret = random.nextBoolean()
             ? new AssertingIndexSearcher(random, r, ex)
             : new AssertingIndexSearcher(random, r.getContext(), ex);
+      } else if (random.nextBoolean()) {
+        int maxDocPerSlice = 1 + random.nextInt(10);
+        ret = new IndexSearcher(r, ex) {
+          @Override
+          protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+            return slices(leaves, maxDocPerSlice);
+          }
+        };
       } else {
         ret = random.nextBoolean()
             ? new IndexSearcher(r, ex)
{code}
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834856#comment-16834856 ] Atri Sharma commented on LUCENE-8757:

Hi [~simonw]

bq. if the previous segment was smallish then group is non-null? I think you should test these cases, maybe add a random test and randomize the order of the segments?

I don't think that case is possible, since we sort the LeafReaderContexts by the number of documents per segment in descending order. Hence, no LeafReaderContext can be succeeded by one which has more documents than its predecessor. I agree with your thought of having a random test with a variety of configurations for segment size distributions.

bq. can and should be replaced by:

Fixed, thanks. [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834767#comment-16834767 ] Simon Willnauer commented on LUCENE-8757:

[~atris] I think the assertion in this part doesn't hold:

{code}
+for (LeafReaderContext ctx : sortedLeaves) {
+  if (ctx.reader().maxDoc() > maxDocsPerSlice) {
+    assert group == null;
+    List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
{code}

If the previous segment was smallish then _group_ is non-null? I think you should test these cases, maybe add a random test and randomize the order of the segments?

This:

{code}
+    List<LeafReaderContext> singleSegmentSlice = new ArrayList<>();
+
+    singleSegmentSlice.add(ctx);
+    groupedLeaves.add(singleSegmentSlice);
{code}

can and should be replaced by:

{code}
groupedLeaves.add(Collections.singletonList(ctx));
{code}

Otherwise it looks good.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834587#comment-16834587 ] Atri Sharma commented on LUCENE-8757:

[~simonw] The reasoning behind adding the second parameter was to ensure that we do not bias against the case where there are a large number of small segments. For example, if there are 100 segments and all of them are small, then we should still allow parallel searches to get some performance gains, although this should be a rare case since merging will coalesce them.

However, I agree with you that this might contradict the whole idea of adding the 250K docs split point. If all segments together in an index do not add up to 250K, then the index is small enough to not need parallelism.

Attached is an updated patch [^LUCENE-8757.patch]
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834525#comment-16834525 ] Simon Willnauer commented on LUCENE-8757: [~atris] actually I thought about these defaults again and I am starting to think it's an OK default. The reason for this is that we try to prevent having dedicated threads for smallish segments, so we group them together. I still do wonder if we need to have 2 parameters? Wouldn't it be enough to just say that we group things together until we have at least 250k docs per thread to be searched? Is it really necessary to have another parameter that limits the number of segments per slice? I think a single parameter would be great and simpler. WDYT?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834463#comment-16834463 ] Atri Sharma commented on LUCENE-8757:

bq. I don't think we should push this if we already know we wanna do something different. That said, I am not convinced the numbers are good defaults. At the same time I don't have any numbers here do you have anything to back these defaults up?

Sure. The reason I was suggesting pushing this patch per se is that the other approach we are advancing would require a couple of new semantics to be introduced, so we could potentially want users to have an option to opt in to either of the two. That said, I believe the cost-based algorithm would also require some hard defaults to be present – to ensure that small segments do not get independent threads even if the system had the capacity.

RE: the default constant values, these numbers are derived from empirical testing across different datasets in ESRally (nyc_taxis, logging) and from looking at the default segment size distribution of the wikipedia10M dataset in luceneutil. However, this might not be a good default size to split on. One thing we could do (albeit expensive) is to take the mean number of documents in the corresponding LeafReaderContexts for a query as the split point. Would that be a better dynamic way?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1683#comment-1683 ] Simon Willnauer commented on LUCENE-8757:

> Would it make sense to push this patch, and then let users consume it and provide feedback while we iterate on the more sophisticated version? We could even have both of the methods available as options to users, potentially

I don't think we should push this if we already know we wanna do something different. That said, I am not convinced the numbers are good defaults. At the same time, I don't have any numbers here. Do you have anything to back these defaults up?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834065#comment-16834065 ] Atri Sharma commented on LUCENE-8757:

Hi [~simonw],

Spending a bit more time thinking about your suggestions, I agree that it is a great idea, albeit one requiring more thought and effort than what this Jira proposes to achieve. I have opened LUCENE-8794 - Cost Based Slice Allocation Algorithm for discussing the same. Please share your thoughts.

Would it make sense to push this patch, and then let users consume it and provide feedback while we iterate on the more sophisticated version? We could even have both of the methods available as options to users, potentially.

Thoughts?
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832382#comment-16832382 ] Atri Sharma commented on LUCENE-8757:

[~simonw] Attached is an updated patch.

My two cents are that segregating segments to keep the document count fair is a more complex operation than what the slices API does today (and in this patch). Fair segmentation is a known hard problem (integer partitioning, for example). We should also consider how much bootstrap-time latency a more complex algorithm would add.

Given that a user has the option of overriding IndexSearcher to add their own ways of slicing, I feel our default algorithm should do well on the common use case, but not more than that. Happy to discuss the alternatives.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832343#comment-16832343 ] Simon Willnauer commented on LUCENE-8757:

Thanks [~atris], can you bring back the javadocs for

{code:java}
protected LeafSlice[] slices(List<LeafReaderContext> leaves)
{code}

Please don't reassign an argument like here:

{code:java}
leaves = new ArrayList<>(leaves);
{code}

The rest of the patch looks OK to me, yet I am not so sure about the defaults. I do wonder if we should look at this from a different perspective. Rather than using hard numbers, can we try to evenly balance the total number of documents across N threads and make N the variable? [~mikemccand] WDYT?
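For readers wondering what "evenly balance the total number of documents across N threads" could look like, here is a minimal sketch of one common greedy heuristic (largest segment first, always into the currently lightest slice). This heuristic is not proposed anywhere in the thread and is not the attached patch; it only illustrates the idea. The class name is hypothetical and plain `maxDoc` integers stand in for segments.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of balancing document counts across a fixed number of slices.
public class BalancedSlices {

  /** Returns the total document count assigned to each of the numSlices slices. */
  static long[] balance(List<Integer> maxDocs, int numSlices) {
    // Sort a copy, largest first; placing big segments early keeps the result more even.
    List<Integer> sorted = new ArrayList<>(maxDocs);
    sorted.sort(Collections.reverseOrder());

    long[] load = new long[numSlices];
    for (int maxDoc : sorted) {
      // Find the slice with the fewest documents so far.
      int lightest = 0;
      for (int i = 1; i < numSlices; i++) {
        if (load[i] < load[lightest]) {
          lightest = i;
        }
      }
      load[lightest] += maxDoc;
    }
    return load;
  }

  public static void main(String[] args) {
    List<Integer> sizes = new ArrayList<>();
    sizes.add(130_000);
    sizes.add(130_000);
    for (int i = 0; i < 6; i++) {
      sizes.add(10_000);
    }
    long[] load = balance(sizes, 2);
    System.out.println(load[0] + " " + load[1]); // prints 160000 160000
  }
}
```

A greedy pass like this is cheap but only approximates an even split; exact balancing is a partitioning problem, and it leaves open how N itself should be chosen.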
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832233#comment-16832233 ] Atri Sharma commented on LUCENE-8757: [^LUCENE-8757.patch] Hi [~simonw] Thanks for reviewing the patch and the comments. Attached is an updated patch. Regards, Atri
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831591#comment-16831591 ] Simon Willnauer commented on LUCENE-8757:

Hey Atri, thanks for putting up this patch, here is some additional feedback:

- Can we stick with a protected non-static method on IndexSearcher? Subclasses should be able to override your impl. I think it's OK to have a static method like this:
{code:java}
public static LeafSlice[] slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegPerSlice)
{code}
that you can call from the protected method with your defaults?
- You might want to change your sort to something like this:
{code:java}
Collections.sort(leaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
{code}
- I think the _Leaves_ class is unnecessary; we can just use _List_ instead?
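The structure suggested in the first bullet (an overridable instance method that delegates to a reusable static helper with default limits) might look roughly like the sketch below. All names here are hypothetical and plain `maxDoc` integers stand in for `LeafReaderContext`; the default of 250_000 is the value discussed in this thread, and the helper uses a single document-count limit to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a searcher class with an overridable slicing hook.
public class SearcherSketch {
  static final int DEFAULT_MAX_DOCS_PER_SLICE = 250_000;

  /** Overridable hook: the default delegates to the static helper with default limits. */
  protected List<List<Integer>> slices(List<Integer> leafMaxDocs) {
    return slices(leafMaxDocs, DEFAULT_MAX_DOCS_PER_SLICE);
  }

  /** Reusable static helper; subclasses can call it with their own limit. */
  public static List<List<Integer>> slices(List<Integer> leafMaxDocs, int maxDocsPerSlice) {
    // Sort a copy in descending order instead of reassigning or mutating the argument.
    List<Integer> sorted = new ArrayList<>(leafMaxDocs);
    sorted.sort(Comparator.reverseOrder());

    List<List<Integer>> result = new ArrayList<>();
    List<Integer> group = null;
    long docSum = 0;
    for (int maxDoc : sorted) {
      if (group == null) {
        group = new ArrayList<>();
        result.add(group);
        docSum = 0;
      }
      group.add(maxDoc);
      docSum += maxDoc;
      if (docSum > maxDocsPerSlice) {
        group = null; // budget exceeded: start a new slice for the next segment
      }
    }
    return result;
  }

  /** Public entry point that goes through the overridable hook. */
  public final int sliceCount(List<Integer> leafMaxDocs) {
    return slices(leafMaxDocs).size();
  }

  public static void main(String[] args) {
    List<Integer> leaves =
        Arrays.asList(10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000);

    SearcherSketch defaults = new SearcherSketch();
    SearcherSketch perSegment = new SearcherSketch() {
      @Override
      protected List<List<Integer>> slices(List<Integer> leafMaxDocs) {
        return slices(leafMaxDocs, 1); // tiny limit: every segment closes its own slice
      }
    };

    System.out.println(defaults.sliceCount(leaves));   // prints 2
    System.out.println(perSegment.sliceCount(leaves)); // prints 8
  }
}
```

Keeping the hook non-static preserves the ability to override the default, while the static helper lets an override reuse the grouping logic with different constants.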
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830958#comment-16830958 ] Atri Sharma commented on LUCENE-8757: Hi [~mikemccand] Thanks for taking a look at the patch. I have attached an updated patch which fixes your comments. For the constants, I think I was being too conservative in not using too many threads :) Please let me know if the current patch seems sane
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830173#comment-16830173 ] Michael McCandless commented on LUCENE-8757:

Thanks [~atris] – I agree it's important to have better defaults for how we coalesce segments into per-query-per-thread work units.

A few small comments:

* Can you insert {{_}} in the big number constants (e.g. {{2500}})? Makes it easier to read, and open-source code is written for reading :)
* I think something is wrong with {{docSum}} – you only set it, and never add to it? I think the intention is to sum up docs in multiple adjacent (sorted by {{maxDoc}}) segments until that count exceeds {{2500}}?
* How did you pick {{2500}} and {{100}} as good constants? We are using much smaller values in our production infrastructure – {{250_000}} and {{5}}, admittedly after only a little experimentation.
* Can you add some tests? You can maybe make the slice method a package private static method and then create test cases with "interesting" {{LeafReaderContext}} combinations? In particular, a test case exposing the {{docSum}} bug would be great, then fix that bug, then see the test case pass.
[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
[ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823295#comment-16823295 ] Atri Sharma commented on LUCENE-8757: Attached is a first cut for the patch. The main idea there is that smaller segments are coalesced into single threads with a hard cap on the number of segments per thread to avoid overwhelming a single thread. This can be enhanced with taking number of cores on the machine and current CPU utilization, but that might lead to a higher IndexSearcher bootstrap time. Comments and thoughts are welcome.