[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16877096#comment-16877096
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 5e109fb0a76070c0ccb8e36f0c9db2f297a246c5 in lucene-solr's branch 
refs/heads/SOLR-13105-visual from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e109fb ]

LUCENE-8757: Move changes entry.


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876689#comment-16876689
 ] 

Adrien Grand commented on LUCENE-8757:
--

This change has been reverted from 8.x because it required changes to 
TopDocs#merge that would necessarily be breaking for our users.




[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876685#comment-16876685
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 8448a33ed8ffc978497a0d21e8042a61bf0ddd04 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8448a33 ]

LUCENE-8757: Revert on 8.x.


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-07-02 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876688#comment-16876688
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 5e109fb0a76070c0ccb8e36f0c9db2f297a246c5 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5e109fb ]

LUCENE-8757: Move changes entry.





[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-06-10 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860181#comment-16860181
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~hossman] This is a known issue 
(https://issues.apache.org/jira/browse/LUCENE-8829). The problem is not with 
this JIRA but with an existing issue in TopDocs.merge. We are discussing a fix 
on LUCENE-8829.
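
As an illustration only (this is not the TopDocs.merge code, and the class and method names below are hypothetical): the failing tests reported in this thread compare hits that all score 1.0, so the final order depends entirely on how ties are broken. The sketch assumes a merge that breaks ties by the position of the slice a hit came from, and contrasts it with breaking ties by doc ID. When slices are not ordered by doc ID, the two disagree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical hit: global doc ID, score, and the index of the slice that produced it. */
final class Hit {
  final int doc;
  final float score;
  final int sliceIndex;

  Hit(int doc, float score, int sliceIndex) {
    this.doc = doc;
    this.score = score;
    this.sliceIndex = sliceIndex;
  }
}

final class TieBreakSketch {

  /** Merges per-slice hits: higher score first, ties resolved by the given comparator. */
  static List<Integer> mergedDocs(List<List<Hit>> perSlice, Comparator<Hit> tieBreak, int topN) {
    List<Hit> all = new ArrayList<>();
    for (List<Hit> hits : perSlice) {
      all.addAll(hits);
    }
    // List.sort is stable, so hits that still compare equal keep their per-slice order.
    all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed().thenComparing(tieBreak));
    List<Integer> docs = new ArrayList<>();
    for (Hit h : all.subList(0, Math.min(topN, all.size()))) {
      docs.add(h.doc);
    }
    return docs;
  }

  public static void main(String[] args) {
    // Slice 0 holds a large segment with high doc IDs, slice 1 a small one with low doc IDs.
    List<Hit> slice0 = Arrays.asList(new Hit(100, 1.0f, 0), new Hit(101, 1.0f, 0));
    List<Hit> slice1 = Arrays.asList(new Hit(3, 1.0f, 1), new Hit(18, 1.0f, 1));
    List<List<Hit>> slices = Arrays.asList(slice0, slice1);

    System.out.println(mergedDocs(slices, Comparator.comparingInt((Hit h) -> h.sliceIndex), 3));
    System.out.println(mergedDocs(slices, Comparator.comparingInt((Hit h) -> h.doc), 3));
  }
}
```

With tied scores, the slice-position tie-break returns docs 100, 101, 3, while the doc ID tie-break returns 3, 18, 100, which is the order a single-threaded search over the whole index would be expected to give.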




[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-06-10 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860173#comment-16860173
 ] 

Hoss Man commented on LUCENE-8757:
--

another similar failure...

http://fucit.org/solr-jenkins-reports/job-data/thetaphi/Lucene-Solr-8.x-Linux/677/
https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Linux/677/

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps -Dtests.seed=A993DB1E59F6C2DC -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=rn -Dtests.timezone=Africa/El_Aaiun -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.06s J2 | TestRegexpRandom2.testRegexps <<<
   [junit4]> Throwable #1: junit.framework.AssertionFailedError: Hit 9 docnumbers don't match
   [junit4]> Hits length1=25 length2=25
   [junit4]> hit=0: doc3=1.0 shardIndex=0, doc3=1.0 shardIndex=0
   [junit4]> hit=1: doc18=1.0 shardIndex=0, doc18=1.0 shardIndex=0
   [junit4]> hit=2: doc40=1.0 shardIndex=0, doc40=1.0 shardIndex=0
   [junit4]> hit=3: doc81=1.0 shardIndex=0, doc81=1.0 shardIndex=0
   [junit4]> hit=4: doc85=1.0 shardIndex=0, doc85=1.0 shardIndex=0
   [junit4]> hit=5: doc150=1.0 shardIndex=0, doc150=1.0 shardIndex=0
   [junit4]> hit=6: doc159=1.0 shardIndex=0, doc159=1.0 shardIndex=0
   [junit4]> hit=7: doc165=1.0 shardIndex=0, doc165=1.0 shardIndex=0
   [junit4]> hit=8: doc175=1.0 shardIndex=0, doc175=1.0 shardIndex=0
   [junit4]> hit=9: doc208=1.0 shardIndex=0, doc180=1.0 shardIndex=0
   [junit4]> hit=10: doc256=1.0 shardIndex=0, doc181=1.0 shardIndex=0
   [junit4]> hit=11: doc270=1.0 shardIndex=0, doc208=1.0 shardIndex=0
   [junit4]> hit=12: doc295=1.0 shardIndex=0, doc256=1.0 shardIndex=0
   [junit4]> hit=13: doc300=1.0 shardIndex=0, doc270=1.0 shardIndex=0
   [junit4]> hit=14: doc331=1.0 shardIndex=0, doc295=1.0 shardIndex=0
   [junit4]> hit=15: doc347=1.0 shardIndex=0, doc300=1.0 shardIndex=0
   [junit4]> hit=16: doc357=1.0 shardIndex=0, doc331=1.0 shardIndex=0
   [junit4]> hit=17: doc363=1.0 shardIndex=0, doc347=1.0 shardIndex=0
   [junit4]> hit=18: doc366=1.0 shardIndex=0, doc357=1.0 shardIndex=0
   [junit4]> hit=19: doc385=1.0 shardIndex=0, doc363=1.0 shardIndex=0
   [junit4]> hit=20: doc459=1.0 shardIndex=0, doc366=1.0 shardIndex=0
   [junit4]> hit=21: doc464=1.0 shardIndex=0, doc385=1.0 shardIndex=0
   [junit4]> hit=22: doc467=1.0 shardIndex=0, doc459=1.0 shardIndex=0
   [junit4]> hit=23: doc489=1.0 shardIndex=0, doc464=1.0 shardIndex=0
   [junit4]> hit=24: doc514=1.0 shardIndex=0, doc467=1.0 shardIndex=0
   [junit4]> for query:field://
   [junit4]> at __randomizedtesting.SeedInfo.seed([A993DB1E59F6C2DC:48CF9A0F875C9554]:0)
   [junit4]> at junit.framework.Assert.fail(Assert.java:57)
   [junit4]> at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:205)
   [junit4]> at org.apache.lucene.search.TestRegexpRandom2.assertSame(TestRegexpRandom2.java:178)
   [junit4]> at org.apache.lucene.search.TestRegexpRandom2.testRegexps(TestRegexpRandom2.java:164)
   [junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]> at java.base/java.lang.Thread.run(Thread.java:835)
   [junit4]   2> NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=5, maxDocsPerChunk=8, blockSize=1), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=5, blockSize=1)), sim=Asserting(org.apache.lucene.search.similarities.AssertingSimilarity@1c32fc08), locale=rn, timezone=Africa/El_Aaiun
{noformat}



...note that in both of these builds, TestFieldCacheRewriteMethod (which 
subclasses TestRegexpRandom2) also failed with similar errors and the same 
master seeds, suggesting it's probably related to one of the persistent options 
chosen by the random master seed (codec, similarity, locale, etc.)

/cc [~jpountz]



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-06-10 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16860166#comment-16860166
 ] 

Hoss Man commented on LUCENE-8757:
--

git bisect has identified cfd9de894d1f0f1b9e368994b972a81f449c as the cause 
of the reproducible test failures that have occurred in Jenkins jobs on 
branch_8x...

http://fucit.org/solr-jenkins-reports/job-data/thetaphi/Lucene-Solr-8.x-Linux/672/
https://jenkins.thetaphi.de/view/Lucene-Solr/job/Lucene-Solr-8.x-Linux/672/

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRegexpRandom2 -Dtests.method=testRegexps -Dtests.seed=E712F4979E38CFD8 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=fr-HT -Dtests.timezone=America/New_York -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] FAILURE 0.07s J1 | TestRegexpRandom2.testRegexps <<<
   [junit4]> Throwable #1: junit.framework.AssertionFailedError: Hit 5 docnumbers don't match
   [junit4]> Hits length1=25 length2=25
   [junit4]> hit=0: doc24=1.0,   doc24=1.0
   [junit4]> hit=1: doc33=1.0,   doc33=1.0
   [junit4]> hit=2: doc34=1.0,   doc34=1.0
   [junit4]> hit=3: doc38=1.0,   doc38=1.0
   [junit4]> hit=4: doc43=1.0,   doc43=1.0
   [junit4]> hit=5: doc119=1.0,  doc183=1.0
   [junit4]> hit=6: doc142=1.0,  doc188=1.0
   [junit4]> hit=7: doc146=1.0,  doc193=1.0
   [junit4]> hit=8: doc152=1.0,  doc228=1.0
   [junit4]> hit=9: doc159=1.0,  doc244=1.0
   [junit4]> hit=10: doc163=1.0, doc250=1.0
   [junit4]> hit=11: doc169=1.0, doc282=1.0
   [junit4]> hit=12: doc173=1.0, doc284=1.0
   [junit4]> hit=13: doc183=1.0, doc291=1.0
   [junit4]> hit=14: doc188=1.0, doc300=1.0
   [junit4]> hit=15: doc193=1.0, doc320=1.0
   [junit4]> hit=16: doc228=1.0, doc347=1.0
   [junit4]> hit=17: doc244=1.0, doc349=1.0
   [junit4]> hit=18: doc250=1.0, doc405=1.0
   [junit4]> hit=19: doc282=1.0, doc418=1.0
   [junit4]> hit=20: doc284=1.0, doc431=1.0
   [junit4]> hit=21: doc291=1.0, doc443=1.0
   [junit4]> hit=22: doc300=1.0, doc490=1.0
   [junit4]> hit=23: doc320=1.0, doc513=1.0
   [junit4]> hit=24: doc347=1.0, doc555=1.0
   [junit4]> for query:/[)*-|핯Ѿ]*+/
   [junit4]> at __randomizedtesting.SeedInfo.seed([E712F4979E38CFD8:64EB58640929850]:0)
   [junit4]> at junit.framework.Assert.fail(Assert.java:57)
   [junit4]> at org.apache.lucene.search.CheckHits.checkEqual(CheckHits.java:205)
   [junit4]> at org.apache.lucene.search.TestRegexpRandom2.assertSame(TestRegexpRandom2.java:178)
   [junit4]> at org.apache.lucene.search.TestRegexpRandom2.testRegexps(TestRegexpRandom2.java:164)
   [junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]> at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
   [junit4]> at java.base/java.lang.Thread.run(Thread.java:835)

{noformat}
 




[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-23 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846494#comment-16846494
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 97046c70545ae3b7835f153cc7f59c21e45a4883 in lucene-solr's branch 
refs/heads/jira/SOLR-13484 from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97046c7 ]

LUCENE-8757: Fix test bug.





[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-22 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845726#comment-16845726
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~jpountz] Thanks for pushing!




[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-22 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845599#comment-16845599
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 97046c70545ae3b7835f153cc7f59c21e45a4883 in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=97046c7 ]

LUCENE-8757: Fix test bug.





[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-22 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845598#comment-16845598
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit dbfa8454e21bef1eb4ecabef7de6d801cadc2df8 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=dbfa845 ]

LUCENE-8757: Fix test bug.





[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845120#comment-16845120
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit cfd9de894d1f0f1b9e368994b972a81f449c in lucene-solr's branch 
refs/heads/branch_8x from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cfd9de8 ]

LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm

The current slicing algorithm assigns a thread per segment, which
can be detrimental to performance in case the distribution has
a large number of small segments. The patch introduces a slicing
algorithm which coalesces smaller segments to a single thread,
thus reducing the impact of context switching by limiting the
number of threads

Signed-off-by: Adrien Grand 
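
A minimal sketch of the coalescing idea described in this commit message, written against a hypothetical `Leaf` stand-in rather than Lucene's own classes. The two limits (`maxDocsPerSlice`, `maxLeavesPerSlice`) and the exact grouping rule are assumptions made for illustration; the committed patch may differ. Leaves are visited largest first, a large leaf gets a slice of its own, and small leaves share a slice until a limit is reached. Each slice is then re-sorted by `docBase`, since the discussion on this issue stresses that leaves must be collected in doc ID order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a segment: first global doc ID and number of docs. */
final class Leaf {
  final int docBase;
  final int maxDoc;

  Leaf(int docBase, int maxDoc) {
    this.docBase = docBase;
    this.maxDoc = maxDoc;
  }
}

final class SliceSketch {

  /** Groups leaves into slices; each slice is meant to be searched by one thread. */
  static List<List<Leaf>> slices(List<Leaf> leaves, int maxDocsPerSlice, int maxLeavesPerSlice) {
    // Visit the largest leaves first so that small ones end up grouped together.
    List<Leaf> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingInt((Leaf l) -> l.maxDoc).reversed());

    List<List<Leaf>> result = new ArrayList<>();
    List<Leaf> current = null; // slice currently being filled, if any
    long docsInCurrent = 0;
    for (Leaf leaf : sorted) {
      if (leaf.maxDoc > maxDocsPerSlice) {
        // A large leaf gets a slice (and so a thread) of its own.
        List<Leaf> single = new ArrayList<>();
        single.add(leaf);
        result.add(single);
        continue;
      }
      if (current == null) {
        current = new ArrayList<>();
        result.add(current);
        docsInCurrent = 0;
      }
      current.add(leaf);
      docsInCurrent += leaf.maxDoc;
      if (current.size() >= maxLeavesPerSlice || docsInCurrent > maxDocsPerSlice) {
        current = null; // slice is full; the next small leaf starts a new one
      }
    }
    // Restore doc ID order inside each slice.
    for (List<Leaf> slice : result) {
      slice.sort(Comparator.comparingInt((Leaf l) -> l.docBase));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Leaf> leaves = new ArrayList<>();
    leaves.add(new Leaf(0, 1000));
    leaves.add(new Leaf(1000, 10));
    leaves.add(new Leaf(1010, 20));
    leaves.add(new Leaf(1030, 5));
    leaves.add(new Leaf(1035, 900));
    for (List<Leaf> slice : slices(leaves, 500, 2)) {
      StringBuilder sb = new StringBuilder("slice:");
      for (Leaf l : slice) {
        sb.append(" docBase=").append(l.docBase);
      }
      System.out.println(sb);
    }
  }
}
```

With these five leaves and limits of 500 docs and 2 leaves per slice, the sketch yields four slices: the two large leaves each stand alone, and the three small leaves are spread over two slices.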





[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845113#comment-16845113
 ] 

ASF subversion and git services commented on LUCENE-8757:
-

Commit 87e936f1bb76b89acdf8d0c3071bb43349c0e00c in lucene-solr's branch 
refs/heads/master from Atri Sharma
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=87e936f ]

LUCENE-8757: Improving Default Segments To Thread Mapping Algorithm

The current slicing algorithm assigns a thread per segment, which
can be detrimental to performance in case the distribution has
a large number of small segments. The patch introduces a slicing
algorithm which coalesces smaller segments to a single thread,
thus reducing the impact of context switching by limiting the
number of threads

Signed-off-by: Adrien Grand 


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844849#comment-16844849
 ] 

Atri Sharma commented on LUCENE-8757:
-

[^LUCENE-8757.patch]




[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844848#comment-16844848
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~jpountz] Essentially, the idea is to maintain the previous leaf's maxDoc 
outside the scope of the per-leaf collector and move it to AssertingCollector's 
state, right? 

If I understood you correctly, the attached patch should fix this. I verified 
that the test added in the previous iteration specifically for out-of-order 
doc IDs catches this issue, but I agree that AssertingCollector should have the 
right assertions in place.

 
{quote}Looking at the AssertingCollector again, it has a check that doc IDs are 
collected in doc ID order, so I wonder why this assertion didn't trip with the 
earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we 
just got lucky? 
{quote}
Do you think similar assertions/checks would make sense in IndexSearcher too? 
If AssertingCollector missed this issue, maybe we should make IndexSearcher's 
input argument validation more robust as well. WDYT?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844595#comment-16844595
 ] 

Adrien Grand commented on LUCENE-8757:
--

[~atris] I think it is still not correct since the values of the docBase/maxDoc 
can only be seen by the current leaf collector while we need this check across 
all leaf collectors that are created from the same collector.

Looking at the AssertingCollector again, it has a check that doc IDs are 
collected in doc ID order, so I wonder why this assertion didn't trip with the 
earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we 
just got lucky? Nevertheless, I think it's worth adding another assertion that 
leaves are collected in the right order and that their doc ID spaces don't 
intersect, as described above, e.g. we could record a {{previousLeafMaxDoc}} at 
the same level as {{maxDoc}} in AssertingCollector, and then in 
{{getLeafCollector}} do something like

{code}
// generally equal, but might be greater if some leaves are skipped
assert context.docBase >= previousLeafMaxDoc;
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}
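
To make the proposed check concrete outside of Lucene, here is a minimal standalone sketch. It is an illustration only, not the code from the patch: {{LeafOrderSketch}}, {{Leaf}} and {{inOrder}} are hypothetical names, and {{Leaf}} merely stands in for the docBase and maxDoc values that a LeafReaderContext would provide.

```java
import java.util.Arrays;
import java.util.List;

/** Standalone sketch; Leaf is a hypothetical stand-in for LeafReaderContext. */
final class LeafOrderSketch {

  /** Holds only the two values the check needs: docBase and the reader's maxDoc. */
  static final class Leaf {
    final int docBase;
    final int maxDoc;

    Leaf(int docBase, int maxDoc) {
      this.docBase = docBase;
      this.maxDoc = maxDoc;
    }
  }

  /** Returns true if every leaf starts at or after the end of the previously visited leaf. */
  static boolean inOrder(List<Leaf> leaves) {
    int previousLeafMaxDoc = 0;
    for (Leaf leaf : leaves) {
      // Generally equal, but might be greater if some leaves are skipped.
      if (leaf.docBase < previousLeafMaxDoc) {
        return false;
      }
      previousLeafMaxDoc = leaf.docBase + leaf.maxDoc;
    }
    return true;
  }

  public static void main(String[] args) {
    List<Leaf> byDocBase = Arrays.asList(new Leaf(0, 10), new Leaf(10, 5), new Leaf(15, 3));
    List<Leaf> reordered = Arrays.asList(new Leaf(0, 10), new Leaf(15, 3), new Leaf(10, 5));
    System.out.println(inOrder(byDocBase)); // true
    System.out.println(inOrder(reordered)); // false
  }
}
```

Because the state lives outside any single leaf, the check spans all leaves visited by the same collector, which is the point made above about docBase/maxDoc only being visible to the current leaf collector.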

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844470#comment-16844470
 ] 

Atri Sharma commented on LUCENE-8757:
-

[^LUCENE-8757.patch]

 

[~jpountz] Updated the assert, thanks

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844198#comment-16844198
 ] 

Adrien Grand commented on LUCENE-8757:
--

Thanks [~atris]. I think there is a bug in AssertingCollector as 
previousDocBase is always 1? By the way, we don't only need to ensure that 
previousDocBase <= docBase, but even that previousDocBase + previousMaxDoc <= 
docBase?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844048#comment-16844048
 ] 

Atri Sharma commented on LUCENE-8757:
-

Added both, a test and the assertion in AssertingCollector.

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844013#comment-16844013
 ] 

Adrien Grand commented on LUCENE-8757:
--

I think we could add an assertion for this in AssertingCollector.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844005#comment-16844005
 ] 

Michael McCandless commented on LUCENE-8757:


{quote}Your last patch sorts in reverse order of docBase, it should sort by the 
natural order?
{quote}
Hmm can we add a test case or an assertion somewhere that would fail if this 
happens again in the future?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843886#comment-16843886
 ] 

Atri Sharma commented on LUCENE-8757:
-

Yeah, I noted that after posting the patch. Attached is an updated version with 
that fixed and the redundant sort removed.

 

Thanks [~jpountz] for pointing it out.

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843876#comment-16843876
 ] 

Adrien Grand commented on LUCENE-8757:
--

[~atris] Your last patch sorts in reverse order of docBase, it should sort by 
the natural order?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843807#comment-16843807
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~simonw] Attached is an updated patch

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843726#comment-16843726
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] Instead of asserting the order, can we just sort the slice in the 
LeafSlice ctor? This should prevent any issues down the road, and it's cheap 
enough IMO.
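
A minimal sketch of what sorting in the constructor could look like, using hypothetical stand-in classes rather than Lucene's own ({{SliceSortSketch}}, {{Leaf}} and {{Slice}} are illustrative names; the real LeafSlice holds LeafReaderContext instances):

```java
import java.util.Arrays;
import java.util.Comparator;

/** Standalone sketch; Slice and Leaf are hypothetical stand-ins, not Lucene classes. */
final class SliceSortSketch {

  static final class Leaf {
    final int docBase;
    final int maxDoc;

    Leaf(int docBase, int maxDoc) {
      this.docBase = docBase;
      this.maxDoc = maxDoc;
    }
  }

  static final class Slice {
    final Leaf[] leaves;

    Slice(Leaf... leaves) {
      // Copy first so the caller's array is left untouched, then sort by
      // increasing docBase so that doc IDs are always visited in order.
      this.leaves = leaves.clone();
      Arrays.sort(this.leaves, Comparator.comparingInt((Leaf leaf) -> leaf.docBase));
    }

    /** Returns the docBase of each leaf in the order the slice would visit them. */
    int[] docBases() {
      int[] bases = new int[leaves.length];
      for (int i = 0; i < leaves.length; i++) {
        bases[i] = leaves[i].docBase;
      }
      return bases;
    }
  }

  public static void main(String[] args) {
    Slice slice = new Slice(new Leaf(20, 5), new Leaf(0, 10), new Leaf(10, 10));
    System.out.println(Arrays.toString(slice.docBases())); // [0, 10, 20]
  }
}
```

Sorting costs O(n log n) in the number of leaves per slice, which is small next to the cost of searching them.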

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-15 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840683#comment-16840683
 ] 

Atri Sharma commented on LUCENE-8757:
-

Hi [~jpountz],

I was going through the IndexSearcher code and I see that IndexSearcher does 
no checks to ensure that LeafReaderContexts are ordered by docID. 
Should we add those checks while constructing a new instance? I think that can 
be orthogonal to this patch, since this patch orders leaf slices by docIDs 
anyway. WDYT?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-13 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838419#comment-16838419
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~jpountz] Thanks, TopDocs#merge is what really opened my eyes to this 
invariant. Attached is an updated patch.

 

Please let me know if it looks fine.

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






Re: [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-13 Thread Simon Willnauer
I think this should be done inside IndexSearcher. It’s a general problem, no?

> On 13. May 2019, at 10:25, Adrien Grand (JIRA)  wrote:
> 
> 
>[ 
> https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838363#comment-16838363
>  ] 
> 
> Adrien Grand commented on LUCENE-8757:
> --
> 
> Yes. Top-docs collectors are expected to tie-break by doc ID in case 
> documents compare equal. Things like TopDocs#merge compare doc IDs explicitly 
> for that purpose, but Collector#collect implementations just rely on the fact 
> that documents are collected in order to ignore documents that compare equal 
> to the current k-th best hit. So we need to sort segments within a slice by 
> docBase in order to get the same top hits regardless of how slices have been 
> constructed.
> 
>> Better Segment To Thread Mapping Algorithm
>> --
>> 
>>Key: LUCENE-8757
>>URL: https://issues.apache.org/jira/browse/LUCENE-8757
>>Project: Lucene - Core
>> Issue Type: Improvement
>>   Reporter: Atri Sharma
>>   Assignee: Simon Willnauer
>>   Priority: Major
>>Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
>> LUCENE-8757.patch
>> 
>> 
>> The current segments to threads allocation algorithm always allocates one 
>> thread per segment. This is detrimental to performance in case of skew in 
>> segment sizes since small segments also get their dedicated thread. This can 
>> lead to performance degradation due to context switching overheads.
>>  
>> A better algorithm which is cognizant of size skew would have better 
>> performance for realistic scenarios
> 
> 
> 



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-13 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838363#comment-16838363
 ] 

Adrien Grand commented on LUCENE-8757:
--

Yes. Top-docs collectors are expected to tie-break by doc ID in case documents 
compare equal. Things like TopDocs#merge compare doc IDs explicitly for that 
purpose, but Collector#collect implementations just rely on the fact that 
documents are collected in order to ignore documents that compare equal to the 
current k-th best hit. So we need to sort segments within a slice by docBase in 
order to get the same top hits regardless of how slices have been constructed.
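
The effect can be reproduced with a small standalone sketch. This is an illustration under stated assumptions, not Lucene's collector code: {{TieBreakSketch}}, {{Hit}} and {{topK}} are hypothetical names, and the collector simply ignores any hit whose score ties with the current k-th best, as described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Standalone sketch of a top-k collector that ignores hits tying with the k-th best. */
final class TieBreakSketch {

  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Collects hits in the given order and returns the doc IDs of the top k. */
  static List<Integer> topK(int k, int[] docs, float[] scores) {
    // Least competitive hit first: lowest score, then highest doc ID.
    Comparator<Hit> leastCompetitiveFirst = (a, b) -> {
      int cmp = Float.compare(a.score, b.score);
      return cmp != 0 ? cmp : Integer.compare(b.doc, a.doc);
    };
    PriorityQueue<Hit> queue = new PriorityQueue<>(leastCompetitiveFirst);
    for (int i = 0; i < docs.length; i++) {
      if (queue.size() < k) {
        queue.add(new Hit(docs[i], scores[i]));
      } else if (scores[i] > queue.peek().score) {
        // Strictly better than the current k-th best: replace it.
        queue.poll();
        queue.add(new Hit(docs[i], scores[i]));
      }
      // A tie with the k-th best is ignored. That is only equivalent to
      // tie-breaking by doc ID if docs arrive in increasing doc ID order.
    }
    List<Hit> hits = new ArrayList<>(queue);
    hits.sort(leastCompetitiveFirst.reversed());
    List<Integer> result = new ArrayList<>();
    for (Hit hit : hits) {
      result.add(hit.doc);
    }
    return result;
  }

  public static void main(String[] args) {
    float[] sameScores = {1f, 1f, 1f};
    System.out.println(topK(2, new int[] {0, 1, 2}, sameScores)); // [0, 1]
    System.out.println(topK(2, new int[] {2, 1, 0}, sameScores)); // [1, 2]
  }
}
```

With three equally scored documents and k = 2, in-order collection keeps docs 0 and 1, while out-of-order collection keeps docs 1 and 2, so the top hits depend on the visiting order.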

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-12 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838240#comment-16838240
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~jpountz] Do you mean ordering segments within a slice by docBase?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838131#comment-16838131
 ] 

Adrien Grand commented on LUCENE-8757:
--

I think we need to sort by docBase before constructing the slices, otherwise we 
might collect doc IDs out-of-order. By the way we should probably make the 
LeafSlice constructor check that leaves come in order?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837615#comment-16837615
 ] 

Simon Willnauer commented on LUCENE-8757:
-

LGTM I will try to commit this in the coming days

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837056#comment-16837056
 ] 

Atri Sharma commented on LUCENE-8757:
-

Added the segments cap back with additional random testing.

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-10 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16837003#comment-16837003
 ] 

Simon Willnauer commented on LUCENE-8757:
-

{quote}
I think there is an important justification for the 2nd criterion (number of 
segments in each work unit / slice). Suppose you have an index with some large 
segments and a long tail of small segments (this easily happens if your 
machine has substantial CPU concurrency and you use multiple threads). Since 
there is a fixed cost for visiting each segment, if you put too many small 
segments into one work unit, those fixed costs multiply and that one work unit 
can become too slow even though it's not actually going to visit too many 
documents.

I think we should keep it?
{quote}

Fair enough. Let's add it back.


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-09 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836281#comment-16836281
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~simonw] Please let me know if you have any further concerns. Happy to address 
them.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835508#comment-16835508
 ] 

Atri Sharma commented on LUCENE-8757:
-

bq. Are the work units tackled in order for each query?  I.e. is the queue a 
FIFO queue?  If so, the sorting can be useful since IndexSearcher would work 
first on the hardest/slowest work units, the "long poles" for the concurrent 
search?

Yes, the leaf slices are tackled in order in IndexSearcher, i.e. threads are 
created for work units in the same order in which slices() created the work 
units. So with a sort, what you said applies: the larger work units get 
scheduled first.
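
To illustrate why scheduling the largest work units first helps, here is an idealized standalone sketch. The model is an assumption: each work unit has a fixed cost, and a fixed pool of workers takes units from a FIFO queue. {{SliceSchedulingSketch}} and {{makespan}} are hypothetical names and do not reflect how IndexSearcher actually dispatches work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Idealized model: workers pull work units from a FIFO queue, each unit has a fixed cost. */
final class SliceSchedulingSketch {

  /** Returns the time at which the last work unit finishes. */
  static long makespan(List<Integer> costsInQueueOrder, int workers) {
    // Each entry is the time at which one worker becomes free again.
    PriorityQueue<Long> freeAt = new PriorityQueue<>();
    for (int i = 0; i < workers; i++) {
      freeAt.add(0L);
    }
    long end = 0;
    for (int cost : costsInQueueOrder) {
      // The next unit in the queue goes to whichever worker frees up first.
      long finish = freeAt.poll() + cost;
      end = Math.max(end, finish);
      freeAt.add(finish);
    }
    return end;
  }

  public static void main(String[] args) {
    List<Integer> smallestFirst = Arrays.asList(1, 1, 1, 1, 8);
    List<Integer> largestFirst = Arrays.asList(8, 1, 1, 1, 1);
    System.out.println(makespan(smallestFirst, 2)); // 10
    System.out.println(makespan(largestFirst, 2)); // 8
  }
}
```

In this model, starting the "long pole" first lets the small units fill the other worker while it runs, so the whole query finishes sooner.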

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835502#comment-16835502
 ] 

Michael McCandless commented on LUCENE-8757:


Are the work units tackled in order for each query?  I.e. is the queue a FIFO 
queue?  If so, the sorting can be useful since {{IndexSearcher}} would work 
first on the hardest/slowest work units, the "long poles" for the concurrent 
search?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835498#comment-16835498
 ] 

Michael McCandless commented on LUCENE-8757:


Whoa, fast iterations over here!

I think there is an important justification for the 2nd criterion (number of 
segments in each work unit / slice). Suppose you have an index with some large 
segments and a long tail of small segments (this easily happens if your machine 
has substantial CPU concurrency and you use multiple threads). Since there is a 
fixed cost for visiting each segment, if you put too many small segments into 
one work unit, those fixed costs multiply and that one work unit can become too 
slow even though it's not actually going to visit too many documents.

I think we should keep it?

Re: the choice of the constants – I ran some performance tests quite a while 
ago on our production data/queries and a machine with sizable concurrency 
({{i3.16xlarge}}) and found those two constants to be a sweet spot at the time.

But let's also remember: this is simply a default segment -> work units 
assignment, and expert users can always continue to override.  Good defaults 
are important ;)
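
A minimal sketch of grouping with both criteria, for illustration only. It 
assumes segments are represented by their maxDoc values, and the class and 
method names are hypothetical; the real patch works on LeafReaderContext and 
returns LeafSlice[].

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class TwoLimitSliceSketch {
  /** Groups segment sizes (maxDoc values) into slices bounded by doc count and by segment count. */
  static List<List<Integer>> slices(List<Integer> maxDocs, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    List<Integer> sorted = new ArrayList<>(maxDocs);
    sorted.sort(Comparator.reverseOrder()); // largest segments first

    List<List<Integer>> result = new ArrayList<>();
    List<Integer> group = null;
    long docSum = 0;
    for (int maxDoc : sorted) {
      if (maxDoc > maxDocsPerSlice) {
        result.add(Collections.singletonList(maxDoc)); // a big segment gets its own slice
        continue;
      }
      if (group == null) {
        group = new ArrayList<>();
        result.add(group);
      }
      group.add(maxDoc);
      docSum += maxDoc;
      if (group.size() >= maxSegmentsPerSlice || docSum > maxDocsPerSlice) {
        group = null; // close the slice; the next segment starts a new one
        docSum = 0;
      }
    }
    return result;
  }
}
```

For one 300,000-document segment plus twelve 1,000-document segments, a 
250,000-document limit with a cap of 5 segments per slice gives four slices, 
while the same doc limit with no effective segment cap gives two: all twelve 
small segments, and their fixed per-segment costs, land in one work unit.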

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835491#comment-16835491
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~simonw] The reason the sort was added was to have a consistency guarantee 
from the slicing algorithm i.e. two queries with the exact same distribution of 
segments should get the same number of slices, irrespective of the order in 
which the segments are traversed by the method. Consider a distribution of 8 
segments where 6 segments have 10,000 documents each, and two segments have 
130,000 documents each. For the below order of traversal of segments (each 
value represents the maxDoc of the segment):

{10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000}

The slicing algorithm will create one slice consisting of all segments (since 
the last segment's addition is what causes the maxDocs limit to be breached).

 
If the segments were sorted, the order would be:

{130_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 10_000}

 

This would lead to two slices being created.

Thoughts?
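
The example above can be sketched as follows. This is a simplified model, not 
the patch itself: it assumes a 250,000-document limit (the split point 
discussed on this issue) and a slice that is closed as soon as its running 
document sum exceeds that limit. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortConsistencySketch {
  /** Counts slices when segments are accumulated until the running doc sum exceeds the limit. */
  static int countSlices(List<Integer> maxDocs, int maxDocsPerSlice, boolean sortDescending) {
    List<Integer> order = new ArrayList<>(maxDocs);
    if (sortDescending) {
      order.sort(Comparator.reverseOrder());
    }
    int slices = 0;
    long docSum = 0;
    boolean open = false; // true while the current slice still accepts segments
    for (int maxDoc : order) {
      if (!open) {
        slices++;
        open = true;
      }
      docSum += maxDoc;
      if (docSum > maxDocsPerSlice) {
        open = false; // limit breached: the next segment starts a new slice
        docSum = 0;
      }
    }
    return slices;
  }
}
```

In traversal order the running sum reaches 190,000 after seven segments and 
only the last segment breaches the limit, so there is one slice. Sorted, the 
two large segments breach the limit together (260,000) and the six small ones 
form a second slice.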



bq. also want to suggest to beef up testing a bit

Thanks, added the test. Will raise another iteration post conclusion on above 
discussion.

 

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835481#comment-16835481
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Thanks for the additional iteration. Now that we simplified this, can we remove 
the sorting? I don't necessarily see how the sort makes things simpler. If we 
see a segment > threshold we can just add it as a group? I thought you did that 
already, hence my comment about the assertion. WDYT?

I also want to suggest to beef up testing a bit with a randomized version of 
this, like so:
{code}
diff --git a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
index 7c63a817adb..76ccca64ee7 100644
--- a/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
+++ b/lucene/test-framework/src/java/org/apache/lucene/util/LuceneTestCase.java
@@ -1933,6 +1933,14 @@ public abstract class LuceneTestCase extends Assert {
           ret = random.nextBoolean()
               ? new AssertingIndexSearcher(random, r, ex)
               : new AssertingIndexSearcher(random, r.getContext(), ex);
+        } else if (random.nextBoolean()) {
+          int maxDocPerSlice = 1 + random.nextInt(10);
+          ret = new IndexSearcher(r, ex) {
+            @Override
+            protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
+              return slices(leaves, maxDocPerSlice);
+            }
+          };
         } else {
           ret = random.nextBoolean()
               ? new IndexSearcher(r, ex)
{code}



> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834856#comment-16834856
 ] 

Atri Sharma commented on LUCENE-8757:
-

Hi [~simonw]

bq. if the previous segment was smallish then group is non-null? I think you 
should test these cases, maybe add a random test and randomize the order of the 
segments?

I don't think that case is possible, since we sort LeafReaderContexts by the 
number of documents per segment in descending order. Hence, no 
LeafReaderContext can be followed by one which has more documents than it does. 
I agree with your suggestion of having a random test with a variety of 
configurations for segment size distributions.
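
A rough sketch of such a random test, under simplifying assumptions: segments 
are modelled as plain maxDoc integers, only the single doc-count limit is 
used, and the class name, method names, and value ranges are made up for 
illustration. It shuffles the input, slices it, and checks that no oversized 
segment is seen while a group is open and that every segment lands in exactly 
one slice.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class RandomizedSliceCheck {
  /** Greedy grouping over segments sorted by descending size (single doc-count limit). */
  static List<List<Integer>> slices(List<Integer> maxDocs, int maxDocsPerSlice) {
    List<Integer> sorted = new ArrayList<>(maxDocs);
    sorted.sort(Comparator.reverseOrder());
    List<List<Integer>> result = new ArrayList<>();
    List<Integer> group = null;
    long docSum = 0;
    for (int maxDoc : sorted) {
      if (maxDoc > maxDocsPerSlice) {
        if (group != null) { // mirrors "assert group == null" from the patch
          throw new IllegalStateException("oversized segment seen after a small one");
        }
        result.add(Collections.singletonList(maxDoc));
      } else {
        if (group == null) {
          group = new ArrayList<>();
          result.add(group);
        }
        group.add(maxDoc);
        docSum += maxDoc;
        if (docSum > maxDocsPerSlice) {
          group = null;
          docSum = 0;
        }
      }
    }
    return result;
  }

  /** Runs the grouping on shuffled random inputs; returns true if every invariant held. */
  static boolean check(long seed, int iterations) {
    Random random = new Random(seed);
    for (int i = 0; i < iterations; i++) {
      int limit = 1 + random.nextInt(10);
      int segmentCount = 1 + random.nextInt(20);
      List<Integer> sizes = new ArrayList<>();
      for (int j = 0; j < segmentCount; j++) {
        sizes.add(1 + random.nextInt(20));
      }
      Collections.shuffle(sizes, random);

      List<Integer> flattened = new ArrayList<>();
      for (List<Integer> slice : slices(sizes, limit)) {
        flattened.addAll(slice);
      }
      List<Integer> expected = new ArrayList<>(sizes);
      expected.sort(Comparator.reverseOrder());
      if (!flattened.equals(expected)) {
        return false; // a segment was lost, duplicated, or emitted out of order
      }
    }
    return true;
  }
}
```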

bq. can and should be replaced by:

Fixed, thanks. 

[^LUCENE-8757.patch] 


 

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834767#comment-16834767
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] I think the assertion in this part doesn't hold:

{code}
+    for (LeafReaderContext ctx : sortedLeaves) {
+      if (ctx.reader().maxDoc() > maxDocsPerSlice) {
+        assert group == null;
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<LeafReaderContext>();
{code}

If the previous segment was smallish then _group_ is non-null? I think you 
should test these cases, maybe add a random test and randomize the order of the 
segments?

This:
{code}
+        List<LeafReaderContext> singleSegmentSlice = new ArrayList<LeafReaderContext>();
+
+        singleSegmentSlice.add(ctx);
+        groupedLeaves.add(singleSegmentSlice);
{code}
can and should be replaced by:

{code}
groupedLeaves.add(Collections.singletonList(ctx));
{code}


Otherwise it looks good.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834587#comment-16834587
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~simonw] The reasoning behind adding the second parameter was to ensure that 
we do not bias against the case where there are a large number of small 
segments. For example, if there are 100 segments and all of them are small, we 
should still allow parallel searches to get some performance gains, although 
this should be a rare case since merging will coalesce them.

 

However, I agree with you that this might be contradicting the whole idea of 
adding the 250K docs split point. If all segments together in an index do not 
add up to 250K, then the index is small enough to not need parallelism.

 

Attached is an updated patch

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834525#comment-16834525
 ] 

Simon Willnauer commented on LUCENE-8757:
-

[~atris] actually I thought about these defaults again and I am starting to 
think it's an OK default. The reason is that we try to prevent having 
dedicated threads for smallish segments, so we group them together. I still do 
wonder if we need to have 2 parameters. Wouldn't it be enough to just say that 
we group things together until we have at least 250k docs per thread to be 
searched? Is it really necessary to have another parameter that limits the 
number of segments per slice? I think a single parameter would be great and 
simpler. WDYT?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834463#comment-16834463
 ] 

Atri Sharma commented on LUCENE-8757:
-

bq. I don't think we should push this if we already know we wanna do something 
different. That said, I am not convinced the numbers are good defaults. At the 
same time I don't have any numbers here; do you have anything to back these 
defaults up?

Sure. The reason I suggested pushing this patch as it stands is that the other 
approach we are discussing would require a couple of new semantics to be 
introduced, so we could potentially want users to be able to opt in to either 
of the two. That said, I believe the cost-based algorithm would also require 
some hard defaults, to ensure that small segments do not get independent 
threads even if the system has the capacity.

RE: the default constant values: these numbers are derived from empirical 
testing across different datasets in ESRally (nyc_taxis, logging) and from 
looking at the default segment size distribution of the wikipedia10M dataset in 
luceneutil. However, this might not be a good default size to split on.

 

One thing we could do (albeit expensive) is to take the mean number of 
documents in the corresponding LeafReaderContexts for a query as the split 
point. Would that be a better dynamic way?
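
As a rough illustration of that idea (a sketch only, with hypothetical names, 
and segments again modelled as plain maxDoc values), the dynamic split point 
would be computed along these lines:

```java
import java.util.List;

final class MeanSplitPointSketch {
  /** Returns the mean maxDoc across segments, rounded down; 0 for an empty index. */
  static int meanSplitPoint(List<Integer> maxDocs) {
    if (maxDocs.isEmpty()) {
      return 0;
    }
    long total = 0; // long, so that summing many large segments cannot overflow
    for (int maxDoc : maxDocs) {
      total += maxDoc;
    }
    return (int) (total / maxDocs.size());
  }
}
```

For six segments of 10,000 documents and two of 130,000, the mean is 40,000, 
so only the two large segments would exceed the split point. The cost is one 
extra pass over the leaves per call.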

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-07 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1683#comment-1683
 ] 

Simon Willnauer commented on LUCENE-8757:
-

> Would it make sense to push this patch, and then let users consume it and 
> provide feedback while we iterate on the more sophisticated version? We could 
> even have both of the methods available as options to users, potentially

I don't think we should push this if we already know we wanna do something 
different. That said, I am not convinced the numbers are good defaults. At the 
same time I don't have any numbers here; do you have anything to back these 
defaults up?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-06 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834065#comment-16834065
 ] 

Atri Sharma commented on LUCENE-8757:
-

Hi [~simonw],

 

Spending a bit more time thinking about your suggestions, I agree that it is a 
great idea, albeit requiring more thought and effort than what this Jira 
proposes to achieve.

I have opened LUCENE-8794 - Cost Based Slice Allocation Algorithm for 
discussing the same. Please share your thoughts.

 

Would it make sense to push this patch, and then let users consume it and 
provide feedback while we iterate on the more sophisticated version? We could 
even have both of the methods available as options to users, potentially.

 

Thoughts?

 

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-03 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832382#comment-16832382
 ] 

Atri Sharma commented on LUCENE-8757:
-

[~simonw] Attached is an updated patch.

My two cents are that segregating segments to keep the document count fair is a 
more complex operation than what the slices API does today (and in this patch). 
Fair segmentation is a known hard problem (integer partitioning, for example).

We should also consider how much bootstrap latency a more complex algorithm 
would add. Given that a user has the option of overriding IndexSearcher to add 
their own way of slicing, I feel our default algorithm should do well on the 
common use case, but not more than that.

 

Happy to discuss the alternatives.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-03 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832343#comment-16832343
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Thanks [~atris], can you bring back the javadocs for 
{code:java}
protected LeafSlice[] slices(List<LeafReaderContext> leaves){code}

Please don't reassign an argument like here:


{code:java}
leaves = new ArrayList<>(leaves);
{code}

The rest of the patch looks OK to me, yet I am not so sure about the defaults. 
I do wonder if we should look at this from a different perspective. Rather than 
using hard numbers, can we try to evenly balance the total number of documents 
across N threads and make N the variable? [~mikemccand] WDYT?
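
One possible shape for such balancing, as a sketch only: the greedy 
"largest first, into the least-loaded bucket" heuristic. This is a generic 
heuristic swapped in for illustration, not something proposed on this issue, 
and it does not guarantee an optimal split. Segments are modelled as maxDoc 
values and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BalancedSliceSketch {
  /** Returns the total doc count assigned to each of n buckets (n >= 1), in ascending order. */
  static long[] balance(List<Integer> maxDocs, int n) {
    List<Integer> sorted = new ArrayList<>(maxDocs);
    sorted.sort(Comparator.reverseOrder()); // place the largest segments first

    // Min-heap of bucket loads: the least-loaded bucket is always on top.
    PriorityQueue<Long> loads = new PriorityQueue<>();
    for (int i = 0; i < n; i++) {
      loads.add(0L);
    }
    for (int maxDoc : sorted) {
      long lightest = loads.poll();
      loads.add(lightest + maxDoc);
    }

    long[] result = new long[n];
    for (int i = 0; i < n; i++) {
      result[i] = loads.poll();
    }
    return result;
  }
}
```

For two segments of 130,000 documents and six of 10,000 with N = 2, each 
bucket ends up with 160,000 documents.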


> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-02 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16832233#comment-16832233
 ] 

Atri Sharma commented on LUCENE-8757:
-

Hi [~simonw]

Thanks for reviewing the patch and the comments. Attached is an updated patch.

[^LUCENE-8757.patch]


Regards,


Atri

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-02 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831591#comment-16831591
 ] 

Simon Willnauer commented on LUCENE-8757:
-

Hey Atri,

thanks for putting up this patch, here is some additional feedback:
 - Can we stick with a protected non-static method on IndexSearcher? Subclasses 
should be able to override your impl. I think it's OK to have a static method 
like this:
{code:java}
 public static LeafSlice[] slices(List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegPerSlice){code}
that you can call from the protected method with your defaults?
 - You might want to change your sort to something like this: 
{code:java}
Collections.sort(leaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));{code}

 - I think the _Leaves_ class is unnecessary; we can just use 
_List<LeafReaderContext>_ instead?
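
The shape suggested above could look roughly like this. It is a standalone 
sketch, not Lucene code: SearcherSketch is a hypothetical stand-in for 
IndexSearcher, leaves are plain maxDoc values instead of LeafReaderContext, 
and the default constants (250_000 and 5) are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a searcher; segments are represented by their maxDoc only. */
class SearcherSketch {
  static final int DEFAULT_MAX_DOCS_PER_SLICE = 250_000; // assumed default
  static final int DEFAULT_MAX_SEGMENTS_PER_SLICE = 5;   // assumed default

  /** Subclasses may override this to plug in their own grouping. */
  protected List<List<Integer>> slices(List<Integer> leaves) {
    return slices(leaves, DEFAULT_MAX_DOCS_PER_SLICE, DEFAULT_MAX_SEGMENTS_PER_SLICE);
  }

  /** Static helper holding the default algorithm; it sorts a copy, never the argument. */
  public static List<List<Integer>> slices(List<Integer> leaves, int maxDocsPerSlice, int maxSegPerSlice) {
    List<Integer> sortedLeaves = new ArrayList<>(leaves);
    Collections.sort(sortedLeaves,
        Collections.reverseOrder(Comparator.comparingInt((Integer maxDoc) -> maxDoc)));

    List<List<Integer>> groupedLeaves = new ArrayList<>();
    List<Integer> group = null;
    long docSum = 0;
    for (int maxDoc : sortedLeaves) {
      if (maxDoc > maxDocsPerSlice) {
        groupedLeaves.add(Collections.singletonList(maxDoc));
      } else {
        if (group == null) {
          group = new ArrayList<>();
          groupedLeaves.add(group);
        }
        group.add(maxDoc);
        docSum += maxDoc;
        if (group.size() >= maxSegPerSlice || docSum > maxDocsPerSlice) {
          group = null;
          docSum = 0;
        }
      }
    }
    return groupedLeaves;
  }
}

/** Example override: every non-empty segment gets its own slice. */
class OneSlicePerSegmentSearcher extends SearcherSketch {
  @Override
  protected List<List<Integer>> slices(List<Integer> leaves) {
    return slices(leaves, 0, 1);
  }
}
```

Keeping the algorithm in a static method makes it easy to unit-test with 
hand-built inputs, while the protected method stays the override point.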

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-01 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830958#comment-16830958
 ] 

Atri Sharma commented on LUCENE-8757:
-

Hi [~mikemccand]

 

Thanks for taking a look at the patch. I have attached an updated patch which 
addresses your comments.

For the constants, I think I was being too conservative in not using too many 
threads :)

 

Please let me know if the current patch seems sane

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-04-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16830173#comment-16830173
 ] 

Michael McCandless commented on LUCENE-8757:


Thanks [~atris] – I agree it's important to have better defaults for how we 
coalesce segments into per-query-per-thread work units.  A few small comments:
 * Can you insert {{_}} in the big number constants (e.g. {{2500}})?  Makes 
it easier to read, and open-source code is written for reading :)
 * I think something is wrong with {{docSum}} – you only set it, and never add 
to it?  I think the intention is to sum up docs in multiple adjacent (sorted by 
{{maxDoc}}) segments until that count exceeds {{2500}}?
 * How did you pick {{2500}} and {{100}} as good constants?  We are using 
much smaller values in our production infrastructure – {{250_000}} and {{5}}, 
admittedly after only a little experimentation. 
 * Can you add some tests?  You can maybe make the slice method a package 
private static method and then create test cases with "interesting" 
{{LeafReaderContext}} combinations?  In particular, a test case exposing the 
{{docSum}} bug would be great, then fix that bug, then see the test case pass.
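
A sketch of what such a test could exercise. The buggy branch below is a guess 
at the shape of the problem (assigning instead of accumulating), not the 
patch's actual code; the class name, the small segment sizes, and the limit 
are hypothetical.

```java
final class DocSumSketch {
  /** Counts slices over pre-sorted segment sizes; accumulate=false models the suspected bug. */
  static int countSlices(int[] sortedMaxDocs, int maxDocsPerSlice, boolean accumulate) {
    int slices = 0;
    long docSum = 0;
    boolean open = false; // true while the current slice still accepts segments
    for (int maxDoc : sortedMaxDocs) {
      if (!open) {
        slices++;
        open = true;
      }
      if (accumulate) {
        docSum += maxDoc; // intended behaviour: sum docs across adjacent segments
      } else {
        docSum = maxDoc;  // suspected bug: the sum is only ever set, never added to
      }
      if (docSum > maxDocsPerSlice) {
        open = false;
        docSum = 0;
      }
    }
    return slices;
  }
}
```

With four segments of 100 documents and a limit of 250, the accumulating 
version closes the first slice after three segments and returns 2, while the 
assigning version never reaches the limit and returns 1.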

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios






[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-04-22 Thread Atri Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823295#comment-16823295
 ] 

Atri Sharma commented on LUCENE-8757:
-

Attached is a first cut of the patch. The main idea is that smaller segments 
are coalesced into single threads, with a hard cap on the number of segments 
per thread to avoid overwhelming a single thread.

This can be enhanced by taking into account the number of cores on the machine 
and the current CPU utilization, but that might lead to a higher IndexSearcher 
bootstrap time.

 

Comments and thoughts are welcome.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
> Attachments: LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios


