[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-22 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, SOLR-13337.patch, cvrg.jpg, screenshot-1.png
>
>
> When the TermsComponent distributes a request across all shards, all terms 
> (terms.limit=-1) are returned by each shard.
> This ought not to be needed when using terms.sort=index.
> When using terms.lower=a on a small test base (400k entries), it took 
> 8.5-11.5s to do a
> /terms?terms.fl=register&terms.sort=index&terms.lower=a
> I did not try it on production data (10x).
> I do get the reason for getting all terms when sorting by count; however, when 
> sorting by index, no more than terms.limit rows are required from any shard. 
> Most likely some will be discarded due to presence in more than one shard. 
> This assumes no terms.mincount/terms.maxcount is given (which definitely 
> throws a spanner in the works).
>  
> I've attached what I think would do the trick.
> I haven't actually tested the patch (it compiles, however some other files in 
> the checkout I have don't: ant compile, javac: "error: cannot find symbol").
>  
> This might be somewhat related to SOLR-2908. I didn't quite get the more 
> subtle information in it.
>  
>  
> Tested by
>  * applying the patch to 7.7.1 (the version we use in production)
>  * starting it up on a spare server (during off hours on the test system)
>  * adding a replica of a collection (so that it'll serve requests)
>  * requesting /terms?terms.fl=phrase.title&terms.sort=index&terms.lower=a 
> from the patched instance: ~30 ms
>  * requesting the same from another, unpatched instance: ~17k ms
>  * both returned the same result
>  * adding terms.mincount=2 to the quick request: failed with out of memory
>  * restarting the server with more memory (.5g -> 8g)
>  * the request then completed in ~18k ms
>  
> I don't see how I'm supposed to unit test the functionality, given that it 
> requires a cloud instance and sufficient data to give a measurable difference 
> with or without the extra request arguments.
>  
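
The reasoning in the description can be illustrated with a toy model. This is a minimal sketch in Python, not Solr's Java code: the function names `shard_terms` and `merge_index_sorted` and the sample data are hypothetical, and each shard is modelled as a plain dict of term to count. It shows that, with index sorting, asking each shard for only `limit` terms gives the same merged result as asking for all terms, and that a minimum-count filter breaks this.

```python
from collections import Counter


def shard_terms(index, lower, limit=None):
    """Return (term, count) pairs >= lower in index order, at most `limit` of them."""
    pairs = sorted((t, c) for t, c in index.items() if t >= lower)
    return pairs if limit is None else pairs[:limit]


def merge_index_sorted(shard_responses, limit, mincount=1):
    """Sum counts per term across shards, filter by mincount, keep the first `limit`."""
    totals = Counter()
    for response in shard_responses:
        for term, count in response:
            totals[term] += count
    merged = sorted((t, c) for t, c in totals.items() if c >= mincount)
    return merged[:limit]


# Hypothetical shard contents (term -> document count).
shards = [
    {"apple": 1, "avocado": 1, "banana": 5},
    {"apple": 1, "apricot": 1, "banana": 4, "cherry": 1},
]

# Unpatched behaviour in this model: every shard returns all terms.
full = [shard_terms(s, "a") for s in shards]
# Proposed behaviour in this model: every shard returns only `limit` terms.
limited = [shard_terms(s, "a", limit=2) for s in shards]

# Without a count filter both strategies agree.
print(merge_index_sorted(full, 2))      # [('apple', 2), ('apricot', 1)]
print(merge_index_sorted(limited, 2))   # [('apple', 2), ('apricot', 1)]

# With mincount=2 the limited responses no longer contain enough terms.
print(merge_index_sorted(full, 2, mincount=2))     # [('apple', 2), ('banana', 9)]
print(merge_index_sorted(limited, 2, mincount=2))  # [('apple', 2)]
```

The first pair of results agrees because any term among the first `limit` terms overall is also among the first `limit` terms of every shard that contains it. With a minimum count, a shard cannot tell locally which terms will survive the filter once counts are summed, so truncating per shard can drop terms that belong in the result.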



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-22 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: (was: SOLR-13337.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, SOLR-13337.patch, cvrg.jpg, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-22 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, SOLR-13337.patch, cvrg.jpg, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-22 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: cvrg.jpg

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, cvrg.jpg, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-21 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-21 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: (was: SOLR-13337.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-04-21 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> SOLR-13337.patch, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, 
> screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-27 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: screenshot-1.png

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch, screenshot-1.png



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch, SOLR-13337.patch



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Description: 
When the TermsComponent distributes a request across all shards, all terms 
(terms.limit=-1) are returned by each shard.

This ought not to be needed when using terms.sort=index.

When using terms.lower=a on a small test base (400k entries), it took 
8.5-11.5s to do a

/terms?terms.fl=register&terms.sort=index&terms.lower=a

I did not try it on production data (10x).

I do get the reason for getting all terms when sorting by count; however, when 
sorting by index, no more than terms.limit rows are required from any shard. 
Most likely some will be discarded due to presence in more than one shard. 
This assumes no terms.mincount/terms.maxcount is given (which definitely 
throws a spanner in the works).

 

I've attached what I think would do the trick.

I haven't actually tested the patch (it compiles, however some other files in 
the checkout I have don't: ant compile, javac: "error: cannot find symbol").

 

This might be somewhat related to SOLR-2908. I didn't quite get the more subtle 
information in it.

 

 

Tested by
 * applying the patch to 7.7.1 (the version we use in production)
 * starting it up on a spare server (during off hours on the test system)
 * adding a replica of a collection (so that it'll serve requests)
 * requesting /terms?terms.fl=phrase.title&terms.sort=index&terms.lower=a 
from the patched instance: ~30 ms
 * requesting the same from another, unpatched instance: ~17k ms
 * both returned the same result
 * adding terms.mincount=2 to the quick request: failed with out of memory
 * restarting the server with more memory (.5g -> 8g)
 * the request then completed in ~18k ms

 

I don't see how I'm supposed to unit test the functionality, given that it 
requires a cloud instance and sufficient data to give a measurable difference 
with or without the extra request arguments.

 

  was:
When the TermsComponent distributes a request across all shards, all terms 
(terms.limit=-1) are returned by each shard.

This ought not to be needed when using terms.sort=index.

When using terms.lower=a on a small test base (400k entries), it took 
8.5-11.5s to do a

/terms?terms.fl=register&terms.sort=index&terms.lower=a

I did not try it on production data (10x).

I do get the reason for getting all terms when sorting by count; however, when 
sorting by index, no more than terms.limit rows are required from any shard. 
Most likely some will be discarded due to presence in more than one shard. 
This assumes no terms.mincount/terms.maxcount is given (which definitely 
throws a spanner in the works).

 

I've attached what I think would do the trick.

I haven't actually tested the patch (it compiles, however some other files in 
the checkout I have don't: ant compile, javac: "error: cannot find symbol").

 

This might be somewhat related to SOLR-2908. I didn't quite get the more subtle 
information in it.

 

 

 


> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch

[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-26 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Description: 
When the TermsComponent distributes a request across all shards, all terms are 
requested from every shard (terms.limit=-1).

This ought not to be needed when using terms.sort=index.

When using terms.lower=a on a small test base (400k entries) it took 8.5-11.5s 
to do a

/terms?terms.fl=register&terms.sort=index&terms.lower=a

I did not try it on production data (10x).

I do get the reason for getting all terms when sorting by count. However, when 
sorting by index, no more than terms.limit rows are required from any shard; 
most likely some will get discarded because they are present in more than one 
shard. This holds only when no terms.mincount/terms.maxcount is given (which 
definitely throws a spanner in the works).
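
The caveat about terms.mincount can be illustrated on toy data as well. This is 
a minimal Python sketch on hypothetical in-memory shards (the helper names and 
the data are assumptions for illustration, not Solr code), and it assumes the 
count filter is applied to the summed counts after merging. With a count 
filter in play, capping each shard at the limit can drop a term that belongs 
in the result.

```python
from collections import Counter


def first_terms(index, limit=None):
    """Return one shard's (term, count) pairs in index order, optionally capped."""
    ordered = sorted(index.items())
    return ordered if limit is None else ordered[:limit]


def merge(shard_responses, limit, mincount):
    """Sum counts across shards, drop terms below mincount, keep `limit` terms."""
    totals = Counter()
    for response in shard_responses:
        for term, count in response:
            totals[term] += count
    kept = sorted((t, c) for t, c in totals.items() if c >= mincount)
    return kept[:limit]


# Illustrative data only: two shards, several terms with a count of 1.
shards = [
    {"ant": 1, "bee": 1, "cat": 2, "dog": 3},
    {"bee": 1, "cow": 1, "dog": 1},
]
limit, mincount = 2, 2

full = merge([first_terms(s) for s in shards], limit, mincount)
capped = merge([first_terms(s, limit) for s in shards], limit, mincount)

print(full)    # [('bee', 2), ('cat', 2)]
print(capped)  # [('bee', 2)]
```

Here "cat" is lost because low-count terms such as "ant" use up the first 
shard's capped response before being filtered out after the merge.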

 

I've attached what I think would do the trick.

I haven't actually tested the patch (it compiles; however, some other files in 
the checkout I have don't: ant compile, javac: "error: cannot find symbol").

 

This might be a somewhat related issue: SOLR-2908. I didn't quite get the more 
subtle information in it.

 

 

 

  was:
When the TermsComponent distributes a request across all shards, all terms are 
requested from every shard (terms.limit=-1).

This ought not to be needed when using terms.sort=index.

When using terms.lower=a on a small test base (400k entries) it took 8.5-11.5s 
to do a

/terms?terms.fl=register&terms.sort=index&terms.lower=a

I did not try it on production data (10x).

I do get the reason for getting all terms when sorting by count. However, when 
sorting by index, no more than terms.limit rows are required from any shard; 
most likely some will get discarded because they are present in more than one 
shard. This holds only when no terms.mincount/terms.maxcount is given (which 
definitely throws a spanner in the works).

 

I've attached what I think would do the trick.

I haven't actually tested the patch (it compiles; however, some other files in 
the checkout I have don't: ant compile, javac: "error: cannot find symbol").

 

This might be a somewhat related issue: SOLR-2908. I didn't quite get the more 
subtle information in it.


> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-22 Thread Mikhail Khludnev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13337:

Attachment: SOLR-13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch, SOLR-13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: (was: SOLR-13337.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: (was: 
0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-21 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: SOLR-13337.patch
0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: SOLR-13337.patch
0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 
> 0001-SOLR-13337-Avoid-requesting-unneeded-terms-from-shar.patch, 
> SOLR-13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: (was: 13337.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: 13337.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: (was: terms-component-index-order-speedup.patch)

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: 13337.patch
>
>






[jira] [Updated] (SOLR-13337) TermsComponent sharded and terms.sort=index performance

2019-03-20 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SOLR-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Morten Bøgeskov updated SOLR-13337:
---
Attachment: terms-component-index-order-speedup.patch

> TermsComponent sharded and terms.sort=index performance
> ---
>
> Key: SOLR-13337
> URL: https://issues.apache.org/jira/browse/SOLR-13337
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SearchComponents - other
>Affects Versions: 7.7
> Environment: Linux 64bit debian
> 20-node cluster
>Reporter: Morten Bøgeskov
>Priority: Minor
> Attachments: terms-component-index-order-speedup.patch
>
>


