[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805657#comment-15805657 ]

Joel Bernstein commented on SOLR-9636:

I'll also test javabin with gatherNodes() graph traversal. gatherNodes simply passes its parameters through to CloudSolrStream, so it's easy to switch the writer type on and off and test performance.

> Add support for javabin for /stream, /sql internode communication
>
> Key: SOLR-9636
> URL: https://issues.apache.org/jira/browse/SOLR-9636
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: master (7.0), 6.4
> Attachments: SOLR-9636.patch
>
> currently it uses json, which is verbose and slow

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795244#comment-15795244 ]

Joel Bernstein commented on SOLR-9636:

I will look at gathering this as well. I'll also look at GCs, which in theory should be less frequent.
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787304#comment-15787304 ]

Noble Paul commented on SOLR-9636:

Another useful metric would be to measure the memory used in both json and javabin formats
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785486#comment-15785486 ]

Joel Bernstein commented on SOLR-9636:

I added a new NullStream to test the performance of exporting and sorting on a high-cardinality field. This is a much more realistic scenario for supporting distributed joins on primary keys. The query looks like this:

{code}
parallel(collection2, workers=7, sort="count desc",
         null(search(collection1, q=*:*, fl="id", sort="id desc", qt="/export", wt="javabin", partitionKeys=id)))
{code}

Notice the new *null* function, which eats the tuples and returns a count to verify the number of tuples processed. The test query sorts on the id field, which has a unique value in each record. Again, performance was impressive:

* With json: 1,210,000 tuples per second.
* With javabin: 1,350,000 tuples per second.

So the ExportWriter doesn't slow down sorting on a high-cardinality field.
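For reference, the relative gain implied by the two throughput figures in this comment can be worked out directly. This is a minimal sketch that uses only the numbers reported in the comment; the variable names are illustrative and not part of the original message.

```python
# Throughput figures reported in the comment (tuples per second).
json_tps = 1_210_000
javabin_tps = 1_350_000

# Relative gain of javabin over json, expressed as a percentage.
gain_pct = (javabin_tps - json_tps) / json_tps * 100

print(f"javabin is about {gain_pct:.1f}% faster than json")
```

On these figures the difference comes to roughly 11.6%, measured on the high-cardinality sort test only.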
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782945#comment-15782945 ]

Joel Bernstein commented on SOLR-9636:

After increasing the heap size for the test Solr instance to 6g I saw a large boost in throughput. I'll update the numbers above.
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781505#comment-15781505 ]

Noble Paul commented on SOLR-9636:

Waiting to see the numbers with bated breath
[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication
[ https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781496#comment-15781496 ]

Joel Bernstein commented on SOLR-9636:

I finally had the time to test out the javabin writer with the /export handler and streaming stack. My initial findings are really good. Here is a summary:

1) Currently testing must be done on branch_6x. There is a bug in master which breaks the /export handler. I haven't gotten to the bottom of it yet, but I'm pretty sure it was introduced with the new docValues iterator API, which is only in master. I will open a ticket for this bug shortly and see if I can fix the problem. But testing in branch_6x is better anyway, as it avoids measuring the docValues iterator API performance at the same time as the javabin /export performance.

2) For my test I worked on a single Solr instance with a single data shard loaded with 10,000,000 small documents. I also created a worker collection with 5 shards. Then I ran the following expression with and without the javabin writer.

{code}
parallel(collection2, workers=5, sort="test_s desc",
         rollup(over="test_s", sum(price_f),
                search(collection1, q=*:*, fl="test_s, price_f", sort="test_s desc", qt="/export", wt="javabin", partitionKeys=test_s)))
{code}
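Running the same expression "with and without the javabin writer" amounts to toggling the `wt` parameter on the inner `search`. The sketch below shows one way to generate both variants and the matching request URLs. The helper names `build_expression` and `stream_url`, the `localhost:8983` base URL, and the choice to send the request to the worker collection are assumptions made for illustration; they do not come from the test described in this thread.

```python
from urllib.parse import urlencode


def build_expression(wt=None, workers=5):
    """Return the test expression; wt=None omits the writer type entirely."""
    # Hypothetical helper: only the wt parameter differs between the two runs.
    wt_param = f', wt="{wt}"' if wt else ""
    return (
        f'parallel(collection2, workers={workers}, sort="test_s desc", '
        'rollup(over="test_s", sum(price_f), '
        'search(collection1, q=*:*, fl="test_s, price_f", sort="test_s desc", '
        f'qt="/export"{wt_param}, partitionKeys=test_s)))'
    )


def stream_url(expr, base="http://localhost:8983/solr", collection="collection2"):
    """Build a /stream request URL; base and collection are assumed values."""
    return f"{base}/{collection}/stream?{urlencode({'expr': expr})}"


# Print one URL per variant: default writer first, then javabin.
for wt in (None, "javabin"):
    print(stream_url(build_expression(wt)))
```

With `wt` omitted, the search falls back to the default writer, which per the issue description is json. The sketch only builds the requests; timing them is left to whatever benchmark harness is in use.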