[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2017-01-06 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805657#comment-15805657
 ] 

Joel Bernstein commented on SOLR-9636:
--

I'll also test javabin with gatherNodes() graph traversal. gatherNodes simply 
passes its parameters through to CloudSolrStream, so it's easy to toggle the 
writer type on and off and test performance.

> Add support for javabin for /stream, /sql internode communication
> -
>
> Key: SOLR-9636
> URL: https://issues.apache.org/jira/browse/SOLR-9636
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: master (7.0), 6.4
>
> Attachments: SOLR-9636.patch
>
>
> currently it uses json, which is verbose and slow



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2017-01-03 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795244#comment-15795244
 ] 

Joel Bernstein commented on SOLR-9636:
--

I will look at gathering this as well. I'll also look at GCs, which in theory 
should be less frequent.




[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2016-12-30 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787304#comment-15787304
 ] 

Noble Paul commented on SOLR-9636:
--

Another useful metric would be the memory used with both the json and 
javabin formats.




[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2016-12-29 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785486#comment-15785486
 ] 

Joel Bernstein commented on SOLR-9636:
--

I added a new NullStream to test the performance of exporting and sorting on a 
high-cardinality field. This is a much more real-world scenario for supporting 
distributed joins on primary keys. The query looks like this:
 
{code}
parallel(collection2, workers=7, sort="count desc", 
  null(search(collection1, 
   q=*:*, 
   fl="id", 
   sort="id desc", 
   qt="/export", 
   wt="javabin", 
   partitionKeys=id)))
{code}

Notice the new *null* function which eats the tuples and returns a count to 
verify the number of tuples processed.
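The idea behind a tuple-eating function can be pictured with a minimal sketch. This is not Solr's NullStream code: `null_stream`, `fake_export`, and the `count` key are hypothetical names chosen for illustration. The sketch only shows the pattern of draining an upstream source, discarding every tuple, and returning a count.

```python
from typing import Dict, Iterable, Iterator


def null_stream(tuples: Iterable[Dict[str, object]]) -> Dict[str, int]:
    """Drain the upstream tuples, discard them, and report how many were read.

    Hypothetical stand-in for the idea behind null(); not Solr's implementation.
    """
    count = 0
    for _ in tuples:  # each tuple is read and immediately dropped
        count += 1
    return {"count": count}


def fake_export(n: int) -> Iterator[Dict[str, object]]:
    """Illustrative upstream source: yields n tuples carrying only an id."""
    for i in range(n):
        yield {"id": str(i)}


result = null_stream(fake_export(1000))
```

Because nothing is retained or sent on, a consumer like this measures only the cost of producing and reading the tuples.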

The test query is sorting on the id field which has a unique value in each 
record. Again performance was impressive:

* With json: 1,210,000 Tuples per second.
* With javabin: 1,350,000 Tuples per second.

So the ExportWriter doesn't slow down sorting on a high cardinality field.
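For a sense of scale, the relative difference between the two figures above can be worked out directly. The helper name `relative_gain` is an assumption for illustration; the two throughput values are the ones reported above.

```python
def relative_gain(baseline: float, candidate: float) -> float:
    """Fractional throughput gain of candidate over baseline."""
    return (candidate - baseline) / baseline


JSON_TPS = 1_210_000      # tuples per second reported with json
JAVABIN_TPS = 1_350_000   # tuples per second reported with javabin

gain = relative_gain(JSON_TPS, JAVABIN_TPS)
print(f"javabin vs json: {gain:.1%} more tuples per second")  # prints 11.6%
```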








[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2016-12-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15782945#comment-15782945
 ] 

Joel Bernstein commented on SOLR-9636:
--

After increasing the heap size for the test Solr instance to 6g I saw a large 
boost in throughput. I'll update the numbers above.




[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2016-12-27 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781505#comment-15781505
 ] 

Noble Paul commented on SOLR-9636:
--

Waiting to see the numbers with bated breath




[jira] [Commented] (SOLR-9636) Add support for javabin for /stream, /sql internode communication

2016-12-27 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15781496#comment-15781496
 ] 

Joel Bernstein commented on SOLR-9636:
--

I finally had the time to test out the javabin writer with the /export handler 
and streaming stack. My initial findings are really good. Here is a summary:

1) Currently testing must be done on branch_6x. There is a bug in master which 
breaks the /export handler. I haven't gotten to the bottom of it yet, but I'm 
pretty sure it was introduced with the new docValues iterator API, which is 
only in master. I will open a ticket for this bug shortly and see if I can fix 
the problem.

But testing in branch_6x is better anyway, as it won't be measuring the 
docValues iterator API performance at the same time as the javabin /export 
performance.

2) For my test I worked on a single Solr instance with a single data shard 
loaded with 10,000,000 small documents. I also created a worker collection with 
5 shards. Then I ran the following expression with and without the javabin 
writer.
{code}
parallel(collection2, workers=5, sort="test_s desc", 
  rollup(over="test_s", sum(price_f),
    search(collection1, q=*:*, fl="test_s, price_f", 
      sort="test_s desc", qt="/export", wt="javabin", 
      partitionKeys=test_s)))
{code}
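One way to run the same expression with and without the javabin writer is to generate both variants from a single template. The sketch below is an assumption-laden illustration, not the harness used for this test: `build_expr`, `stream_url`, and `tuples_per_second` are hypothetical helpers, the host and port are placeholders, and it assumes that omitting wt falls back to the json default mentioned in the issue description.

```python
from typing import Optional
from urllib.parse import urlencode


def build_expr(wt: Optional[str] = None) -> str:
    """Return the test expression, adding wt=... to search() only when given."""
    writer = f', wt="{wt}"' if wt else ""
    return (
        'parallel(collection2, workers=5, sort="test_s desc", '
        'rollup(over="test_s", sum(price_f), '
        'search(collection1, q=*:*, fl="test_s, price_f", '
        f'sort="test_s desc", qt="/export"{writer}, partitionKeys=test_s)))'
    )


def stream_url(expr: str, base: str = "http://localhost:8983/solr") -> str:
    """URL that sends expr to the worker collection's /stream handler.

    The base URL is a placeholder; adjust it to the instance under test.
    """
    return f"{base}/collection2/stream?" + urlencode({"expr": expr})


def tuples_per_second(tuple_count: int, elapsed_seconds: float) -> float:
    """Throughput from a tuple count and a wall-clock duration."""
    return tuple_count / elapsed_seconds


json_expr = build_expr()             # no wt parameter
javabin_expr = build_expr("javabin")  # wt="javabin" on the inner search
```

Timing a request to each URL and dividing the number of exported tuples by the elapsed time gives comparable throughput figures for the two writers.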



