[ 
https://issues.apache.org/jira/browse/SOLR-906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655298#action_12655298
 ] 

Ryan McKinley commented on SOLR-906:
------------------------------------

One basic problem with calling add( SolrInputDocument) with the 
CommonsHttpSolrServer is that it logs a request for each document.  This can be 
a substantial impact.  For example while indexing 40K docs on my machine, it 
takes ~3 1/2 mins.  If I turn logging off the time drops to ! 2 1/2 mins.  With 
the streaming approach, the time drops to 20sec!   Some of that is obviously 
because it limits the logging:
{code}
INFO: {add=[id1,id2,id3,id4, ...(38293 more)]} 0 20714
{code}

> Buffered / Streaming SolrServer implementaion
> ---------------------------------------------
>
>                 Key: SOLR-906
>                 URL: https://issues.apache.org/jira/browse/SOLR-906
>             Project: Solr
>          Issue Type: New Feature
>          Components: clients - java
>            Reporter: Ryan McKinley
>             Fix For: 1.4
>
>
> While indexing lots of documents, the CommonsHttpSolrServer add( 
> SolrInputDocument ) is less then optimal.  This makes a new request for each 
> document.
> With a "StreamingHttpSolrServer", documents are buffered and then written to 
> a single open Http connection.
> For related discussion see:
> http://www.nabble.com/solr-performance-tt9055437.html#a20833680

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to