Hmm, one broken document in a batch should not break the entire batch, right (whatever approach is used)? Are you referring to the fact that you want to programmatically re-index the broken docs?
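
To make the question concrete: the approach Bill describes in the quoted message below (send a batch, and on error resend each record one at a time until the bad one is found) could look roughly like this. It is only a sketch in Python, not anyone's actual code. The names `index_with_fallback`, `send` and `make_fake_sender` are made up for illustration, `send` stands in for whatever client call posts documents to /update, and I am assuming that resending a document with the same `id` is harmless.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Doc = dict  # assumption: a document is a plain dict with a unique "id" field


def index_with_fallback(
    docs: Sequence[Doc],
    send: Callable[[Sequence[Doc]], None],
    batch_size: int = 100,
) -> List[Tuple[Optional[str], str]]:
    """Send docs in batches; when a batch fails, resend its docs one by one.

    Returns a list of (doc id, error message) for every doc that still
    fails when sent alone, so the caller can log or re-index it.
    """
    failed: List[Tuple[Optional[str], str]] = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send(batch)  # fast path: the whole batch in one request
        except Exception:
            # Slow path: isolate the offending doc(s) in this batch only.
            for doc in batch:
                try:
                    send([doc])
                except Exception as doc_error:
                    failed.append((doc.get("id"), str(doc_error)))
    return failed


def make_fake_sender(store: List[Doc]) -> Callable[[Sequence[Doc]], None]:
    """Hypothetical stand-in for a real update call: rejects docs without a title."""
    def send(batch: Sequence[Doc]) -> None:
        for doc in batch:
            if "title" not in doc:
                raise ValueError(f"missing title in doc {doc.get('id')}")
        store.extend(batch)  # only reached when the whole batch is valid
    return send
```

The list returned by `index_with_fallback` is exactly the information that would be nice to get back from the server directly.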
It would be interesting to return the IDs of the broken docs along with the Solr update response!

Cheers

On 6 October 2015 at 15:30, Bill Dueber <b...@dueber.com> wrote:

> Just to add... my informal tests show that batching has waaaaay more effect
> than SolrJ vs. JSON.
>
> I haven't looked at CUSC in a while; last time I looked it was impossible to
> do anything smart about error handling, so check that out before you get
> too deeply into it. We use a strategy of sending a batch of JSON documents
> and, if it returns an error, sending each record one at a time until we find
> the bad one and can log something useful.
>
> On Mon, Oct 5, 2015 at 12:07 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Thanks Erick,
> > you confirmed my impressions!
> > Thank you very much for the insights; another opinion is welcome :)
> >
> > Cheers
> >
> > 2015-10-05 14:55 GMT+01:00 Erick Erickson <erickerick...@gmail.com>:
> >
> > > SolrJ tends to be faster for several reasons, not the least of which
> > > is that it sends packets to Solr in a more efficient binary format.
> > >
> > > Batching is critical. I did some rough tests using SolrJ, and sending
> > > docs one at a time gave a throughput of < 400 docs/second.
> > > Sending 10 gave 2,300 or so. Sending 100 at a time gave
> > > over 5,300 docs/second. Curiously, 1,000 at a time gave only
> > > marginal improvement over 100. This was with a single thread.
> > > YMMV of course.
> > >
> > > CloudSolrClient is definitely the better way to go with SolrCloud;
> > > it routes the docs to the correct leader instead of having the
> > > node you send the docs to do the routing.
> > >
> > > Best,
> > > Erick
> > >
> > > On Mon, Oct 5, 2015 at 4:57 AM, Alessandro Benedetti
> > > <abenede...@apache.org> wrote:
> > > > I was doing some studies and analysis, and was wondering which
> > > > approach, in your opinion, is the best one for indexing in Solr
> > > > to reach the best throughput possible.
> > > > I know that a lot of factors affect indexing time, so let's
> > > > focus only on the feeding approach.
> > > > Let's isolate different scenarios:
> > > >
> > > > *Single Solr Infrastructure*
> > > >
> > > > 1) XML/JSON batch request to /update IndexHandler (xml/json)
> > > >
> > > > 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> > > > I was thinking this would be the fastest approach for a multi-threaded
> > > > indexing application.
> > > > Posting a batch of docs per request, if possible.
> > > >
> > > > *Solr Cloud*
> > > >
> > > > 1) XML/JSON batch request to /update IndexHandler (xml/json)
> > > >
> > > > 2) SolrJ ConcurrentUpdateSolrClient (javabin)
> > > >
> > > > 3) CloudSolrClient (javabin)
> > > > It seems the best approach according to these improvements [1]
> > > >
> > > > What are your opinions?
> > > >
> > > > A bonus observation would be on using some Map/Reduce big data
> > > > indexer, but let's assume we don't have a big cluster of CPUs,
> > > > just the average indexer server.
> > > >
> > > > [1]
> > > > https://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> > > >
> > > > Cheers
>
> --
> Bill Dueber
> Library Systems Programmer
> University of Michigan Library

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England