Right, that is why we batch.

When a batch of 1000 fails, drop to a batch size of 1 and resend the same batch
from the start. Then the client can report the exact document that has problems.

If you want to continue, go back to the bigger batch size. I usually fail the 
whole batch on one error.
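
Here's a minimal SolrJ sketch of that fallback, for the archives. The class
name, the "id" field, and the broad catch blocks are illustrative assumptions
on my part (a bad document can surface as a checked SolrServerException or as
a runtime SolrException depending on the client/server version), and it
assumes a SolrJ version that has HttpSolrClient.Builder:

    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class FallbackIndexer {
        private final SolrClient solr;

        public FallbackIndexer(String solrUrl) {
            this.solr = new HttpSolrClient.Builder(solrUrl).build();
        }

        // Index a batch; on any failure, retry the same docs one at a time
        // so the exact bad document can be reported.
        public void indexBatch(List<SolrInputDocument> batch) {
            try {
                solr.add(batch);  // fast path: the whole batch in one request
            } catch (Exception batchError) {
                // Fallback: batch size of 1 over the same documents.
                for (SolrInputDocument doc : batch) {
                    try {
                        solr.add(doc);
                    } catch (Exception docError) {
                        // The failing document is now known; report and skip
                        // it (or fail the whole batch here, per the advice
                        // above).
                        System.err.println("Bad document id="
                                + doc.getFieldValue("id")
                                + ": " + docError.getMessage());
                    }
                }
            }
        }
    }

On the next call the caller just passes a full-size batch again, which gives
you the resume-at-the-bigger-batch-size behavior described above.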

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/


On Nov 7, 2014, at 11:44 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a single
> thread, so it's certainly worth it.
> 
> Thanks,
> Peter
> 
> 
> On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> And Walter has also been around for a _long_ time ;)
>> 
>> (sorry, couldn't resist)....
>> 
>> Erick
>> 
>> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>>> 
>>> It isn’t too hard if the code is structured for it: retry with a batch
>>> size of 1.
>>> 
>>> wunder
>>> 
>>> On Nov 7, 2014, at 11:01 AM, Erick Erickson <erickerick...@gmail.com>
>>> wrote:
>>> 
>>>> Yeah, this has been an ongoing issue for a _long_ time. Basically,
>>>> you can't. So far, people have essentially written fallback logic to
>>>> index the docs of a failing packet one at a time and report it.
>>>> 
>>>> I'd really like better reporting back, but we haven't gotten there yet.
>>>> 
>>>> Best,
>>>> Erick
>>>> 
>>>> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan <peterlkee...@gmail.com>
>>>> wrote:
>>>>> How are folks handling Solr exceptions that occur during batch indexing?
>>>>> Solr stops parsing the docs stream when an error occurs (e.g. a doc with
>>>>> a missing mandatory field), and stops indexing the batch. The bad
>>>>> document is not identified, so it would be hard for the client to
>>>>> recover by skipping over it.
>>>>> 
>>>>> Peter
>>> 
>> 
