Right, that is why we batch. When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. Then the indexer can report the exact document that caused the problem.
If you want to continue, go back to the bigger batch size. I usually fail the whole batch on one error.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/

On Nov 7, 2014, at 11:44 AM, Peter Keegan <peterlkee...@gmail.com> wrote:

> I'm seeing 9X throughput with 1000 docs/batch vs 1 doc/batch, with a
> single thread, so it's certainly worth it.
>
> Thanks,
> Peter
>
> On Fri, Nov 7, 2014 at 2:18 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> And Walter has also been around for a _long_ time ;)
>>
>> (sorry, couldn't resist)....
>>
>> Erick
>>
>> On Fri, Nov 7, 2014 at 11:12 AM, Walter Underwood <wun...@wunderwood.org> wrote:
>>
>>> Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
>>>
>>> It isn’t too hard if the code is structured for it; retry with a
>>> batch size of 1.
>>>
>>> wunder
>>>
>>> On Nov 7, 2014, at 11:01 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>>>
>>>> Yeah, this has been an ongoing issue for a _long_ time. Basically,
>>>> you can't. So far, people have essentially written fallback logic to
>>>> index the docs of a failing packet one at a time and report it.
>>>>
>>>> I'd really like better reporting back, but we haven't gotten there yet.
>>>>
>>>> Best,
>>>> Erick
>>>>
>>>> On Fri, Nov 7, 2014 at 8:25 AM, Peter Keegan <peterlkee...@gmail.com> wrote:
>>>>
>>>>> How are folks handling Solr exceptions that occur during batch
>>>>> indexing? Solr stops parsing the docs stream when an error occurs
>>>>> (e.g. a doc with a missing mandatory field), and stops indexing the
>>>>> batch. The bad document is not identified, so it would be hard for
>>>>> the client to recover by skipping over it.
>>>>>
>>>>> Peter
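For reference, a minimal SolrJ sketch of the fallback described in this thread. The SolrClient.add() calls are standard SolrJ; the class name, the "id" uniqueKey field, and the choice to log-and-skip bad documents are illustrative assumptions, not something specified by the posters.

    import java.io.IOException;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrException;
    import org.apache.solr.common.SolrInputDocument;

    public class FallbackIndexer {

        private final SolrClient solr;

        public FallbackIndexer(SolrClient solr) {
            this.solr = solr;
        }

        // Index one batch. If the batch fails, replay it with a batch
        // size of 1 so the exact bad document(s) can be reported, then
        // return so the caller continues with the next full-size batch.
        public void indexBatch(List<SolrInputDocument> batch)
                throws SolrServerException, IOException {
            try {
                solr.add(batch); // normal path: whole batch in one request
            } catch (SolrServerException | SolrException | IOException e) {
                // Solr does not identify the offending document, so drop
                // to a batch size of 1 and start the batch over. Replaying
                // docs that already succeeded is safe because adds
                // overwrite by uniqueKey.
                for (SolrInputDocument doc : batch) {
                    try {
                        solr.add(doc);
                    } catch (SolrServerException | SolrException
                            | IOException docError) {
                        // Now we know exactly which document is bad.
                        // Log and skip, or rethrow to fail the batch.
                        System.err.println("Bad document "
                                + doc.getFieldValue("id") // assumed uniqueKey
                                + ": " + docError.getMessage());
                    }
                }
            }
        }
    }

Note that HTTP-level errors from the remote client surface as SolrException (a RuntimeException), which is why it is caught alongside SolrServerException. With this structure, the caller just invokes indexBatch() once per 1000 documents and automatically returns to the big batch size after a fallback pass, as Walter describes.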