And just to fill in the blanks that I missed before, DataImportHandler currently does handle a single content stream. One stream is pretty much all I've ever used, but there can be more than one and it would seem rude for a handler to ignore them.

I still think subclassing ContentStreamHandlerBase and doing the work in ContentStreamLoader#load is the best way to go.

Mainly, though, I was just curious about UpdateRequestProcessor#finish, which DIH currently does not call (as far as I can see).
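
For illustration, here is a rough, self-contained sketch of the pattern under discussion: a handler base class that passes every content stream to a loader and then calls finish() on the processor exactly once. All the types below (Stream, RecordingProcessor, Loader, StreamHandlerBase, LineImportHandler) are simplified stand-ins invented for this sketch, not the real Solr classes, and putting finish() in a finally block is an assumption of the sketch rather than a statement about Solr's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in for a content stream: a name plus a text body (hypothetical).
final class Stream {
    final String name;
    final String body;
    Stream(String name, String body) { this.name = name; this.body = body; }
}

// Stand-in for an update processor: records each call it receives.
final class RecordingProcessor {
    final List<String> events = new ArrayList<>();
    void processAdd(String doc) { events.add("add:" + doc); }
    void finish() { events.add("finish"); }
}

// Stand-in for a content stream loader: turns one stream into add calls.
interface Loader {
    void load(Stream stream, RecordingProcessor processor);
}

// Stand-in for the handler base class: it owns the loop over all streams
// and the single finish() call, so subclasses cannot forget either.
abstract class StreamHandlerBase {
    protected abstract Loader newLoader();

    final void handle(List<Stream> streams, RecordingProcessor processor) {
        Loader loader = newLoader();
        try {
            for (Stream s : streams) {
                loader.load(s, processor); // every stream is handled, not just the first
            }
        } finally {
            processor.finish(); // called once, even if a load fails
        }
    }
}

// Hypothetical import handler: each non-empty line of a stream becomes one document.
final class LineImportHandler extends StreamHandlerBase {
    @Override
    protected Loader newLoader() {
        return (stream, processor) -> {
            for (String line : stream.body.split("\n")) {
                if (!line.isEmpty()) {
                    processor.processAdd(stream.name + "/" + line);
                }
            }
        };
    }
}

final class FinishSketch {
    // Runs the handler over two streams and returns the recorded calls.
    static List<String> run() {
        List<Stream> streams = Arrays.asList(
                new Stream("a", "1\n2"),
                new Stream("b", "3"));
        RecordingProcessor processor = new RecordingProcessor();
        new LineImportHandler().handle(streams, processor);
        return processor.events;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running FinishSketch prints [add:a/1, add:a/2, add:b/3, finish]: both streams are consumed and finish() arrives last. The subclass only supplies the loader, which is the appeal of the subclassing approach.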

        Erik

On Jul 31, 2009, at 9:26 PM, Erik Hatcher wrote:

Shouldn't DIH, I presume in either SolrWriter or DataImportHandler, call processor.finish()?

Maybe DataImportHandler should subclass ContentStreamHandlerBase, which calls #finish already. This would mean we implement a new ContentStreamLoader. This would allow DIH to hand the streams off as either data sources or data to entities, right? This is where we want to head with Tika integration into DIH, methinks.

Thoughts?

        Erik
