And just to fill in the blanks that I missed before, DataImportHandler
currently does handle a single content stream. One stream is pretty
much all I've ever used, but a request can carry more than one, and it
would seem rude for a handler to ignore the rest.
I still think subclassing ContentStreamHandlerBase and doing the work
in ContentStreamLoader#load is the best way to go.
Mainly, though, I was just curious about UpdateRequestProcessor#finish,
which DIH currently does not call (as far as I can see).
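
To make the shape of that concrete, here is a rough, self-contained
sketch of the pattern I have in mind: a base handler walks every
content stream, hands each one to a loader, and calls finish() on the
processor exactly once at the end, even if a load fails. None of these
types are Solr's real classes. FinishSketch, SketchStream, SketchLoader
and RecordingProcessor are hypothetical stand-ins I made up for
illustration, and the real signatures differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; these are NOT Solr's actual classes or signatures.
public class FinishSketch {

    /** Stand-in for a content stream; here it just yields a string body. */
    public interface SketchStream {
        String body();
    }

    /** Stand-in for an update processor: collects docs and counts finish() calls. */
    public static class RecordingProcessor {
        public final List<String> added = new ArrayList<String>();
        public int finishCalls = 0;

        public void processAdd(String doc) {
            added.add(doc);
        }

        public void finish() {
            finishCalls++;
        }
    }

    /** Stand-in for a per-stream loader, the piece a DIH subclass would supply. */
    public interface SketchLoader {
        void load(SketchStream stream, RecordingProcessor processor) throws Exception;
    }

    /**
     * Stand-in for the base handler: every stream goes to the loader, and
     * finish() runs once afterwards, whether or not a load threw.
     */
    public static void handle(List<SketchStream> streams,
                              SketchLoader loader,
                              RecordingProcessor processor) throws Exception {
        try {
            for (SketchStream stream : streams) {
                loader.load(stream, processor);
            }
        } finally {
            processor.finish();
        }
    }

    public static void main(String[] args) throws Exception {
        List<SketchStream> streams = new ArrayList<SketchStream>();
        streams.add(() -> "doc-1");
        streams.add(() -> "doc-2");

        RecordingProcessor processor = new RecordingProcessor();
        // The loader simply forwards each stream body as one "document".
        handle(streams, (s, p) -> p.processAdd(s.body()), processor);

        System.out.println("added=" + processor.added.size()
                + " finishCalls=" + processor.finishCalls);
    }
}
```

The point of the try/finally is that the loader never has to remember
to call finish() itself, which is the call DIH appears to be missing
today.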
Erik
On Jul 31, 2009, at 9:26 PM, Erik Hatcher wrote:
Shouldn't DIH, I presume in either SolrWriter or DataImportHandler,
call processor.finish()?
Maybe DataImportHandler should subclass ContentStreamHandlerBase,
which calls #finish already. This would mean we implement a new
ContentStreamLoader. This would allow DIH to hand the streams off
as either data sources or data to entities, right? This is where we
want to head with Tika integration into DIH, methinks.
Thoughts?
Erik