Hi,

 

The document that I am trying to index with DIH contains one entity with
fields queried from a DB and another entity with the content of a file
extracted by TikaEntityProcessor. I tested the onError="skip" option with
TikaEntityProcessor and found that it does not work: it behaves like
onError="continue". That is, when extraction fails, the document still ends
up in my index with the DB fields but no file content. This is a problem
because my index then becomes inconsistent with my business data.

 

It seems that the issue lies in EntityProcessorWrapper, which swallows
exceptions thrown by nextRow() unless onError="abort". So is it safe to say
that onError="skip" just does not work here? Can somebody please suggest an
alternative that would enable me to import all or nothing for each document?
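
To make the behaviour I am describing concrete, here is a simplified model of
it. This is only a sketch and not the actual Solr source: RowSource and
WrapperSketch are hypothetical names, and the real wrapper does more than
this. The point is that "skip" and "continue" take the same path once the
exception is caught.

```java
import java.util.Map;

// Simplified model of the behaviour described above; NOT the actual Solr code.
// RowSource and WrapperSketch are hypothetical names used for illustration.
interface RowSource {
    Map<String, Object> nextRow() throws Exception;
}

class WrapperSketch {
    private final RowSource delegate;
    private final String onError; // "abort", "skip" or "continue"

    WrapperSketch(RowSource delegate, String onError) {
        this.delegate = delegate;
        this.onError = onError;
    }

    // Returns the delegate's row, or null if the delegate failed and
    // onError is anything other than "abort".
    Map<String, Object> nextRow() {
        try {
            return delegate.nextRow();
        } catch (Exception e) {
            if ("abort".equals(onError)) {
                throw new RuntimeException("aborting import", e);
            }
            // "skip" and "continue" are indistinguishable here: the exception
            // is swallowed and the caller only sees "no more rows", so the
            // parent document is still indexed without the file content.
            return null;
        }
    }

    public static void main(String[] args) {
        RowSource failing = () -> { throw new java.io.IOException("parse failed"); };
        System.out.println(new WrapperSketch(failing, "skip").nextRow());
        System.out.println(new WrapperSketch(failing, "continue").nextRow());
    }
}
```

In this model the caller cannot tell a failed extraction from an entity that
simply has no rows, which matches what I see in my index.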

 

One more observation: TikaEntityProcessor line 132 does not close the
InputStream in a finally clause, so if parsing fails the stream remains open.
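
What I have in mind is the usual try/finally pattern. The snippet below is a
minimal sketch of that pattern only, not a patch against TikaEntityProcessor:
parse() is a hypothetical stand-in for the Tika parse call, and the class and
method names are made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;

class CloseInFinallySketch {
    // Hypothetical stand-in for the real parse call; it fails on empty input
    // so that the failure path can be exercised.
    static String parse(InputStream in) throws IOException {
        if (in.read() == -1) {
            throw new IOException("nothing to parse");
        }
        return "parsed";
    }

    // Closes the stream whether parse() returns normally or throws.
    static String parseAndClose(InputStream in) throws IOException {
        try {
            return parse(in);
        } finally {
            in.close();
        }
    }
}
```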

 

Thanks,

Reeza
