Hi,
The document I am trying to index with DIH contains two entities: one whose fields are queried from a DB, and one whose content is extracted from a file with TikaEntityProcessor.

I was testing the onError="skip" option with TikaEntityProcessor and found that it does not work. It behaves like onError="continue": the document still ends up in my index with the DB fields but without the file content. This is a problem because it leaves my index inconsistent with my business data.

The issue seems to lie in EntityProcessorWrapper, which swallows exceptions thrown by nextRow() unless onError="abort".

So is it safe to say that this option simply does not work? Can somebody suggest an alternative that would let me import all or nothing?

One more observation: TikaEntityProcessor (line 132) does not close the InputStream in a finally clause, so if parsing fails the stream remains open.

Thanks,
Reeza
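For the InputStream leak, the usual remedy is to close the stream in a finally clause, or equivalently with try-with-resources. The sketch below is not the actual TikaEntityProcessor code. `TrackingInputStream`, `parseContent` and `extractWithCleanup` are hypothetical names, and `parseContent` only stands in for the Tika parse step by always throwing, so the example can show that the stream is still closed when parsing fails.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CloseOnFailureSketch {

    // Hypothetical stream wrapper that records whether close() was called.
    static class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Hypothetical stand-in for the Tika parse step. It always fails here
    // to simulate a file that cannot be parsed.
    static String parseContent(InputStream in) throws IOException {
        throw new IOException("simulated parse failure");
    }

    // Takes ownership of the stream and closes it whether parsing
    // succeeds or throws (try-with-resources runs close() in an implicit finally).
    static String extractWithCleanup(InputStream in) throws IOException {
        try (InputStream stream = in) {
            return parseContent(stream);
        }
    }

    public static void main(String[] args) {
        TrackingInputStream in = new TrackingInputStream("file body".getBytes());
        try {
            extractWithCleanup(in);
        } catch (IOException e) {
            System.out.println("parse failed: " + e.getMessage());
        }
        System.out.println("stream closed: " + in.closed);
    }
}
```

Running `main` prints the simulated failure and then `stream closed: true`. The exception still propagates to the caller, so whatever onError handling applies upstream is unaffected. Only the resource cleanup changes.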