On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar
<rahul.warawde...@gmail.com> wrote:
> Hi All,
>
> I am using Solr 3.1 for one of our search based applications.
> We are using DIH to index our data and TikaEntityProcessor to index
> attachments.
> Currently we are running into an issue while extracting content from one of
> our MS Excel 2007 files, using TikaEntityProcessor.
[...]

Have not done this with Tika, but we have run into similar
issues while trying to convert Microsoft Word documents
externally, before indexing to Solr. It turned out in our case
that these documents were referring to external URLs, which
were not always accessible to our converter sitting behind
a firewall.

> Also, does someone know of a way to just skip this type of behaviour for
> that file and move to the next document to be indexed ?
[...]

This is probably not of much help to you, but what we ended
up doing was killing any conversion process that ran longer
than a set maximum time.
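
For what it is worth, here is a minimal sketch of that idea in Python.
It is not our actual script, and it assumes the converter runs as a
separate external command, so it does not apply directly to
TikaEntityProcessor running inside DIH. The function name
convert_with_timeout and the stand-in commands are made up for
illustration.

```python
import subprocess
import sys


def convert_with_timeout(cmd, max_seconds):
    """Run an external converter command.

    Returns the converter's stdout as bytes, or None if the process
    failed or had to be killed for exceeding max_seconds.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=max_seconds,  # run() kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        # Conversion took too long: give up on this document.
        return None
    if result.returncode != 0:
        # Converter exited with an error: skip this document as well.
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in "converters" (hypothetical): one hangs, one finishes quickly.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    quick = [sys.executable, "-c", "print('converted text')"]
    for cmd in (hanging, quick):
        text = convert_with_timeout(cmd, max_seconds=1)
        if text is None:
            print("skipped: conversion failed or timed out")
        else:
            print("indexed:", text.decode().strip())
```

The caller treats None as "skip this file and move on to the next
document", which is the behaviour asked about above.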

Regards,
Gora
