Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar rahul.warawde...@gmail.com wrote:
> Hi All,
>
> I am using Solr 3.1 for one of our search based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files, using TikaEntityProcessor.
> [...]

Have not done this with Tika, but we have run into similar issues while trying to convert Microsoft Word documents externally, before indexing to Solr. It turned out in our case that these documents were referring to external URLs, which were not always accessible to our converter sitting behind a firewall.

> Also, does someone know of a way to just skip this type of behaviour for that file and move to the next document to be indexed?
> [...]

This is probably not of much help to you, but what we ended up doing was killing any conversion process that took longer than a maximum time.

Regards,
Gora
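
For illustration, the "kill it after a maximum time" approach described above might look roughly like the following when the conversion runs as an external process. This is only a sketch under assumptions, not what was actually used: the function name `convert_with_timeout`, the file names, the 60-second limit, and the `tika-app.jar` path are all hypothetical. It also does not apply directly to TikaEntityProcessor, which runs Tika inside the Solr JVM rather than as a separate process.

```python
import subprocess
import sys
from typing import Optional, Sequence


def convert_with_timeout(command: Sequence[str], max_seconds: float) -> Optional[str]:
    """Run an external conversion command and return its stdout.

    Returns None if the command cannot be started, exits with an error,
    or runs longer than max_seconds (in which case it is killed).
    """
    try:
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=max_seconds,  # subprocess.run kills the child when this expires
            check=True,           # a non-zero exit status raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        return None  # conversion hung or was too slow; caller can skip the file
    except (subprocess.CalledProcessError, OSError):
        return None  # conversion failed or the command could not be started
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names and jar location, for illustration only.
    for path in ["report1.xlsx", "report2.xlsx"]:
        text = convert_with_timeout(
            ["java", "-jar", "tika-app.jar", "--text", path], max_seconds=60
        )
        if text is None:
            print(f"Skipping {path}: conversion failed or timed out", file=sys.stderr)
            continue
        print(f"{path}: extracted {len(text)} characters")
```

A file that hangs the converter is then simply skipped, and the extracted text of the remaining files can be sent to Solr by whatever indexing path is in use.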
Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
Hi All,

I am using Solr 3.1 for one of our search based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files, using TikaEntityProcessor.

The issue is that TikaEntityProcessor hangs without throwing any exception, which in turn causes the indexing to hang on the server.

Has anyone faced a similar kind of issue in the past with TikaEntityProcessor?

Also, does someone know of a way to just skip this type of behaviour for that file and move to the next document to be indexed?

--
Thanks and Regards
Rahul A. Warawdekar
Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
Can you rule out Tika or Solr by trying to parse the file with a stand-alone Tika?

> Hi All,
>
> I am using Solr 3.1 for one of our search based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files, using TikaEntityProcessor.
>
> The issue is that TikaEntityProcessor hangs without throwing any exception, which in turn causes the indexing to hang on the server.
>
> Has anyone faced a similar kind of issue in the past with TikaEntityProcessor?
>
> Also, does someone know of a way to just skip this type of behaviour for that file and move to the next document to be indexed?
Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor
Hi Markus,

It is Tika. I tried using Tika standalone.

On 5/26/11, Markus Jelsma markus.jel...@openindex.io wrote:
> Can you rule out Tika or Solr by trying to parse the file with a stand-alone Tika?
>
> > Hi All,
> >
> > I am using Solr 3.1 for one of our search based applications. We are using DIH to index our data and TikaEntityProcessor to index attachments. Currently we are running into an issue while extracting content from one of our MS Excel 2007 files, using TikaEntityProcessor.
> >
> > The issue is that TikaEntityProcessor hangs without throwing any exception, which in turn causes the indexing to hang on the server.
> >
> > Has anyone faced a similar kind of issue in the past with TikaEntityProcessor?
> >
> > Also, does someone know of a way to just skip this type of behaviour for that file and move to the next document to be indexed?

--
Thanks and Regards
Rahul A. Warawdekar