Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-27 Thread Gora Mohanty
On Thu, May 26, 2011 at 6:52 PM, Rahul Warawdekar
rahul.warawde...@gmail.com wrote:
 Hi All,

 I am using Solr 3.1 for one of our search based applications.
 We are using DIH to index our data and TikaEntityProcessor to index
 attachments.
 Currently we are running into an issue while extracting content from one of
 our MS Excel 2007 files, using TikaEntityProcessor.
[...]

Have not done this with Tika, but we have run into similar
issues while trying to convert Microsoft Word documents
externally, before indexing to Solr. It turned out in our case
that these documents were referring to external URLs, which
were not always accessible to our converter sitting behind
a firewall.

 Also, does someone know of a way to just skip this type of behaviour for
 that file and move to the next document to be indexed ?
[...]

This is probably not of much help to you, but what we ended
up doing was killing any conversion process that ran longer
than a set maximum time.
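
As a rough illustration of that time-limit idea, here is a minimal Java
sketch. It is not what we actually ran (we killed an external converter
process); it swaps in an in-process variant that uses a Future with a
timeout. The class name TimeLimitedExtraction, the method
extractWithTimeout and the dummy tasks are all made up for the example;
in practice the Callable would wrap whatever does the real extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Solr, DIH or Tika.
public class TimeLimitedExtraction {

    /**
     * Runs one extraction task on a worker thread.
     * Returns the extracted text, or null if the task failed
     * or did not finish within timeoutMillis.
     */
    static String extractWithTimeout(Callable<String> task, long timeoutMillis) {
        // Daemon thread, so a task that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "extraction-worker");
            t.setDaemon(true);
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Only requests an interrupt; code that ignores interrupts keeps running.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The task itself threw an exception: treat the document as skipped.
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for real documents; the second one simulates a hang.
        List<Callable<String>> tasks = new ArrayList<>();
        tasks.add(() -> "text of first document");
        tasks.add(() -> { Thread.sleep(60_000); return "never returned"; });
        tasks.add(() -> "text of third document");

        for (int i = 0; i < tasks.size(); i++) {
            String text = extractWithTimeout(tasks.get(i), 500);
            System.out.println("doc " + i + ": " + (text == null ? "skipped" : text));
        }
    }
}
```

Note that cancelling a Future only interrupts the worker thread. If the
parser is stuck in code that never checks for interrupts, the thread will
keep running in the background, which is one reason running the
conversion as a separate process and killing it is the more robust
option.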

Regards,
Gora


Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi All,

I am using Solr 3.1 for one of our search-based applications.
We are using DIH to index our data and TikaEntityProcessor to index
attachments.
Currently we are running into an issue while extracting content from one of
our MS Excel 2007 files, using TikaEntityProcessor.

The issue is that the TikaEntityProcessor hangs without throwing any
exception, which in turn causes the indexing to hang on the server.

Has anyone faced a similar kind of issue in the past with
TikaEntityProcessor?

Also, does someone know of a way to just skip this type of behaviour for
that file and move on to the next document to be indexed?



-- 
Thanks and Regards
Rahul A. Warawdekar


Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Markus Jelsma
Can you rule out Tika or Solr by trying to parse the file with a stand-alone 
Tika?



Re: Issue while extracting content from MS Excel 2007 file using TikaEntityProcessor

2011-05-26 Thread Rahul Warawdekar
Hi Markus,

It is Tika.
I tried parsing the file with standalone Tika.

On 5/26/11, Markus Jelsma markus.jel...@openindex.io wrote:
 Can you rule out Tika or Solr by trying to parse the file with a stand-alone
 Tika?




-- 
Thanks and Regards
Rahul A. Warawdekar