You can test content extraction on its own, outside Solr, with tika-app.jar.

Command to output the extracted content as plain text:
java -jar tika-app-0.8.jar --text file_path

For more options: java -jar tika-app-0.8.jar --help

Use the tika-app jar whose version matches the Tika version in your Solr build.
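
If you want to repeat the check for several files, the command above can be wrapped in a small helper. This is only a sketch: it assumes a POSIX shell (on Windows, something like Cygwin or Git Bash), and the function name extract_text and the example paths are made up for illustration. Only the java -jar ... --text invocation comes from the command above.

```shell
# Hypothetical helper: run standalone Tika text extraction on one file.
# Usage: extract_text <path-to-tika-app-jar> <path-to-document>
extract_text() {
  jar="$1"
  file="$2"

  # Fail early with a clear message instead of a Java stack trace.
  if [ ! -f "$jar" ]; then
    echo "tika-app jar not found: $jar" >&2
    return 2
  fi
  if [ ! -f "$file" ]; then
    echo "document not found: $file" >&2
    return 2
  fi

  # Same invocation as above; extracted text goes to stdout.
  java -jar "$jar" --text "$file"
}

# Example call (paths are placeholders, adjust to your setup):
#   extract_text tika-app-0.8.jar "June 30, 2011.xltm" > extracted.txt
```

If the standalone run also pegs the CPU or never finishes on the problem file, that points at Tika's parsing of the file rather than at Solr or Drupal.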

Regards,
Jayendra

On Wed, Aug 10, 2011 at 1:53 PM, Tim AtLee <timat...@gmail.com> wrote:
> Hello
>
> So, I'm a newbie to Solr and Tika and whatnot, so please use simple words
> for me :P
>
> I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the
> search engine for a Drupal web site.
>
> Up until recently, everything has been fine - searching works, faceting
> works, etc.
>
> Recently a user uploaded a 5mb xltm file, which seems to be causing Tomcat
> to spike in CPU usage, and eventually error out.  When the documents are
> submitted to be indexed, the Tomcat process spikes to 100% of one
> available CPU, with the eventual error in Drupal of "Exception occured
> sending *sites/default/files/nodefiles/533/June 30, 2011.xltm* to Solr "0"
> Status: Communication Error".
>
> I am looking for some help in figuring out where to troubleshoot this.  I
> assume it's this file, but I guess I'd like to be sure - so how can I submit
> this file for content extraction manually to see what happens?
>
> Thanks,
>
> Tim
>
