You can test the content extraction on its own, outside Solr, with tika-app.jar.

Command to output in text format:
java -jar tika-app-0.8.jar --text file_path
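Since the symptom is a process stuck at 100% CPU, it may help to run that same command under a time limit, so a hang shows up as a clear timeout rather than an endless wait. Below is a minimal sketch in Python, not part of Tika or Solr. The helper names `build_tika_command` and `extract_text` and the 120-second default are assumptions made for illustration. It only uses the `--text` option shown above.

```python
import subprocess
from typing import List, Optional


def build_tika_command(jar_path: str, file_path: str) -> List[str]:
    """Build the argument list for: java -jar <jar> --text <file>.

    Passing a list (rather than one shell string) keeps file names that
    contain spaces or commas intact as a single argument.
    """
    return ["java", "-jar", jar_path, "--text", file_path]


def extract_text(jar_path: str, file_path: str,
                 timeout_s: float = 120.0) -> Optional[str]:
    """Run tika-app on one file and return the extracted text.

    Returns None if the process does not finish within timeout_s seconds
    (hypothetical default), which suggests the parser is hanging on the file.
    Raises subprocess.CalledProcessError if tika-app exits with an error.
    """
    try:
        result = subprocess.run(
            build_tika_command(jar_path, file_path),
            capture_output=True,   # collect stdout/stderr instead of printing
            text=True,             # decode output to str
            errors="replace",      # do not fail on undecodable bytes
            timeout=timeout_s,     # child process is killed when this expires
            check=True,            # non-zero exit code raises an exception
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout
```

For example, `extract_text("tika-app-0.8.jar", "June 30, 2011.xltm")` returning `None` would point at the extraction step for that file, while a normal string result would suggest looking elsewhere (Tomcat, Solr configuration, or the Drupal module).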
For more options:
java -jar tika-app-0.8.jar --help

Use the tika-app jar whose version matches the Tika version bundled with your Solr build.

Regards,
Jayendra

On Wed, Aug 10, 2011 at 1:53 PM, Tim AtLee <timat...@gmail.com> wrote:
> Hello
>
> So, I'm a newbie to Solr and Tika and whatnot, so please use simple words
> for me :P
>
> I am running Solr on Tomcat 7 on Windows Server 2008 r2, running as the
> search engine for a Drupal web site.
>
> Up until recently, everything has been fine - searching works, faceting
> works, etc.
>
> Recently a user uploaded a 5mb xltm file, which seems to be causing Tomcat
> to spike in CPU usage, and eventually error out.  When the documents are
> submitted to be indexed, the tomcat process spikes up to use 100% of 1
> available CPU, with the eventual error in Drupal of "Exception occured
> sending *sites/default/files/nodefiles/533/June 30, 2011.xltm* to Solr "0"
> Status: Communication Error".
>
> I am looking for some help in figuring out where to troubleshoot this.  I
> assume it's this file, but I guess I'd like to be sure - so how can I submit
> this file for content extraction manually to see what happens?
>
> Thanks,
>
> Tim
>