Thanks, Nick. It also supports MHT. Used Curl for Windows to feed some files 
into the default Solr installation (3.6.1) and it handled them with aplomb. 

For those who want to use Curl, there's an exe version here: 
http://curl.haxx.se/dlwiz/?type=bin&os=Win64&flav=-

Add the folder where you put that EXE to your PATH environment variable. Then 
run something like this from the command prompt, in the folder where your 
file to index is:

curl "http://localhost:8983/solr/update/extract?literal.id=doc3&commit=true" -F 
"[email protected]"

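For anyone who would rather script this than use curl, here is a rough Python 
sketch of the same request. It assumes the default URL from the command above. 
The helper names build_extract_url and post_file are made up for illustration, 
and unlike curl -F (a multipart form upload) it sends the file as the raw 
request body, which I believe the extract handler also accepts. Treat it as a 
starting point, not a tested recipe.

```python
import mimetypes
import urllib.parse
import urllib.request


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the /update/extract URL used in the curl command above."""
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    return solr_base.rstrip("/") + "/update/extract?" + urllib.parse.urlencode(params)


def post_file(path, doc_id, solr_base="http://localhost:8983/solr"):
    """Send the file as the raw request body and return the HTTP status.

    Assumption: Solr is running at solr_base and accepts a raw body here.
    """
    # Fall back to a generic type when the extension is not recognised.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_url(solr_base, doc_id),
        data=body,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_file("sample.mht", "doc3") with a file of your own (the file name 
here is only a placeholder) should then index it under the id doc3.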
Sincerely,
Alex 

-----Original Message-----
From: Nick Burch [mailto:[email protected]] 
Sent: 2 August 2012 5:29 PM
To: [email protected]
Subject: Re: File types supported

On Thu, 2 Aug 2012, Alexander Cougarman wrote:
> Hi. Does the latest version of Tika index text in these file types?
> - Office 2007/2010 file types of DOCX, XLSX, PPTX

Yes (though the few tiny bits of new functionality introduced in 2010 will be 
skipped over)

> - MHT file (MHTML Document)

Not sure, how close is this to a regular html file?

> This page helped on many of the file formats, but wanted to clarify: 
> http://tika.apache.org/1.2/formats.html

Often the best way to check is to grab the tika-app jar, and try a few sample 
files with it
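
If you have a lot of sample files to try, that check can be scripted. The 
sketch below wraps the tika-app command line in Python. The jar path and 
sample file name are placeholders you would replace with your own, and the 
helper names tika_command and run_tika are made up for illustration. It 
assumes java is on your PATH. The --text option prints the extracted plain 
text, and --detect prints only the detected type.

```python
import subprocess


def tika_command(jar_path, sample_path, mode="--text"):
    """Build the argument list for one tika-app run."""
    return ["java", "-jar", jar_path, mode, sample_path]


def run_tika(jar_path, sample_path, mode="--text"):
    """Run tika-app on one file and return whatever it prints to stdout."""
    result = subprocess.run(
        tika_command(jar_path, sample_path, mode),
        capture_output=True,
        text=True,
        check=True,  # raise if java or Tika exits with an error
    )
    return result.stdout
```

For example, run_tika("tika-app.jar", "sample.mht") returns the text Tika 
managed to pull out of that file, so an empty result is a quick hint that the 
format is not being parsed.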

Nick
