Yes, Tika indexes all formats, but I am specifically looking for OCR (through Java), at least for PDF or JPEG images.
Any clues?

Best Regards,
Kranti K K Parisa

On Thu, Feb 4, 2010 at 8:29 PM, mike anderson <saidthero...@gmail.com> wrote:

> There might be an OCR plugin for Apache Tika (which does exactly this out of
> the box except for OCR capability, i believe).
>
> http://lucene.apache.org/tika/
>
> -mike
>
>
> 2010/2/4 Kranti™ K K Parisa <kranti.par...@gmail.com>
>
> > Hi,
> >
> > Can anyone list the best OCR APIs available to use in combination with
> > SOLR.
> >
> > The idea is to take a scanned file (format could be pdf,word,image..etc) as
> > input and give OCRd file which could be used to get the contents for the
> > SOLR indexing.
> >
> > Best Regards,
> > Kranti K K Parisa