Yes, Tika indexes all formats.

But I am specifically looking for OCR (through Java), at least for PDF or JPEG
images.

Any clues?

Best Regards,
Kranti K K Parisa



On Thu, Feb 4, 2010 at 8:29 PM, mike anderson <saidthero...@gmail.com> wrote:

> There might be an OCR plugin for Apache Tika (which does exactly this out of
> the box, except for OCR capability, I believe).
>
> http://lucene.apache.org/tika/
>
> -mike
>
>
> 2010/2/4 Kranti™ K K Parisa <kranti.par...@gmail.com>
>
> > Hi,
> >
> > Can anyone list the best OCR APIs available to use in combination with
> > Solr?
> >
> > The idea is to take a scanned file (the format could be PDF, Word, image, etc.)
> > as input and produce an OCR'd file whose contents can be used for
> > Solr indexing.
> >
> > Best Regards,
> > Kranti K K Parisa
> >
>
