Hi Antoni,

I tried many charset detection libraries while working on Nutch, but none of
them really worked.
I also took a look at the Mozilla charset detector, but it was
too complicated to integrate into Nutch (or Tika).

Best regards

Jérôme

2009/12/9 Antoni Mylka <antoni.my...@gmail.com>

> Aperturians, Tika
>
> I was wondering if anyone has any experience with the jchardet library
> for charset detection. Does it work? What kinds of documents does it
> actually support?
>
> Christiaan has posted an idea on the Aperture tracker for how we could
> use jchardet to improve the plain text extractor, but it doesn't seem to
> work. Or maybe the Tika guys have figured it out already and I can just
> use Tika for this? :)
>
> Antoni Mylka
> antoni.my...@gmail.com
>



-- 
Jérôme Charron
Directeur Technique @ WebPulse
Tel: +33675742890 <= ** NEW **
eMail : jerome.char...@webpulse.fr
http://www.webpulse.fr/
http://www.shopreflex.com/
http://www.staragora.com/
