Hi Antoni,

I tried many charset detection libraries while working on Nutch, but none of them really worked. I also took a look at the Mozilla charset detector, but it was too complicated to integrate into Nutch (or Tika).

Best regards,
Jérôme

2009/12/9 Antoni Mylka <antoni.my...@gmail.com>

> Aperturians, Tika
>
> I was wondering if anyone has any experience with the jchardet library
> for charset detection. Does it work? What kinds of documents does it
> actually support.
>
> Christiaan has posted an idea to the Aperture tracker how we could use
> jchardet to improve the plain text extractor, but it doesn't seem to
> work. Or maybe the Tika guys have figured it out already and I can just
> use Tika for this? :)
>
> Antoni Mylka
> antoni.my...@gmail.com

--
Jérôme Charron
Directeur Technique @ WebPulse
Tel: +33675742890 <= ** NEW **
eMail : jerome.char...@webpulse.fr
http://www.webpulse.fr/
http://www.shopreflex.com/
http://www.staragora.com/