Hi,

I use hbase-0.92.1 and do not have any problems with UTF-8 characters. What exactly is your problem?
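
One way to narrow this down is to check whether the stored bytes are actually wrong or only look wrong when printed. Below is a minimal sketch that uses plain Java only, with no Nutch or HBase calls. The class name Utf8RoundTrip and its helpers are made up for illustration. It encodes a Thai string as UTF-8, prints the bytes as \xNN escapes (as far as I know, the HBase shell shows non-ASCII bytes in a similar escaped form, so such output alone does not prove corruption), and then decodes the same bytes with the right and the wrong charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Nutch or HBase.
class Utf8RoundTrip {

    // Encode the text as UTF-8 (what a correct writer would store),
    // then decode the stored bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] stored = text.getBytes(StandardCharsets.UTF_8);
        return new String(stored, decodeWith);
    }

    // Render bytes as \xNN escapes so they can be compared by eye
    // with escaped output from a datastore tool.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Thai word for "Thai", written with escapes so the result
        // does not depend on the source file encoding.
        String thai = "\u0E44\u0E17\u0E22";

        System.out.println("UTF-8 bytes: "
                + toHex(thai.getBytes(StandardCharsets.UTF_8)));

        // Decoding with the same charset gives the original text back.
        System.out.println("decoded as UTF-8 matches: "
                + roundTrip(thai, StandardCharsets.UTF_8).equals(thai));

        // Decoding with a different charset yields mojibake, which is
        // what a charset mismatch on the read side would look like.
        System.out.println("decoded as ISO-8859-1 matches: "
                + roundTrip(thai, StandardCharsets.ISO_8859_1).equals(thai));
    }
}
```

If the bytes you see in the store match the escaped UTF-8 form of your text, the data is probably fine and the problem is more likely in whatever reads or displays it.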
Alex.

-----Original Message-----
From: Ake Tangkananond <[email protected]>
To: user <[email protected]>
Sent: Thu, Aug 9, 2012 11:12 am
Subject: Re: Nutch 2 encoding

Hi,

I'm debugging. I inserted code to print out the encoding in HtmlParser.java, function getParse, and it printed utf-8. So I think it might be a data store problem. What else could be the cause? Could you advise what I should try next to have my Thai chars stored correctly in HBase?

Can I simply go with the latest version of HBase? (Not sure if it is compatible with Nutch 2.0.)

    byte[] contentInOctets = page.getContent().array();
    InputSource input = new InputSource(new ByteArrayInputStream(contentInOctets));

    EncodingDetector detector = new EncodingDetector(conf);
    detector.autoDetectClues(page, true);
    detector.addClue(sniffCharacterEncoding(contentInOctets), "sniffed");
    String encoding = detector.guessEncoding(page, defaultCharEncoding);

    metadata.set(Metadata.ORIGINAL_CHAR_ENCODING, encoding);
    metadata.set(Metadata.CHAR_ENCODING_FOR_CONVERSION, encoding);
    LOG.info("encoding : " + encoding);
    input.setEncoding(encoding);

Regards,
Ake Tangkananond

On 8/9/12 11:06 PM, "Ake Tangkananond" <[email protected]> wrote:

>Hi,
>
>Sorry for late reply. I was trying to figure out myself but seem no luck.
>
>I'm on Hbase with local deploy version 0.90.6, r1295128, the working
>version as said in Wiki:
>http://wiki.apache.org/nutch/Nutch2Tutorial
>
>Regards,
>Ake Tangkananond
>
>On 8/9/12 10:30 PM, "Ferdy Galema" <[email protected]> wrote:
>
>>It depends on the datastore and possibly the server? What store are you
>>using?
>>
>>On Thu, Aug 9, 2012 at 4:05 PM, Ake Tangkananond <[email protected]>
>>wrote:
>>
>>> Hi all,
>>>
>>> I just wonder if Nutch 2 is working fine with non english characters in
>>> your deployment? Thai language used to work fine for me in Nutch 1.5
>>> but not in Nutch 2. Did I miss something. Anything I should check.
>>>
>>> Sorry for silly questions, but thank you in advance. ;-)
>>>
>>> Regards,
>>> Ake Tangkananond

