It's definitely Tomcat. I just browsed through the segments/*/content/part-*/data files with a hex viewer, and it looks like Nutch uses some sort of compression.
2008/9/29 daut <[EMAIL PROTECTED]>:
>
> I want to use UTF-8. How can I force Nutch to use UTF-8? Or is it a Tomcat
> issue?
>
>
> David Jashi wrote:
>>
>> Everything is fine; instead of UTF-8, Nutch returns something 16-bit.
>>
>> It's OK, for some strange reason Nutch uses this encoding instead of
>> UTF-8. Text is displayed normally anyhow.
>>
>> On Mon, Sep 29, 2008 at 1:04 PM, daut <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello,
>>> I've installed nutch-0.9 and run my first crawl. Then I ran a search on the
>>> search page. Everything seems OK: I can see all result characters
>>> correctly (non-ASCII characters, Georgian language). But when I view the
>>> page source, instead of Georgian letters, for example პოლ, there are
>>> symbols like &_#_4_3_1_8;&_#_4_3_1_7;&_#_4_3_1_4; (without the "_"
>>> symbols :) ). Why does this happen? Is it normal?
>>> Best regards, daut.
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/encoding-tp19720443p19720443.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>> --
>> with best regards,
>> David Jashi
>> Web development EO,
>> Caucasus Online
>> +995(32)970368
>> [EMAIL PROTECTED]

--
with best regards,
David Jashi
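For anyone puzzled by the page source: sequences such as &#4318; are HTML decimal numeric character references. Each one names a single Unicode code point (4318 is U+10DE, the Georgian letter პ), and browsers render them exactly like the raw characters, which is why the results looked correct on screen. The sketch below uses only the Python standard library to decode daut's example and to show the reverse escaping; `to_ncr` is a hypothetical helper for illustration only, not the code Nutch actually runs.

```python
import html

# The references daut saw in the page source, with the "_" separators removed.
escaped = "&#4318;&#4317;&#4314;"

# Decoding the references yields the original Georgian word.
decoded = html.unescape(escaped)
print(decoded)  # პოლ

# Each reference is simply a Unicode code point written in decimal.
for ch in decoded:
    print(ch, ord(ch), f"U+{ord(ch):04X}")


def to_ncr(text: str) -> str:
    """Hypothetical helper: escape every non-ASCII character as &#NNNN;."""
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


# Escaping the word again reproduces what appeared in the page source.
assert to_ncr(decoded) == escaped

# The same word as raw UTF-8 bytes: three bytes per Georgian letter.
print(decoded.encode("utf-8"))
```

Both forms describe the same text; the escaped form is pure ASCII, so it survives regardless of the charset the page is served with, at the cost of an unreadable page source.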
