It's definitely Tomcat. I just browsed through the
segments/*/content/part-*/data files with a hex viewer, and it looks like
Nutch uses some sort of compression.
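
For reference, here is a small Python sketch (not Nutch or Tomcat code; the
helper name to_numeric_refs is made up for illustration) of what the &#...;
sequences from the original question below are. They look like decimal
numeric character references, that is, Unicode code points written out in
plain ASCII. A browser decodes them the same way whatever charset the page is
served in, which would explain why the text displays normally. The sketch
does not show which component produces them.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Write every non-ASCII character as a decimal numeric character reference.

    Hypothetical helper for illustration; it relies on Python's built-in
    "xmlcharrefreplace" error handler.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# The sequence quoted in the question, without the "_" separators.
page_source = "&#4318;&#4317;&#4314;"

# html.unescape turns each reference back into the character it names.
decoded = html.unescape(page_source)
code_points = [ord(ch) for ch in decoded]

print(code_points)                        # [4318, 4317, 4314]
print([hex(cp) for cp in code_points])    # ['0x10de', '0x10dd', '0x10da']
print(decoded == "\u10de\u10dd\u10da")    # True: the Georgian letters პოლ
print(to_numeric_refs(decoded) == page_source)  # True: the round trip matches
```

Since the references themselves are pure ASCII, the page source carries the
same text regardless of the declared encoding.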

2008/9/29 daut <[EMAIL PROTECTED]>:
>
> I want to use UTF-8. How can I force Nutch to use UTF-8? Or is it a Tomcat
> issue?
>
>
> David Jashi wrote:
>>
>> Everything is fine. For some strange reason Nutch returns some kind of
>> 16-bit encoding instead of UTF-8. The text is displayed normally anyhow.
>>
>> On Mon, Sep 29, 2008 at 1:04 PM, daut <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello,
>>> I've installed nutch-0.9 and run my first crawl. Then I ran a search on the
>>> search page. Everything seems OK: I can see all the characters in the
>>> results correctly (non-ASCII characters, Georgian language). But when I
>>> view the page source, instead of Georgian letters, for example პოლ, there
>>> are symbols like these: &_#_4_3_1_8;&_#_4_3_1_7;&_#_4_3_1_4; (without the
>>> "_" symbols :) ).
>>> Why does this happen? Is it normal?
>>> Best regards, daut.
>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/encoding-tp19720443p19720443.html
>>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> with best regards,
>> David Jashi
>> Web development EO,
>> Caucasus Online
>> +995(32)970368
>> [EMAIL PROTECTED]
>>
>>
>
>
>



-- 
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
[EMAIL PROTECTED]

