Hi,
I'm trying to build a small Perl utility (it could be any scripting
language) that takes the output of nutch readseg -dump as its input,
decodes the content field to UTF-8 (regardless of the encoding the raw
page was in), and outputs the decoded content. After a little bit of
experimentation,
Yves Petinot wrote:
Thanks a lot, Andrzej, this makes perfect sense.
-y
Andrzej Bialecki wrote: