decoding nutch readseg -dump 's output

2009-11-16 Thread Yves Petinot
Hi, I'm trying to build a small Perl (could be any scripting language) utility that takes the output of nutch readseg -dump as its input, decodes the content field to UTF-8 (independent of what encoding the raw page was in) and outputs that decoded content. After a little bit of experimentation,

Re: decoding nutch readseg -dump 's output

2009-11-16 Thread Andrzej Bialecki
Yves Petinot wrote: Hi, I'm trying to build a small Perl (could be any scripting language) utility that takes the output of nutch readseg -dump as its input, decodes the content field to UTF-8 (independent of what encoding the raw page was in) and outputs that decoded content. After a little bit

Re: decoding nutch readseg -dump 's output

2009-11-16 Thread Yves Petinot
Thanks a lot, Andrzej, this makes perfect sense. -y Andrzej Bialecki wrote: Yves Petinot wrote: Hi, I'm trying to build a small Perl (could be any scripting language) utility that takes the output of nutch readseg -dump as its input, decodes the content field to UTF-8 (independent of what