> ...and OutputStreamWriter constructors, should that work? Is it likely to break something else?
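For context, here is a minimal sketch of what passing an explicit charset to the `InputStreamReader` and `OutputStreamWriter` constructors looks like. It assumes the goal is a dump that no longer depends on the JVM's default encoding. The class and method names (`ExplicitCharsetCopy`, `copyAsUtf8`) are hypothetical, and this is not the actual `SegmentReader` code:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetCopy {

    // Hypothetical helper: copies text from `in` to `out`, decoding and encoding
    // as UTF-8 instead of relying on the platform default charset.
    // Note: try-with-resources closes both streams when done.
    static void copyAsUtf8(InputStream in, OutputStream out) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the sample independent of the source file encoding.
        byte[] input = "Gr\u00fc\u00dfe, \u6771\u4eac".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyAsUtf8(new ByteArrayInputStream(input), out);
        // The output bytes match the UTF-8 input whatever file.encoding is set to.
        System.out.println(Arrays.equals(input, out.toByteArray()));
    }
}
```

Fixing the charset in code means the result no longer depends on how each JVM was launched. The trade-off is that non-UTF-8 output is no longer possible without changing the code.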
> ____
> From: Sebastian Nagel <wastl.na...@googlemail.com>
> To: user@nutch.apache.org
> Sent: Wednesday, November 15, 2017 5:18 AM
Subject: Re: readseg dump and non-ASCII characters
Hi Michael,
from the arguments I guess you're interested in the raw/binary HTML content,
right?
After a closer look I have no simple answer:
1. HTML has no fixed encoding - it could be anything, pageA may have a ...
...all nodes in the cluster? Would it work just as well, or better, to use
"-Dfile.encoding=UTF8" in the bin/nutch command?
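Some background on the `-Dfile.encoding` option. The flag changes the default charset only for the JVM it is passed to, and the no-charset `InputStreamReader`/`OutputStreamWriter` constructors use that default. Java accepts `UTF8` as an alias of `UTF-8`. On JDK 18 and later the default is UTF-8 anyway (JEP 400). If the conversion runs inside MapReduce tasks, those tasks run in their own JVMs, so the flag would also have to reach their JVM options. This sketch does not check whether `bin/nutch` forwards it there.

The program below (class name hypothetical) can be run with and without the flag to see which default a given JVM picks up:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefaultCharsetCheck {

    // Encodes with the JVM default charset, the same default the
    // no-charset InputStreamReader/OutputStreamWriter constructors use.
    static byte[] encodeWithDefault(String s) {
        return s.getBytes(Charset.defaultCharset());
    }

    // Encodes with an explicit charset, unaffected by -Dfile.encoding.
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sample = "na\u00efve caf\u00e9";
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        // "UTF8" resolves to the same charset as "UTF-8".
        System.out.println("UTF8 alias ok   = " + Charset.forName("UTF8").equals(StandardCharsets.UTF_8));
        System.out.println("default matches UTF-8 bytes = "
                + Arrays.equals(encodeWithDefault(sample), encodeUtf8(sample)));
    }
}
```

Compare, for example, `java -Dfile.encoding=UTF-8 DefaultCharsetCheck` with a plain `java DefaultCharsetCheck` on the same machine.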
From: Sebastian Nagel <wastl.na...@googlemail.com>
To: user@nutch.apache.org
Sent: Wednesday, November 15, 2017 5:18 AM
Subject: Re: readseg dump and non-ASCII characters
Hi Michael,
from the arguments I guess you're interested in the raw/binary HTML content,
right?
After a closer look I have no simple answer:
1. HTML has no fixed encoding - it could be anything, pageA may have a
different encoding than pageB.
2. That's different for parsed text: it's a
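Each page may use a different encoding, so decoding raw content needs a per-page charset rather than one global setting. Here is a minimal sketch that assumes the charset is taken from the page's `Content-Type` header, with UTF-8 as the fallback. Real pages may also declare the charset in a `<meta>` tag or not at all. Nutch has its own detection logic, which this does not reproduce. The class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.Locale;

public class PerPageDecoding {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=ISO-8859-1", returning `fallback` when it is
    // missing, malformed, or not supported by this JVM.
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes one page's raw bytes with that page's own charset.
    static String decodePage(byte[] raw, String contentType) {
        return new String(raw, charsetFromContentType(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // The same word "caf\u00e9" stored in two different encodings.
        byte[] pageA = {0x63, 0x61, 0x66, (byte) 0xE9};             // ISO-8859-1
        byte[] pageB = {0x63, 0x61, 0x66, (byte) 0xC3, (byte) 0xA9}; // UTF-8
        System.out.println(decodePage(pageA, "text/html; charset=ISO-8859-1"));
        System.out.println(decodePage(pageB, "text/html; charset=utf-8"));
    }
}
```

Decoding both pages with one fixed charset would garble one of them. That is why a single global setting cannot fix the raw-content dump in general.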