On 11-04-18 05:06 PM, Dmitry V'yal wrote:
The readDocument arrow fails with the following message:
fatal error: encoding scheme not supported: WINDOWS-1251
Can someone suggest a workaround for my use case?
If you have a Handle (from a file or the network, for example),
import
Greetings,
I'm writing a small web crawler. I usually use tagsoup for such tasks,
but this time I decided to give hxt a try.
Unfortunately, I ran into trouble with character encodings. The
site I'm targeting uses cp1251, which is one of the most popular
encodings among Russian-language sites. Pages
Since the document claims it is HTML, you should be parsing it with an HTML
parser. Try hxt-tagsoup -- specifically, the parseHtmlTagSoup arrow.
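
For the Handle-based route mentioned earlier in the thread, a minimal sketch
might look like the following. It only uses System.IO from base and is not
necessarily what the earlier poster had in mind. The helper name
readCp1251File is made up for illustration, and the code assumes that the
GHC runtime on your platform recognises the encoding name "CP1251" (through
iconv on Unix-like systems, or through code pages on Windows).

```haskell
import System.IO

-- | Read a whole file and decode its bytes as Windows-1251.
-- Hypothetical helper, not part of hxt or hxt-tagsoup.
-- Assumption: mkTextEncoding accepts the name "CP1251" on this platform;
-- if it does not, mkTextEncoding throws an IOError.
readCp1251File :: FilePath -> IO String
readCp1251File path = do
  h   <- openFile path ReadMode
  enc <- mkTextEncoding "CP1251"
  hSetEncoding h enc
  s   <- hGetContents h
  -- hGetContents is lazy: force the whole string before closing the handle.
  length s `seq` hClose h
  return s
```

The result is an ordinary Unicode String, so the encoding declared inside the
document no longer matters to whatever consumes it. It could then be handed to
a string-based parser entry point (HXT's readString, if your version provides
it) instead of letting readDocument do the decoding. The same hSetEncoding
call should work for a Handle obtained from a socket.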
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe