Re: [Haskell-cafe] HXT and xhtml page encoded in cp1251

2011-04-19 Thread Albert Y. C. Lai
On 11-04-18 05:06 PM, Dmitry V'yal wrote:
> The readDocument arrow fails with the following message:
> fatal error: encoding scheme not supported: WINDOWS-1251
> Can someone suggest a workaround for my use case?

If you have a Handle (from file or Network for example), import

[Haskell-cafe] HXT and xhtml page encoded in cp1251

2011-04-18 Thread Dmitry V'yal
Greetings, I'm writing a small web crawler. I usually use tagsoup for such tasks, but this time I decided to give hxt a try. Unfortunately, I ran into trouble with character encodings. The site I'm targeting uses cp1251, which is one of the most popular encodings among Russian-language sites. Pages

Re: [Haskell-cafe] HXT and xhtml page encoded in cp1251

2011-04-18 Thread John Millikin
Since the document claims it is HTML, you should be parsing it with an HTML parser. Try hxt-tagsoup -- specifically, the parseHtmlTagSoup arrow.
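
For readers hitting the same WINDOWS-1251 error, here is a minimal sketch of one possible workaround: decode the bytes yourself before any parser sees them. It uses only System.IO from base and is an illustration, not necessarily what any poster in this thread proposed. The helper name readFileCp1251 and the file name sample-cp1251.html are hypothetical, and the sketch assumes the GHC runtime on your platform recognises the encoding name "CP1251" (on POSIX systems this lookup goes through iconv).

```haskell
import System.IO

-- Hypothetical helper: read a whole file, decoding its bytes as CP1251.
readFileCp1251 :: FilePath -> IO String
readFileCp1251 path = do
  h <- openFile path ReadMode
  -- Assumption: the runtime knows an encoding called "CP1251".
  enc <- mkTextEncoding "CP1251"
  hSetEncoding h enc
  s <- hGetContents h
  -- hGetContents is lazy, so force the whole string before closing.
  length s `seq` hClose h
  return s

main :: IO ()
main = do
  -- Write the CP1251 bytes for the Russian word "Privet" to a sample file.
  -- In binary mode each Char below 256 is written as a single byte.
  withBinaryFile "sample-cp1251.html" WriteMode $ \h ->
    hPutStr h (map toEnum [0xCF, 0xF0, 0xE8, 0xE2, 0xE5, 0xF2])
  s <- readFileCp1251 "sample-cp1251.html"
  -- Compare against the same word written as Unicode code points.
  print (s == "\1055\1088\1080\1074\1077\1090")
```

The resulting String is already Unicode, so it could then be handed to a string-based reader (for instance HXT's readString) instead of letting readDocument decode the raw input.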