Parsing a hebrew website and maintaining the encoding to something readable

2005-07-05 Thread Lior Kesos
Hello Gog (Gang of Geeks), I'm writing a Python script that is supposed to get some information off a Hebrew website that has this in its headers... META HTTP-EQUIV=Content-Type content=text/html; charset=windows-1255 and style select{font-family:arial;font-size:13px}
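Since the page declares its charset in a META tag, one option is to read that declaration and decode the raw bytes accordingly. The following is only a minimal sketch in modern Python 3 (not the Python 2 of the thread); the names `META_CHARSET` and `decode_page`, the 4096-byte search window, and the cp1255 fallback are assumptions made for illustration.

```python
import re

# Hypothetical helper pattern: finds "charset=<name>" in the raw page bytes.
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)

def decode_page(raw, default="cp1255"):
    """Decode raw page bytes using the charset declared near the top of the page.

    Falls back to `default` (an assumption) when no declaration is found.
    """
    match = META_CHARSET.search(raw[:4096])
    encoding = match.group(1).decode("ascii") if match else default
    # Python accepts "windows-1255" as an alias for the cp1255 codec.
    return raw.decode(encoding)
```

A charset name that Python does not know would raise `LookupError`, so a real script would probably want to catch that and fall back to a default.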

Re: Parsing a hebrew website and maintaining the encoding to something readable

2005-07-05 Thread Dvir Volk
I'm not a Python expert, but you can use libiconv to convert the text to UTF-8. I use it with C and PHP; it probably has Python bindings, and it also comes with a small app called iconv, which you can pipe text through to get what you need. If you're not sure what your source encoding will be in all cases, I'd
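For comparison, the same kind of pipe-through conversion can be sketched with Python's own standard codecs instead of libiconv. This is not a libiconv binding; `convert_stream` and its parameters are hypothetical names, and the example assumes Python 3 and a cp1255 source.

```python
import codecs

def convert_stream(src, dst, from_enc="cp1255", to_enc="utf-8", chunk_size=4096):
    """Copy bytes from src to dst, re-encoding from from_enc to to_enc."""
    # An incremental decoder copes with multi-byte sequences split across chunks.
    decoder = codecs.getincrementaldecoder(from_enc)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode(to_enc))
    # Flush anything the decoder was still holding back.
    dst.write(decoder.decode(b"", final=True).encode(to_enc))
```

Called with `sys.stdin.buffer` and `sys.stdout.buffer`, this would behave roughly like piping the page through the iconv command-line tool.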

Re: Parsing a hebrew website and maintaining the encoding to something readable

2005-07-05 Thread Arik Baratz
On 05/07/05, Dvir Volk [EMAIL PROTECTED] wrote: I'm not a Python expert, but you can use libiconv to convert the text to UTF-8. I use it with C and PHP; it probably has Python bindings, and it also comes with a small app called iconv, which you can pipe text through to get what you need. If you're not sure what

Re: Parsing a hebrew website and maintaining the encoding to something readable

2005-07-05 Thread Lior Kesos
Pasted from the python-il list. - Thanks Viktorija (vika?) - that provided half of the solution. The full one is - unicode(text,'cp1255').encode('utf-8') Because the text is encoded in cp1255, it first needs to be decoded from cp1255 into a unicode object and then encoded to UTF-8. Regards, Lior. Viktorija Zaksiene wrote: On
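The one-liner above is Python 2, where `unicode(text, 'cp1255')` decodes a byte string. In Python 3 the `unicode` builtin no longer exists, so a rough equivalent of the same two steps might look like the sketch below; the function name `cp1255_to_utf8` is made up for illustration.

```python
def cp1255_to_utf8(raw: bytes) -> bytes:
    """Re-encode cp1255 bytes as UTF-8 bytes."""
    # Step 1: decode the cp1255 bytes into a Unicode string.
    text = raw.decode("cp1255")
    # Step 2: encode that string as UTF-8.
    return text.encode("utf-8")
```

If the goal is only to work with the text inside the script, stopping after the decode step and keeping the `str` is usually enough; the UTF-8 encoding matters when writing the result out.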

Re: Parsing a hebrew website and maintaining the encoding to something readable

2005-07-05 Thread Arik Baratz
On 05/07/05, Lior Kesos [EMAIL PROTECTED] wrote: Pasted from the python-il list. - Thanks Viktorija (vika?) - that provided half of the solution. The full one is - unicode(text,'cp1255').encode('utf-8') This one uses the unicode constructor to create the unicode object. I rather like