Re: reading hebrew text file
Really, thanks. hagai -- http://mail.python.org/mailman/listinfo/python-list
reading hebrew text file
I have a Hebrew text file that I want to read in Python. I don't know which encoding I need to use. How do I do that? Thanks, hagai
Re: reading hebrew text file
[EMAIL PROTECTED] wrote:
> I have a hebrew text file, which I want to read in python
> I don't know which encoding I need to use
> how I do that

As for the how, look to the codecs module -- but if you don't know what codec the text file is written in, I know of no way to guess from here!-)

Alex
Re: reading hebrew text file
I looked for VAV in the files in the encodings directory (/usr/lib/python2.4/encodings/*.py on my machine). I found that the following character encodings seem to include Hebrew characters:

    cp1255
    cp424
    cp856
    cp862
    iso8859-8

A file containing Hebrew text might be in any one of these encodings, or in any Unicode-based encoding.

To open an encoded file for reading, use

    f = codecs.open(file, 'r', encoding='...')

Now, calls like 'f.readline()' will return unicode strings. Here's an example, using a file in UTF-8 I have lying around:

    >>> import codecs
    >>> f = codecs.open('/users/jepler/txt/UTF-8-demo.txt', 'r', 'utf-8')
    >>> for i in range(5): print repr(f.readline())
    ...
    u'UTF-8 encoded sample plain-text file\n'
    u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
    u'\n'
    u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] [EMAIL PROTECTED] \u2014 1999-08-20\n'
    u'\n'

Jeff
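The search for VAV described above can also be done from inside Python, by asking each codec whether it can encode the letter. This is only a sketch, and it assumes a current Python 3 interpreter rather than the Python 2.4 used in this thread. The names CANDIDATES and supports_vav are made up for illustration, and the two Latin codecs are included only as a contrast.

```python
# Sketch (Python 3): check which codecs can represent HEBREW LETTER VAV.
VAV = "\u05d5"

# The five codecs named in the message above, plus UTF-8 and two
# Latin-only codecs for contrast (an illustrative selection).
CANDIDATES = ["cp1255", "cp424", "cp856", "cp862", "iso8859-8",
              "utf-8", "latin-1", "cp1252"]

def supports_vav(name):
    """Return True if the codec `name` can encode the letter VAV."""
    try:
        VAV.encode(name)
    except UnicodeEncodeError:
        return False
    return True

hebrew_capable = [name for name in CANDIDATES if supports_vav(name)]
print(hebrew_capable)
```

A codec passing this check only means it has a slot for the letter; it says nothing about which codec a particular file was actually written in.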
Re: reading hebrew text file
[EMAIL PROTECTED] wrote:
> I have a hebrew text file, which I want to read in python
> I don't know which encoding I need to use

that's not a good start. but maybe it's one of these:

    http://sites.huji.ac.il/tex/hebtex_fontsrep.html ?

> how I do that

    f = open(myfile)
    text = f.readline()

followed by one of

    text = text.decode("iso-8859-8")
    text = text.decode("cp1255")
    text = text.decode("cp862")

alternatively, use:

    f = codecs.open(myfile, "r", encoding)

to get a stream that decodes things on the fly.

/F
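The "try one of these decodes" advice above can be turned into a rough check: decode the raw bytes with each candidate and see how many of the resulting letters are actually Hebrew. This is a heuristic sketch only, written for Python 3 (where bytes.decode replaces the Python 2 str.decode used above). The names CANDIDATES, hebrew_ratio and rank_encodings are hypothetical, and the sample bytes are built in the script rather than read from a real file.

```python
# Heuristic sketch (Python 3): rank candidate encodings by how "Hebrew"
# the decoded text looks. Not a reliable detector.
CANDIDATES = ["utf-8", "cp1255", "iso8859-8", "cp862"]

def hebrew_ratio(text):
    """Fraction of alphabetic characters in the range alef..tav."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hebrew = sum(1 for ch in letters if "\u05d0" <= ch <= "\u05ea")
    return hebrew / len(letters)

def rank_encodings(raw, candidates=CANDIDATES):
    """Return (ratio, name) pairs for the codecs that decode `raw`
    without error, best ratio first."""
    results = []
    for name in candidates:
        try:
            text = raw.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this codec
        results.append((hebrew_ratio(text), name))
    results.sort(key=lambda pair: -pair[0])  # stable: ties keep list order
    return results

# Sample input: "shalom olam" in Hebrew letters, encoded as cp1255.
sample = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd".encode("cp1255")
for ratio, name in rank_encodings(sample):
    print(name, ratio)
```

For real data, something like rank_encodings(open(myfile, "rb").read()) would be the starting point. Note that cp1255 and iso-8859-8 place the basic Hebrew letters at the same byte values, so they will often tie; a check like this narrows the choice but does not settle it.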