Re: reading hebrew text file

2005-10-18 Thread hagai26
Thank you very much.

hagai

-- 
http://mail.python.org/mailman/listinfo/python-list


reading hebrew text file

2005-10-17 Thread hagai26
I have a Hebrew text file that I want to read in Python.
I don't know which encoding I need to use, or how to do that.

thanks,
hagai



Re: reading hebrew text file

2005-10-17 Thread Alex Martelli
[EMAIL PROTECTED] wrote:

> I have a hebrew text file, which I want to read in python
> I don't know which encoding I need to use & how I do that

As for the how, look to the codecs module -- but if you don't know
what codec the textfile is written in, I know of no ways to guess from
here!-)


Alex


Re: reading hebrew text file

2005-10-17 Thread jepler
I searched for VAV in the files in the encodings directory
(/usr/lib/python2.4/encodings/*.py on my machine).  I found that the following
character encodings seem to include Hebrew characters:
    cp1255
    cp424
    cp856
    cp862
    iso8859-8
A file containing Hebrew text might be in any one of these encodings, or
in any Unicode-based encoding.
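
The same search can be sketched programmatically. This is only an illustration, written for a current Python 3 interpreter rather than the 2.4 used in this thread: the helper name single_byte_codecs_for is made up for the example, and the "encodes VAV as exactly one byte" test is an assumed stand-in for reading the mapping tables by eye. It skips Unicode-based encodings on purpose, since those can represent any character.

```python
import encodings
import pkgutil

VAV = u"\u05d5"  # HEBREW LETTER VAV


def single_byte_codecs_for(char):
    """Return names of installed codecs that encode `char` as exactly one byte."""
    matches = []
    # Walk the modules in the standard "encodings" package directory.
    for module in pkgutil.iter_modules(encodings.__path__):
        name = module[1]  # each entry is (finder, name, ispkg)
        try:
            encoded = char.encode(name)
        except (LookupError, UnicodeError):
            # Not a text codec, not available here, or cannot encode the char.
            continue
        if len(encoded) == 1:
            matches.append(name)
    return sorted(matches)


hebrew_codecs = single_byte_codecs_for(VAV)
print(hebrew_codecs)
```

The exact list depends on which codecs the local Python installation ships, so treat the output as a starting set of candidates rather than a definitive answer.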

To open an encoded file for reading, import the codecs module and use
    f = codecs.open(filename, 'r', encoding='...')
Now, calls like 'f.readline()' will return unicode strings.

Here's an example, using a UTF-8 file I have lying around:
>>> import codecs
>>> f = codecs.open("/users/jepler/txt/UTF-8-demo.txt", "r", "utf-8")
>>> for i in range(5): print repr(f.readline())
... 
u'UTF-8 encoded sample plain-text file\n'
u'\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\u203e\n'
u'\n'
u'Markus Kuhn [\u02c8ma\u02b3k\u028as ku\u02d0n] [EMAIL PROTECTED] \u2014 1999-08-20\n'
u'\n'
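
For Hebrew specifically, a round trip shows the same call at work. This is a minimal sketch under assumptions: it is written for a current Python 3 interpreter, it assumes the file is cp1255 (one of the candidates listed above), and the sample text and temporary file name are invented for the example.

```python
import codecs
import os
import tempfile

# "shalom olam" (hello world) written with escapes so the source stays ASCII.
sample = u"\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd\n"

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "hebrew-demo.txt")

# Write the text to disk as cp1255 bytes.
with codecs.open(path, "w", encoding="cp1255") as f:
    f.write(sample)

# Read it back through a decoding stream; readline() returns decoded text.
with codecs.open(path, "r", encoding="cp1255") as f:
    line = f.readline()

# Look at the raw bytes as well, to confirm what is actually stored.
with open(path, "rb") as f:
    raw = f.read()

print(repr(line))
print(repr(raw))
```

Each Hebrew letter occupies a single byte on disk here, which is why the bytes alone do not reveal which code page produced them.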

Jeff



Re: reading hebrew text file

2005-10-17 Thread Fredrik Lundh
[EMAIL PROTECTED] wrote:

> I have a hebrew text file, which I want to read in python
> I don't know which encoding I need to use

that's not a good start.  but maybe it's one of these:

http://sites.huji.ac.il/tex/hebtex_fontsrep.html

?

> & how I do that

f = open(myfile)
text = f.readline()

followed by one of

text = text.decode("iso-8859-8")
text = text.decode("cp1255")
text = text.decode("cp862")

alternatively, use:

import codecs
f = codecs.open(myfile, "r", encoding)

to get a stream that decodes things on the fly.
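
A wrong guess among these code pages usually does not raise an error; it just produces the wrong characters. The sketch below illustrates that with an invented sample (the word "shalom" encoded as cp1255) and a made-up helper, hebrew_letter_count, that counts characters in the Hebrew letter range U+05D0..U+05EA. It is written for a current Python 3 interpreter and is a rough heuristic, not a reliable detector.

```python
# Bytes for the word "shalom" as they would appear in a cp1255 file.
raw = u"\u05e9\u05dc\u05d5\u05dd".encode("cp1255")


def hebrew_letter_count(text):
    """Count characters in the Hebrew letter range U+05D0..U+05EA."""
    return sum(1 for ch in text if u"\u05d0" <= ch <= u"\u05ea")


results = {}
for name in ("iso-8859-8", "cp1255", "cp862"):
    try:
        decoded = raw.decode(name)
    except UnicodeDecodeError:
        # The bytes are not valid in this encoding at all.
        results[name] = None
        continue
    results[name] = hebrew_letter_count(decoded)

for name in sorted(results):
    print("%s: %s" % (name, results[name]))
```

For these four bytes, iso-8859-8 and cp1255 both yield four Hebrew letters because they place the letters at the same byte values, while cp862 decodes without complaint but yields none. Inspecting the decoded text is therefore still necessary after a decode succeeds.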

/F


