Title: Message
When in doubt, turn the problem around 90 degrees.
 
file( filename[, mode[, bufsize]])
Return a new file object (described in section 2.3.8, ``File Objects''). The first two arguments are the same as for stdio's fopen(): filename is the file name to be opened, mode indicates how the file is to be opened: 'r' for reading, 'w' for writing (truncating an existing file), and 'a' opens it for appending (which on some Unix systems means that all writes append to the end of the file, regardless of the current seek position).
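As a rough sketch of why the mode matters here: in the Python of this thread, text mode ('r') on Windows stops reading at a 0x1a (Ctrl-Z) byte, while adding 'b' to the mode returns every byte untouched. The helper name read_all_bytes and the file name sample.bin are made up for illustration, and the sketch uses open() rather than file().

```python
import os
import tempfile

def read_all_bytes(path):
    # 'rb' = read, binary: no end-of-file or newline translation is applied.
    with open(path, 'rb') as f:
        return f.read()

# Hypothetical sample: some text, a 0x1a byte, then more text.
sample = b"before\x1aafter\n"
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, 'wb') as f:
    f.write(sample)

data = read_all_bytes(path)
print(len(data), len(sample))  # both lengths match: nothing was cut off at 0x1a
```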
 
The problem is that your file contains BINARY data....
 
So, let's remove the binary data:
 

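import sys
import string

# Bytes to keep: the ASCII characters listed in string.printable.
PRINTABLE = string.printable.encode('ascii')

def strip_binary(filename, newname):
    # Binary mode on both files, so 0x1a is not treated as end-of-file.
    test = open(filename, 'rb')
    stripped = open(newname, 'wb')

    # Read one byte at a time; an empty result means end of file.
    data = test.read(1)
    while data:
        if data in PRINTABLE:
            stripped.write(data)
        data = test.read(1)

    stripped.close()
    test.close()

strip_binary(sys.argv[1], sys.argv[2])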
 
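This will remove all characters that are not contained in the string module's printable variable. Note that string.printable holds only ASCII characters, so every byte above 0x7f (the formerly Arabic text) is dropped along with the 0x1a.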
 
Then you should be able to open the NEW file as an ASCII file, without any issues.
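If the aim is only to get past the 0x1a byte and keep everything else, a narrower variant is possible. This is a sketch, not the approach given above: the function name strip_ctrl_z and the sample file names are made up, and it assumes the file is small enough to read into memory in one go.

```python
import os
import tempfile

def strip_ctrl_z(filename, newname):
    # Binary mode on both sides so no byte is interpreted or translated.
    with open(filename, 'rb') as src:
        data = src.read()
    with open(newname, 'wb') as dst:
        # Remove only the 0x1a bytes; all other bytes pass through unchanged.
        dst.write(data.replace(b'\x1a', b''))

# Demo on a made-up sample; \xc7\xe1 stand in for two non-ASCII bytes.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "report.txt")
dst_path = os.path.join(workdir, "report_clean.txt")
with open(src_path, 'wb') as f:
    f.write(b"01 : \xc7\xe1\x1a rest of line\n")

strip_ctrl_z(src_path, dst_path)
with open(dst_path, 'rb') as f:
    cleaned = f.read()
```

The cleaned file still contains the high bytes, so it is no longer plain ASCII and has to be read with whatever encoding the conversion actually produced.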
 
Instead of creating a temporary file, you could collect the filtered data in a list, join it into one string, and then use split("\n") on the result and process that.  That would be the rough equivalent of readlines()....
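The in-memory idea might look roughly like this. It is a sketch that assumes Python 3 (where iterating over bytes yields integers); the function name read_printable_lines and the sample file are hypothetical.

```python
import os
import string
import tempfile

# Bytes to keep: the ASCII characters listed in string.printable.
PRINTABLE = string.printable.encode('ascii')

def read_printable_lines(filename):
    with open(filename, 'rb') as f:
        data = f.read()
    # Keep only the bytes that appear in string.printable.
    kept = bytes(b for b in data if b in PRINTABLE)
    # Unlike readlines(), split("\n") drops the newline from each line and
    # leaves a trailing empty string when the data ends with a newline.
    return kept.decode('ascii').split("\n")

# Demo on a made-up two-line sample containing 0x1a and one high byte.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, 'wb') as f:
    f.write(b"line one\x1a\xc7\nline two\n")

lines = read_printable_lines(path)
```

Because "\r" is also in string.printable, lines from a file with Windows line endings would keep a trailing "\r" under this sketch; stripping it is left to the caller.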
 
        - Ben
 
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, January 12, 2005 8:44 AM
To: python-win32@python.org
Subject: [python-win32] File I/O problem

I am trying to process a file that was originally created on an AS/400 as a spooled report. The file was converted to ASCII before being sent to me by e-mail. The original report is in Arabic script and so any Arabic script has been mapped to

I can’t read the whole file in unless I chop out all the (formerly) Arabic characters, as read(), readline() and readlines() all seem to think they are done too early. The problem appears to be that the conversion has produced a byte with hex value 1a, and Python is treating this as an end-of-file marker. I worked this out by using a hex editor and looking at the character just after the point where the read operation stops.  The offending character is the square (unprintable) character in the file snippet below.

Start file snippet >>

MK    2005/01/10 البنك العربي(ش .م.ع)        الميزانية الموحدة - تقريـر الميزانية الشهــرية                              كما هي في

              01 : فروع دولة امارات                =========================================                              الصـفحة 

<< End file snippet

Is there a way I can pre-process this file with Python and chop out the characters (the 1a) I don’t want?

 

If I do this:

import string

report = open('d:\\Software\\PythonScripts\\ear11050110.txt').readlines()               

report is:

>>> report

['MK    2005/01/10 \xc7\xe1\xc8\xe4\xdf \xc7\xe1\xda\xd1\xc8\xed(\xd4 .\xe3.\xda)        \xc7\xe1\xe3\xed\xd2\xc7\xe4\xed\xc9 \xc7\xe1\xe3\xe6\xcd\xcf\xc9 - \xca\xde\xd1\xed\xdc\xd1 \xc7\xe1\xe3\xed\xd2\xc7\xe4\xed\xc9 \xc7\xe1\xd4\xe5\xdc\xdc\xd1\xed\xc9                              \xdf\xe3\xc7 \xe5\xed \xdd\xed\n', '              01 : \xdd\xd1\xe6\xda \xcf\xe6\xe1\xc9 \xc7']

 

Which is everything up to the hex 1a.

 

Thanks for any prompting whatsoever.

 

Nick.

 



