I've been struggling with a problem for the past couple of days.  Can anyone
provide me some help or insight?

The problem comes down to this.  I have received an email in Microsoft
Outlook that contains characters outside the ASCII set.  I was able to use
your library to traverse my Outlook folders and select the appropriate
emails.  The email contains ASCII characters, extended ASCII characters, and
others; the characters outside ASCII are all "fraction" characters.  I am
trying to replace these characters with their respective floating point
numbers so that I can parse the email and pick out the numbers I need.  The
problem is that when I read "item.Body" into body, the characters
"\xe2\x85\x9b", "\xe2\x85\x9c", "\xe2\x85\x9d", and "\xe2\x85\x9e" (shown
here as UTF-8 bytes) become unknown characters (question marks).  Their
Unicode code points are U+215B, U+215C, U+215D, and U+215E, and they are the
fractions 1/8, 3/8, 5/8, and 7/8 respectively.
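
To make the mapping concrete, here is a minimal sketch of the replacement I
am after, done on the Unicode text rather than on the UTF-8 bytes.  The names
FRACTIONS and replace_fractions are only illustrative, and the sample string
is made up; this assumes the fraction characters actually survive in the
string, which is exactly what is not happening with "item.Body".

```python
# Illustrative sketch: FRACTIONS and replace_fractions are made-up names.

# Vulgar-fraction code points and the decimal text that should replace them.
FRACTIONS = {
    u"\u00bc": u".25",   # 1/4, UTF-8 bytes c2 bc
    u"\u00bd": u".5",    # 1/2, UTF-8 bytes c2 bd
    u"\u00be": u".75",   # 3/4, UTF-8 bytes c2 be
    u"\u215b": u".125",  # 1/8, UTF-8 bytes e2 85 9b
    u"\u215c": u".375",  # 3/8, UTF-8 bytes e2 85 9c
    u"\u215d": u".625",  # 5/8, UTF-8 bytes e2 85 9d
    u"\u215e": u".875",  # 7/8, UTF-8 bytes e2 85 9e
}


def replace_fractions(text):
    """Return text with each fraction character replaced by its decimal."""
    for char, decimal in FRACTIONS.items():
        text = text.replace(char, decimal)
    return text


sample = u"Bid 101\u215b, offer 101\u215e"  # made-up sample line
print(replace_fractions(sample))
```

Working on the Unicode string avoids having to spell out the byte sequences
by hand, which is where it is easy to mistype one of them.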

I have the same problem when I open the email in Outlook and "SaveAs" a text
file, or when I use the win32 interface to do the same thing.

Interestingly enough, when I copy and paste the email text into a text file
and save it, I can parse the text file just fine using Python and none of
the characters are lost.  I get the feeling that the problem is occurring
when I read the email's body with "item.Body", but I am not sure how I can
read it using a different encoding scheme.  I have included my code below.
I would be extremely grateful if someone could provide some insight as to
how I can get around this.
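
One check I can run to narrow this down, sketched here, is to list every
non-ASCII code point in a string, so that the text from "item.Body" can be
compared with the text from the pasted file.  If the fractions have already
been turned into plain question marks (U+003F), nothing will show up for
them.  The helper name non_ascii_chars is only illustrative, and the sample
strings are made up.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (code point, Unicode name) pairs for characters above ASCII."""
    found = []
    for char in text:
        if ord(char) > 127:
            label = "U+%04X" % ord(char)
            found.append((label, unicodedata.name(char, "UNKNOWN")))
    return found


# Made-up samples: one with the real characters, one already degraded to "?".
print(non_ascii_chars(u"101\u215b and 99\u00bd"))
print(non_ascii_chars(u"101? and 99\u00bd"))
```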

Thanks!

Pramod




import win32com.client


def parseEmailsWithFractions():
    # Create instance of Outlook
    o = win32com.client.Dispatch("Outlook.Application")
    mapi = o.GetNamespace("MAPI")

    folder=mapi.Folders["Folder_NAME"]

    numItems=folder.Items.Count+1
    SubjectNotFound=1


    for i in range(1,501):

        item = folder.Items[numItems-i]
        subject=item.Subject
        body=item.Body

        disclaimerStart=body.find("------")

        if subject.find("SUBJECT_TO_FIND") >=0 and SubjectNotFound:
            print subject

            EncodedBody=body.encode("utf-8")

            print repr(EncodedBody)
            # Replacing all fractional characters with decimals
            EncodedBody=EncodedBody.replace("\xc2\xbe",".75")
            EncodedBody=EncodedBody.replace("\xc2\xbc",".25")
            EncodedBody=EncodedBody.replace("\xc2\xbd",".5")
            EncodedBody=EncodedBody.replace("\xe2\x85\x9b",".125")
            EncodedBody=EncodedBody.replace("\xe2\x85\x9c",".375")
            EncodedBody=EncodedBody.replace("\xe2\x85\x9d",".625")
            # 7/8 is U+215E, i.e. the UTF-8 bytes e2 85 9e
            EncodedBody=EncodedBody.replace("\xe2\x85\x9e",".875")
            print EncodedBody
            SubjectNotFound=0

parseEmailsWithFractions()
_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
