I've been struggling with a problem for the past couple of days. Can anyone offer some help or insight?
The problem comes down to this. I have received an email in Microsoft Outlook that contains characters outside the ASCII set. I was able to use your library to traverse my Outlook folders and select the appropriate emails. The email contains ASCII characters, extended ASCII characters, and others. The characters outside ASCII are all "fraction" characters. I am trying to replace these characters with their respective floating point numbers (so I can parse the email and pick out the numbers I need).

The problem I am having is that when I read "item.Body" into body, the characters "\xe2\x85\x9b", "\xe2\x85\x9c", "\xe2\x85\x9d", and "\xe2\x85\x9e" become unknown characters (question marks). These are the UTF-8 encodings of the Unicode characters U+215B, U+215C, U+215D, and U+215E, which are the fractions 1/8, 3/8, 5/8, and 7/8 respectively. I have the same problem when I open the Outlook email and "SaveAs" a text file, or use the win32 interface to do that. Interestingly enough, when I copy and paste the email text into a text file and save it, I can parse the text file just fine using Python and none of the characters are lost.

I get the feeling that the problem occurs when I read the email's body with "item.Body", but I am not sure how I can read it using a different encoding scheme. I have included my code below. I would be extremely grateful if someone could provide some insight as to how I can get around this. Thanks!
Pramod

import codecs, win32com.client, string, re

def parseEmailsWithFractions():
    # Create instance of Outlook
    o = win32com.client.Dispatch("Outlook.Application")
    mapi = o.GetNamespace("MAPI")
    folder = mapi.Folders["Folder_NAME"]
    numItems = folder.Items.Count + 1
    SubjectNotFound = 1
    # Walk the folder from the last item backwards
    for i in range(1, 501):
        item = folder.Items[numItems - i]
        subject = item.Subject
        body = item.Body
        disclaimerStart = body.find("------")
        if subject.find("SUBJECT_TO_FIND") >= 0 and SubjectNotFound:
            print subject
            EncodedBody = body.encode("utf-8")
            print repr(EncodedBody)
            # Replacing all fractional characters with decimals
            EncodedBody = EncodedBody.replace("\xc2\xbe", ".75")
            EncodedBody = EncodedBody.replace("\xc2\xbc", ".25")
            EncodedBody = EncodedBody.replace("\xc2\xbd", ".5")
            EncodedBody = EncodedBody.replace("\xe2\x85\x9b", ".125")
            EncodedBody = EncodedBody.replace("\xe2\x85\x9c", ".375")
            EncodedBody = EncodedBody.replace("\xe2\x85\x9d", ".625")
            EncodedBody = EncodedBody.replace("\xe2\x85\x9e", ".875")
            print EncodedBody
            SubjectNotFound = 0

parseEmailsWithFractions()
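
For comparison, here is a minimal sketch of the replacement step done on the Unicode string itself rather than on its UTF-8 bytes. The names FRACTIONS and replace_fractions are hypothetical helpers of my own, not part of pywin32 or Outlook. It assumes the text still contains the fraction characters, so it would not help if "item.Body" has already turned them into question marks before Python sees them.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of pywin32): maps Unicode vulgar-fraction
# characters to the decimal text that should replace them.
FRACTIONS = {
    u"\u00bc": u".25",   # 1/4
    u"\u00bd": u".5",    # 1/2
    u"\u00be": u".75",   # 3/4
    u"\u215b": u".125",  # 1/8
    u"\u215c": u".375",  # 3/8
    u"\u215d": u".625",  # 5/8
    u"\u215e": u".875",  # 7/8
}


def replace_fractions(text):
    """Return a copy of a Unicode string with fraction characters replaced."""
    for char, decimal in FRACTIONS.items():
        text = text.replace(char, decimal)
    return text
```

For example, replace_fractions(u"98\u215b") should give u"98.125". Working on code points this way avoids having to spell out each character's UTF-8 byte sequence by hand.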
_______________________________________________
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32