Hello, and apologies in advance for the length of this post. I am having a hard time understanding the errors generated by a program I've written. The code is intended to parse text files that are copied and pasted from the web pages of an online game. The pages are encoded as ISO-8859-1, but the copied text contains characters that are not in latin-1. For instance, one of the lines I need to be able to read is:

    196679 Daimyo 石 Druid 145 27 12/09/07 21:40:04 [ Expel ]
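The latin-1 limitation can be checked directly. This is a small sketch written for Python 3 (where str is Unicode), not code from the program below; `can_encode` is a hypothetical helper, and the name is the one from the sample line.

```python
# Sketch (Python 3): which encodings can represent the sample citizen name?
name = "Daimyo 石 Druid"  # name taken from the sample line above

def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

print(can_encode(name, "iso-8859-1"))  # False: the CJK character is not in latin-1
print(can_encode(name, "utf-8"))       # True: UTF-8 can encode any code point
```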
I start with the file 'citizen_list' and use this function to read it and return a list of names (for instance, Daimyo 石 Druid) and ID numbers:

    import codecs

    # builds the list of names from the citizens list
    def getNames(f):
        """Builds a list from the town list of names
        Returns a list"""
        newlist = []
        for line in f:
            namewords = line.rstrip('[Expel]\n\t ')\
                .rstrip(':/0123456789 ').rstrip('\t ').rstrip('0123456789 ')\
                .rstrip('\t ').rstrip('0123456789 ').rstrip('\t ').split()
            entry = ";".join([namewords[0], " ".join(namewords[1:len(namewords)])])
            newlist.append(entry)
        return newlist

    citizens = codecs.open('citizen_list', 'r', 'utf-8', 'strict')
    listNames = getNames(citizens)
    citizens.close()

I've specified 'utf-8' as the encoding because it seemed the best candidate for picking up all the names in the list. I use the names in other functions, for example:

    def getdamage(warrior, rpt):
        """reads each line of war report
        returns damage and number of kills for citizen name"""
        for line in rpt:
            if (line.startswith(warrior.name) or \
                    line.startswith('A blue aura surrounds ' + warrior.name))\
                    and line.find('weapon') > 0:
                warrior.addDamage(int(line[line.find('caused ') +7:line.find(' damage')]))
                if rpt.next().find('is dead') >0:
                    warrior.addKill()
            elif line.startswith(warrior.name+' is dead'):
                warrior.dies()
                break
            elif line.startswith('Starting round'):
                warrior.addRound()

    for cit in listNames:
        c = Warrior(cit.split(';')[0], cit.split(';')[1])
        totalnum += 1
        report = codecs.open('war_report','r', 'utf-8', 'strict')
        getdamage(c, report)
        report.close()

    --[snip]--

    def buildString(warrior):
        """Build a string from a warrior's stats
        Returns string for output to warStat."""
        return "!tr!!td!!id!"+str(warrior.ID)+"!/id!!/td!"+\
            "!td!"+str(warrior.damage)+"!/td!!td!"+str(warrior.kills)+\
            "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
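Some Windows editors (Notepad, for example) start a UTF-8 file with a byte order mark (BOM), the character U+FEFF. The sketch below, written for Python 3 with hypothetical file contents and a hypothetical helper `first_id`, reads the first line through a codecs stream reader the way codecs.open does. It suggests that the 'utf-8' codec leaves the BOM glued to the first ID, while the 'utf-8-sig' codec (available since Python 2.5) strips it. With no BOM in the file, both codecs give the same result.

```python
import codecs
import io

# Hypothetical citizen_list bytes: a UTF-8 BOM followed by one tab-separated line.
RAW = codecs.BOM_UTF8 + "196679\tDaimyo 石 Druid\t145\t27\n".encode("utf-8")

def first_id(data, encoding):
    """Read the first line through a codecs StreamReader and return its first word."""
    reader = codecs.getreader(encoding)(io.BytesIO(data))
    return reader.readline().split()[0]

print(repr(first_id(RAW, "utf-8")))      # '\ufeff196679' (BOM kept)
print(repr(first_id(RAW, "utf-8-sig")))  # '196679' (BOM stripped)
# Same line without a BOM: plain 'utf-8' already gives a clean ID.
print(repr(first_id(RAW[len(codecs.BOM_UTF8):], "utf-8")))  # '196679'
```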
This code runs fine on my Linux machine, but when I sent it to a friend running Python on Windows, he got the following error:

    Traceback (most recent call last):
      File "D:\Python25\Lib\SITE-P~1\PYTHON~1\pywin\framework\scriptutils.py", line 310, in RunScript
        exec codeObject in __main__.__dict__
      File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\parser_1.0.py", line 63, in <module>
        "".join(["%s" % buildString(c) for c in citlistS[:100]])+"!/table!")
      File "C:\Documents and Settings\Administrator\Desktop\reparser_014(2)\iotp_alt2.py", line 169, in buildString
        "!/td!!td!"+str(warrior.survived)+"!/td!!/tr!"
    UnicodeEncodeError: 'ascii' codec can't encode character u'\ufeff' in position 0: ordinal not in range(128)

As I understand it, the error means the ascii codec cannot encode the character u'\ufeff'. What puzzles me is that I never get this error, even though ascii is my default encoding too.

Any thoughts or assistance would be welcome.

Cheers
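For reference, the failure can be reproduced without the rest of the program. In Python 2, str() on a unicode object implicitly encodes it with the default 'ascii' codec; the Python 3 sketch below does that encode explicitly on a hypothetical ID that still carries a leading BOM. One possible (unconfirmed) explanation of the difference between the two machines: if only the friend's copy of citizen_list was saved with a BOM, only his run would hit this. `ascii_error` and `bom_id` are illustrative names.

```python
# Sketch (Python 3): the explicit equivalent of Python 2's str(unicode_obj),
# applied to a hypothetical ID that still has a BOM in front of it.
bom_id = "\ufeff196679"

def ascii_error(text):
    """Return the UnicodeEncodeError message for text, or None if it is pure ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None

print(ascii_error(bom_id))                   # message names the character at position 0
print(ascii_error(bom_id.lstrip("\ufeff")))  # None: stripping the BOM avoids the error
print(bom_id.encode("utf-8"))                # b'\xef\xbb\xbf196679': UTF-8 encodes it fine
```

In Python 2 terms, the analogous options would be opening the files with 'utf-8-sig' or calling .encode('utf-8') on unicode values instead of str().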