Hi Michael,

It seems to be a bug with IronPython 2.0.x then. I just installed IronPython 2.0.3 and this is what I found:

C:\>"C:\Program Files\IronPython 2.0.3\ipy.exe"
IronPython 2.0.3 (2.0.0.0) on .NET 2.0.50727.3603
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append(r'D:\Python25\lib')
>>> import codecs
>>> lines = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r').readlines()
>>> print lines
['This is a text file without a UTF-8 BOM.\n', 'Line 2\n', 'Line 3']
>>> lines[0].startswith(codecs.BOM_UTF8)
True
>>> ^Z

C:\>

It returned 'True' even though the text file did not have a UTF-8 BOM. Contrasted with standard Python 2.5:

C:\> D:\Python25\python.exe
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import codecs
>>> lines = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r').readlines()
>>> print lines
['This is a text file without a UTF-8 BOM.\n', 'Line 2\n', 'Line 3']
>>> lines[0].startswith(codecs.BOM_UTF8)
False
>>> ^Z

C:\>

So it looks like there was a bug in IronPython 2.0.x with regards to the handling of codecs.BOM_UTF8 that now appears to be fixed in IronPython 2.6. Does that sound like a fair assessment?

Thanks!

-- Leo

-----Original Message-----
From: users-boun...@lists.ironpython.com [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
Sent: 2009-11-10 14:30
To: Discussion of IronPython
Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8

Leonides Saguisag wrote:
> Hi Michael,
>
> I am using SharpDevelop 3.1 which comes with "2.5.0 (IronPython 2.0.2
> (2.0.0.0) on .NET 2.0.50727.3603)".
>

Yeah, that's IronPython 2.0 - which is fine but not as good as IronPython 2.6. ;-)

> So this issue is resolved with IronPython 2.6, then?
>

No idea. I can't reproduce the problem with IronPython 2.6 though. Try installing IronPython 2 and seeing what happens from the interactive interpreter (whether you can reproduce the problem or not).
It is *possible* that it's caused by the way SharpDevelop generates its executables, but that's highly unlikely to be the cause of the problem.

All the best,

Michael Foord

> Thanks!
>
> -- Leo
>
> -----Original Message-----
> From: users-boun...@lists.ironpython.com
> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
> Sent: 2009-11-10 14:05
> To: Discussion of IronPython
> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>
> Leonides Saguisag wrote:
>
>> Hi Michael,
>>
>> I just verified the empty string theory that you mentioned and
>> Python25\lib\codecs.py (comes with the standard library in Python 2.5) has
>> the following defined:
>>
>
> You're using IronPython 2.0 then?
>
> If I use IronPython 2.6 it correctly reports a text file as not starting with
> the BOM:
>
> >>> import codecs
> >>> codecs.BOM_UTF8
> u'\xef\xbb\xbf'
> >>> lines = open('foo.txt').readlines()
> >>> lines
> ['foo']
> >>> lines[0].startswith(codecs.BOM_UTF8)
> False
>
> All the best,
>
> Michael Foord
>
>> # UTF-8
>> BOM_UTF8 = '\xef\xbb\xbf'
>>
>> So it is not an empty string.
>>
>> Maybe I am approaching this wrong and you guys can provide me with an
>> alternative way of doing this. I am trying to read a file and determine if
>> the file is encoded in UTF-8 or not. The approach I took was to use
>> Python's built-in open function to read the text file into a list of
>> strings and check if the first line starts with the UTF-8 byte order mark by
>> using line.startswith(codecs.BOM_UTF8). As I noted below, this works fine
>> in Python 2.5 but in IronPython it just keeps saying it found a UTF-8 BOM
>> even though there is none present.
>>
>> Thanks!
>>
>> -- Leo
>>
>> -----Original Message-----
>> From: users-boun...@lists.ironpython.com
>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
>> Sent: 2009-11-10 13:32
>> To: Discussion of IronPython
>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>
>> Leonides Saguisag wrote:
>>
>>> Thank you for taking the time to reply. Any idea why this would happen in
>>> IronPython but not with the standard Python interpreter? What is weirding
>>> me out is that the exact same script behaves differently depending on
>>> whether I use IronPython or the standard Python interpreter.
>>>
>>
>> Well, if codecs.BOM_UTF8 is set to the empty string (you didn't say if you
>> have tried this yet?) then it would be due to a bug in IronPython somewhere
>> - but at least you would know what was causing it.
>>
>> If it is the empty string, purely speculating, it could be due to the way
>> the .NET framework treats the BOM at the start of strings. Pure speculation
>> though - that might not be the problem at all or it could be caused by
>> something entirely different.
>>
>> In .NET it would be more normal to check for the BOM with bytes, as by the
>> time you have a string you have (usually) decoded already.
>> IronPython 2.X is a bit odd for the .NET framework in this respect.
>>
>> Michael
>>
>>> Thanks!
>>>
>>> -- Leo
>>>
>>> -----Original Message-----
>>> From: users-boun...@lists.ironpython.com
>>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
>>> Sent: 2009-11-10 13:17
>>> To: Discussion of IronPython
>>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>>
>>> Leonides Saguisag wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I am encountering a weird issue with getting codecs.BOM_UTF8 to work
>>>> correctly. I am using SharpDevelop 3.1.
>>>>
>>>> Here is the test script that I put together:
>>>>
>>>> import sys
>>>> sys.path.append(r'D:\Python25\Lib')
>>>> import codecs
>>>>
>>>> print sys.version
>>>> myfile = open(r'D:\Temp\text_file_with_utf8_bom.txt', 'r')
>>>> lines = myfile.readlines()
>>>> myfile.close()
>>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>>     print ('UTF-8 BOM detected!')
>>>> else:
>>>>     print ('UTF-8 BOM not detected!')
>>>>
>>>> myfile = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r')
>>>> lines = myfile.readlines()
>>>> myfile.close()
>>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>>     print ('UTF-8 BOM detected!')
>>>> else:
>>>>     print ('UTF-8 BOM not detected!')
>>>>
>>>> If I run the executable that I get from SharpDevelop this is what I get:
>>>> bin\Debug> Test.exe
>>>> 2.5.0 ()
>>>> UTF-8 BOM detected!
>>>> UTF-8 BOM detected!
>>>>
>>>> But if I run the same script using the standard Python interpreter, this
>>>> is what I get:
>>>> bin\Debug> D:\Python25\python.exe ..\..\Program.py
>>>> 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]
>>>> UTF-8 BOM detected!
>>>> UTF-8 BOM not detected!
>>>>
>>>> The script works correctly with the standard Python interpreter but for
>>>> some reason is not working right with IronPython.
>>>>
>>>> Any ideas what is going wrong?
>>>>
>>>
>>> I'm not in a position to check right now, but this could happen if
>>> codecs.BOM_UTF8 is set to the empty string.
>>>
>>> Michael
>>>
>>>> Thanks!
>>>>
>>>> Best regards,
>>>> -- Leo
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users@lists.ironpython.com
>>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>
>>> --
>>> http://www.ironpythoninaction.com/
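
Following Michael's remark in the thread that it is more normal to check for the BOM with bytes than with an already-decoded string, a minimal sketch of that approach might look like the one below. It reads the first three bytes in binary mode and compares them for equality, so it does not rely on str.startswith at all. The helper names starts_with_utf8_bom and demo are hypothetical and not from the thread, and this has not been verified against IronPython 2.0.x; it is only known to behave as shown on standard CPython.

```python
import codecs
import os
import tempfile


def starts_with_utf8_bom(path):
    """Return True if the file at `path` begins with the UTF-8 BOM bytes.

    Hypothetical helper, not part of the original thread.
    """
    # Binary mode: no decoding and no newline translation is applied.
    f = open(path, 'rb')
    try:
        head = f.read(len(codecs.BOM_UTF8))
    finally:
        f.close()
    # Exact comparison of the leading bytes; an empty or short file is False.
    return head == codecs.BOM_UTF8


def demo():
    """Write two small temporary files and report whether each has a BOM."""
    body = 'Line 1\nLine 2\n'.encode('ascii')
    results = []
    for prefix in (codecs.BOM_UTF8, body[:0]):  # with BOM, then without
        fd, path = tempfile.mkstemp()
        try:
            os.write(fd, prefix + body)
            os.close(fd)
            results.append(starts_with_utf8_bom(path))
        finally:
            os.remove(path)
    return results


if __name__ == '__main__':
    print(demo())  # [True, False] on CPython
```

The try/finally form is used instead of a with statement only so that the sketch also runs on Python 2.5, the version discussed in the thread.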