Hi Michael,

I am using SharpDevelop 3.1, which comes with "2.5.0 (IronPython 2.0.2 (2.0.0.0) on .NET 2.0.50727.3603)".
So this issue is resolved with IronPython 2.6, then?

Thanks!

-- Leo

-----Original Message-----
From: users-boun...@lists.ironpython.com [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
Sent: 10 November 2009 14:05
To: Discussion of IronPython
Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8

Leonides Saguisag wrote:
> Hi Michael,
>
> I just verified the empty string theory that you mentioned and
> Python25\lib\codecs.py (comes with the standard library in Python 2.5) has
> the following defined:
>
>
You're using IronPython 2.0 then? If I use IronPython 2.6 it correctly reports a text file as not starting with the BOM:

>>> import codecs
>>> codecs.BOM_UTF8
u'\xef\xbb\xbf'
>>> lines = open('foo.txt').readlines()
>>> lines
['foo']
>>>
>>> lines[0].startswith(codecs.BOM_UTF8)
False

All the best,

Michael Foord

> # UTF-8
> BOM_UTF8 = '\xef\xbb\xbf'
>
>
> So it is not an empty string.
>
> Maybe I am approaching this wrong and you guys can provide me with an
> alternative way of doing this. I am trying to read a file and determine if
> the file is encoded in UTF-8 or not. The approach I took was to use Python's
> built-in open function to read the text file into a list of strings and
> check if the first line starts with the UTF-8 byte order mark by using
> line.startswith(codecs.BOM_UTF8). As I noted below, this works fine in
> Python 2.5, but in IronPython it keeps saying it found a UTF-8 BOM even
> though there is none present.
>
> Thanks!
>
> -- Leo
>
> -----Original Message-----
> From: users-boun...@lists.ironpython.com
> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
> Sent: 10 November 2009 13:32
> To: Discussion of IronPython
> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>
> Leonides Saguisag wrote:
>
>> Thank you for taking the time to reply. Any idea why this would happen in
>> IronPython but not with the standard Python interpreter?
>> What is weirding
>> me out is that the exact same script behaves differently depending on
>> whether I use IronPython or the standard Python interpreter.
>>
>>
> Well, if codecs.BOM_UTF8 is set to the empty string (you didn't say if you
> have tried this yet?) then it would be due to a bug in IronPython somewhere -
> but at least you would know what was causing it.
>
> If it is the empty string, purely speculating, it could be due to the way the
> .NET framework treats the BOM at the start of strings. Pure speculation
> though - that might not be the problem at all or it could be caused by
> something entirely different.
>
> In .NET it would be more normal to check for the BOM with bytes, as by the
> time you have a string you have (usually) decoded already.
> IronPython 2.X is a bit odd for the .NET framework in this respect.
>
> Michael
>
>
>> Thanks!
>>
>> -- Leo
>>
>> -----Original Message-----
>> From: users-boun...@lists.ironpython.com
>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael
>> Foord
>> Sent: 10 November 2009 13:17
>> To: Discussion of IronPython
>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>
>> Leonides Saguisag wrote:
>>
>>
>>> Hi everyone,
>>>
>>> I am encountering a weird issue with getting codecs.BOM_UTF8 to work
>>> correctly. I am using SharpDevelop 3.1.
>>>
>>> Here is the test script that I put together:
>>>
>>>
>>> import sys
>>> sys.path.append(r'D:\Python25\Lib')
>>> import codecs
>>>
>>> print sys.version
>>> myfile = open(r'D:\Temp\text_file_with_utf8_bom.txt', 'r')
>>> lines = myfile.readlines()
>>> myfile.close()
>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>     print ('UTF-8 BOM detected!')
>>> else:
>>>     print ('UTF-8 BOM not detected!')
>>>
>>> myfile = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r')
>>> lines = myfile.readlines()
>>> myfile.close()
>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>     print ('UTF-8 BOM detected!')
>>> else:
>>>     print ('UTF-8 BOM not detected!')
>>>
>>>
>>> If I run the executable that I get from SharpDevelop, this is what I get:
>>> bin\Debug> Test.exe
>>> 2.5.0 ()
>>> UTF-8 BOM detected!
>>> UTF-8 BOM detected!
>>>
>>>
>>> But if I run the same script using the standard Python interpreter, this is
>>> what I get:
>>> bin\Debug> D:\Python25\python.exe ..\..\Program.py
>>> 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]
>>> UTF-8 BOM detected!
>>> UTF-8 BOM not detected!
>>>
>>>
>>> The script works correctly with the standard Python interpreter but for
>>> some reason is not working right with IronPython.
>>>
>>> Any ideas what is going wrong?
>>>
>>>
>>>
>> I'm not in a position to check right now, but this could happen if
>> codecs.BOM_UTF8 is set to the empty string.
>>
>> Michael
>>
>>
>>
>>> Thanks!
>>>
>>> Best regards,
>>> -- Leo
>>> _______________________________________________
>>> Users mailing list
>>> Users@lists.ironpython.com
>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>
>> --
>> http://www.ironpythoninaction.com/