Hi Michael,

It seems to be a bug with IronPython 2.0.x then.  I just installed IronPython 
2.0.3 and this is what I found:


C:\>"C:\Program Files\IronPython 2.0.3\ipy.exe"
IronPython 2.0.3 (2.0.0.0) on .NET 2.0.50727.3603
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append(r'D:\Python25\lib')
>>> import codecs
>>> lines = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r').readlines()
>>> print lines
['This is a text file without a UTF-8 BOM.\n', 'Line 2\n', 'Line 3']
>>> lines[0].startswith(codecs.BOM_UTF8)
True
>>> ^Z

C:\>
 

It returned True even though the text file does not have a UTF-8 BOM.  
Compare that with standard Python 2.5:


C:\> D:\Python25\python.exe
Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on 
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> import codecs
>>> lines = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r').readlines()
>>> print lines
['This is a text file without a UTF-8 BOM.\n', 'Line 2\n', 'Line 3']
>>> lines[0].startswith(codecs.BOM_UTF8)
False
>>> ^Z

C:\>


So it looks like there was a bug in IronPython 2.0.x with regard to the 
handling of codecs.BOM_UTF8 that now appears to be fixed in IronPython 2.6.  
Does that sound like a fair assessment?
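For comparison, here is a minimal sketch of a BOM check that avoids text-mode strings altogether by reading the first three raw bytes in binary mode. This follows the byte-level approach Michael mentions for .NET in the quoted messages. The helper name `has_utf8_bom` is hypothetical, the code deliberately avoids syntax newer than Python 2.5 (no `with`, no `b''` literals), and whether it sidesteps the IronPython 2.0.x behaviour is an assumption that the thread does not confirm.

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file at `path` starts with the UTF-8 BOM.

    Hypothetical helper: the file is opened in binary mode, so no decoding
    or newline translation happens before the comparison.
    """
    f = open(path, 'rb')
    try:
        head = f.read(len(codecs.BOM_UTF8))  # at most 3 bytes
    finally:
        f.close()
    # Require a full-length match; a file shorter than 3 bytes never matches.
    return head == codecs.BOM_UTF8
```

Called as `has_utf8_bom(r'D:\Temp\text_file_without_utf8_bom.txt')`, it should return False for a file like the one in the transcripts above on a correctly behaving interpreter.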

Thanks!

-- Leo

-----Original Message-----
From: users-boun...@lists.ironpython.com 
[mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
Sent: 2009-11-10 14:30
To: Discussion of IronPython
Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8

Leonides Saguisag wrote:
> Hi Michael,
>
> I am using SharpDevelop 3.1 which comes with "2.5.0 (IronPython 2.0.2 
> (2.0.0.0) on .NET 2.0.50727.3603)".
>
>   
Yeah, that's IronPython 2.0 - which is fine but not as good as IronPython 2.6. 
;-)

> So this issue is resolved with IronPython 2.6, then?
>
>   

No idea. I can't reproduce the problem with IronPython 2.6 though. Try 
installing IronPython 2 and seeing what happens from the interactive 
interpreter (whether you can reproduce the problem or not). It is
*possible* that it's caused by the way SharpDevelop generates its executables, 
but that's highly unlikely to be the cause of the problem.

All the best,

Michael Foord

> Thanks!
>
> -- Leo
>
> -----Original Message-----
> From: users-boun...@lists.ironpython.com 
> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
> Sent: 2009-11-10 14:05
> To: Discussion of IronPython
> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>
> Leonides Saguisag wrote:
>   
>> Hi Michael,
>>
>> I just checked the empty string theory that you mentioned, and 
>> Python25\lib\codecs.py (which ships with the Python 2.5 standard library) 
>> has the following defined:
>>
>>   
> You're using IronPython 2.0 then?
>
> If I use IronPython 2.6 it correctly reports a text file as not starting with 
> the BOM:
>
>  >>> import codecs
>  >>> codecs.BOM_UTF8
> u'\xef\xbb\xbf'
>  >>> lines = open('foo.txt').readlines()
>  >>> lines
> ['foo']
>  >>> lines[0].startswith(codecs.BOM_UTF8)
> False
>
>
> All the best,
>
> Michael Foord
>   
>> # UTF-8
>> BOM_UTF8 = '\xef\xbb\xbf'
>>
>>
>> So it is not an empty string.
>>
>> Maybe I am approaching this the wrong way and you guys can suggest an 
>> alternative way of doing this.  I am trying to read a file and determine 
>> whether or not it is encoded in UTF-8.  The approach I took was to use 
>> Python's built-in open function to read the text file into a list of 
>> strings and check whether the first line starts with the UTF-8 byte order 
>> mark by using line.startswith(codecs.BOM_UTF8).  As I noted below, this 
>> works fine in Python 2.5, but in IronPython it keeps reporting a UTF-8 BOM 
>> even though none is present.
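A file can be valid UTF-8 without carrying a BOM at all, so a BOM check on its own only answers part of the question above. One common approach, sketched here as an assumption rather than anything proposed in the thread, is to look for the BOM first and then attempt a strict UTF-8 decode of the raw bytes. The helper name `guess_utf8` and its return values are hypothetical. Keep in mind that a pure-ASCII file decodes cleanly as UTF-8 too, so a result of 'utf-8' means "decodes as UTF-8", not "was written as UTF-8".

```python
import codecs

def guess_utf8(path):
    """Classify a file by its raw bytes (hypothetical helper).

    Returns 'utf-8-sig' if it starts with the UTF-8 BOM and the rest decodes,
    'utf-8' if it has no BOM but decodes as UTF-8, or None otherwise.
    """
    f = open(path, 'rb')
    try:
        data = f.read()
    finally:
        f.close()
    has_bom = data[:3] == codecs.BOM_UTF8
    if has_bom:
        data = data[3:]  # validate only the content after the BOM
    try:
        # Strict decoding raises UnicodeDecodeError on any invalid sequence.
        data.decode('utf-8')
    except UnicodeDecodeError:
        return None
    if has_bom:
        return 'utf-8-sig'
    return 'utf-8'
```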
>>
>> Thanks!
>>
>> -- Leo
>>
>> -----Original Message-----
>> From: users-boun...@lists.ironpython.com
>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
>> Sent: 2009-11-10 13:32
>> To: Discussion of IronPython
>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>
>> Leonides Saguisag wrote:
>>   
>>> Thank you for taking the time to reply.  Any idea why this would happen in 
>>> IronPython but not with the standard Python interpreter?  What is weirding 
>>> me out is that the exact same script behaves differently depending on 
>>> whether I use IronPython or the standard Python interpreter.
>>>   
>> Well, if codecs.BOM_UTF8 is set to the empty string (you didn't say if you 
>> have tried this yet?) then it would be due to a bug in IronPython somewhere 
>> - but at least you would know what was causing it.
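The empty-string theory works because every string starts with the empty string, so an empty BOM constant would make the check succeed for every file. A quick diagnostic along these lines, pasted into the interactive interpreter, would confirm or rule it out. The function name `check_bom_constant` is hypothetical; on a correctly behaving interpreter the length should be 3 and the last value False.

```python
import codecs

# Every string starts with the empty string, which is why an empty
# BOM constant would make startswith() succeed for every file.
assert 'Line 2'.startswith('')

def check_bom_constant():
    """Return (repr of codecs.BOM_UTF8, its length, and whether a BOM-free
    sample is wrongly reported as starting with it). Hypothetical helper."""
    bom = codecs.BOM_UTF8
    # encode() gives a byte string on Python 2 (str) and Python 3 (bytes).
    sample = 'This is a text file without a UTF-8 BOM.\n'.encode('ascii')
    return repr(bom), len(bom), sample.startswith(bom)
```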
>>
>> If it is the empty string, purely speculating, it could be due to the way 
>> the .NET framework treats the BOM at the start of strings. Pure speculation 
>> though - that might not be the problem at all or it could be caused by 
>> something entirely different.
>>
>> In .NET it would be more normal to check for the BOM with bytes, as by the 
>> time you have a string you have (usually) decoded already. 
>> IronPython 2.X is a bit odd for the .NET framework in this respect.
>>
>> Michael
>>
>>   
>>> Thanks!
>>>
>>> -- Leo
>>>
>>> -----Original Message-----
>>> From: users-boun...@lists.ironpython.com
>>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
>>> Sent: 2009-11-10 13:17
>>> To: Discussion of IronPython
>>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>>
>>> Leonides Saguisag wrote:
>>>   
>>>> Hi everyone,
>>>>
>>>> I am encountering a weird issue with getting codecs.BOM_UTF8 to work 
>>>> correctly.  I am using SharpDevelop 3.1.
>>>>
>>>> Here is the test script that I put together:
>>>>
>>>>
>>>> import sys
>>>> sys.path.append(r'D:\Python25\Lib')
>>>> import codecs
>>>>
>>>> print sys.version
>>>> myfile = open(r'D:\Temp\text_file_with_utf8_bom.txt', 'r')
>>>> lines = myfile.readlines()
>>>> myfile.close()
>>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>>    print ('UTF-8 BOM detected!')
>>>> else:
>>>>    print ('UTF-8 BOM not detected!')
>>>>
>>>> myfile = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r')
>>>> lines = myfile.readlines()
>>>> myfile.close()
>>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>>    print ('UTF-8 BOM detected!')
>>>> else:
>>>>    print ('UTF-8 BOM not detected!')
>>>>
>>>>
>>>> If I run the executable that SharpDevelop builds, this is what I get:
>>>> bin\Debug> Test.exe
>>>> 2.5.0 ()
>>>> UTF-8 BOM detected!
>>>> UTF-8 BOM detected!
>>>>
>>>>
>>>> But if I run the same script using the standard python interpreter, this 
>>>> is what I get:
>>>> bin\Debug> D:\Python25\python.exe ..\..\Program.py
>>>> 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit 
>>>> (Intel)]
>>>> UTF-8 BOM detected!
>>>> UTF-8 BOM not detected!
>>>>
>>>>
>>>> The script works correctly with the standard python interpreter but for 
>>>> some reason is not working right with IronPython.
>>>>
>>>> Any ideas what is going wrong?
>>>>   
>>> I'm not in a position to check right now, but this could happen if 
>>> codecs.BOM_UTF8 is set to the empty string.
>>>
>>> Michael
>>>
>>>   
>>>> Thanks!
>>>>
>>>> Best regards,
>>>> -- Leo
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users@lists.ironpython.com
>>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>>   
>>> --
>>> http://www.ironpythoninaction.com/
>>>
>> --
>> http://www.ironpythoninaction.com/
>>
>
>
> --
> http://www.ironpythoninaction.com/
>


--
http://www.ironpythoninaction.com/
