Hi Michael,

I am using SharpDevelop 3.1, which ships with IronPython 2.0.2 ("2.5.0 (IronPython 
2.0.2 (2.0.0.0) on .NET 2.0.50727.3603)").

So this issue is resolved with IronPython 2.6, then?

Thanks!

-- Leo 

-----Original Message-----
From: users-boun...@lists.ironpython.com 
[mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
Sent: 2009-11-10 14:05
To: Discussion of IronPython
Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8

Leonides Saguisag wrote:
> Hi Michael,
>
> I just verified the empty string theory that you mentioned and 
> Python25\lib\codecs.py (comes with the standard library in Python 2.5) has 
> the following defined:
>
>   
You're using IronPython 2.0 then?

If I use IronPython 2.6 it correctly reports a text file as not starting with 
the BOM:

 >>> import codecs
 >>> codecs.BOM_UTF8
u'\xef\xbb\xbf'
 >>> lines = open('foo.txt').readlines()
 >>> lines
['foo']
 >>> lines[0].startswith(codecs.BOM_UTF8)
False


All the best,

Michael Foord
> # UTF-8
> BOM_UTF8 = '\xef\xbb\xbf'
>
>
> So it is not an empty string.
>
> Maybe I am approaching this wrong and you guys can suggest an 
> alternative way of doing this.  I am trying to read a file and determine if 
> the file is encoded in UTF-8 or not.  The approach I took was to use Python's 
> built-in open function to read the text file into a list of strings and 
> check if the first line starts with the UTF-8 byte order mark by using 
> line.startswith(codecs.BOM_UTF8).  As I noted below, this works fine in 
> Python 2.5, but in IronPython it keeps reporting a UTF-8 BOM even 
> though none is present.
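A minimal sketch of one alternative, assuming Python 2.6+ or 3.x and not checked on IronPython 2.0: open the file in binary mode so raw bytes are compared with codecs.BOM_UTF8 (no decoding or newline translation happens first), and if there is no BOM, fall back to a trial UTF-8 decode. The names classify_utf8 and sniff_file are hypothetical.

```python
import codecs


def classify_utf8(data):
    """Classify raw bytes.

    Returns 'utf-8-sig' if the bytes start with a UTF-8 BOM,
    'utf-8' if there is no BOM but the bytes decode cleanly as UTF-8,
    and None if they are not valid UTF-8.
    """
    # Bytes compared with bytes: no implicit decoding is involved.
    if data.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return None
    return 'utf-8'


def sniff_file(path):
    # Binary mode ('rb') hands back the file's bytes untouched.
    with open(path, 'rb') as f:
        return classify_utf8(f.read())
```

Keep in mind that plain ASCII is also valid UTF-8, so a 'utf-8' result without a BOM only means the bytes are consistent with UTF-8, not that the file was deliberately saved as UTF-8.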
>
> Thanks!
>
> -- Leo
>
> -----Original Message-----
> From: users-boun...@lists.ironpython.com 
> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael Foord
> Sent: 2009-11-10 13:32
> To: Discussion of IronPython
> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>
> Leonides Saguisag wrote:
>   
>> Thank you for taking the time to reply.  Any idea why this would happen in 
>> IronPython but not with the standard Python interpreter?  What is weirding 
>> me out is that the exact same script behaves differently depending on 
>> whether I use IronPython or the standard Python interpreter.
>>   
>>     
> Well, if codecs.BOM_UTF8 is set to the empty string (you didn't say if you 
> have tried this yet?) then it would be due to a bug in IronPython somewhere - 
> but at least you would know what was causing it.
>
> If it is the empty string, purely speculating, it could be due to the way the 
> .NET framework treats the BOM at the start of strings. Pure speculation 
> though - that might not be the problem at all or it could be caused by 
> something entirely different.
>
> In .NET it would be more normal to check for the BOM with bytes, as by the 
> time you have a string you have (usually) decoded already. 
> IronPython 2.X is a bit odd for the .NET framework in this respect.
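The bytes-versus-decoded-string point can be illustrated with CPython's own codecs module. This is a sketch of standard CPython behaviour, assuming Python 2.6+ or 3.3+ for the b'' and u'' literals; it says nothing about IronPython's or .NET's internals. Before decoding, the BOM is three bytes; after a plain UTF-8 decode it is a single character, U+FEFF.

```python
import codecs

raw = codecs.BOM_UTF8 + b'foo'   # bytes as they would sit on disk

# Byte-level check: bytes compared with bytes.
bom_in_bytes = raw.startswith(codecs.BOM_UTF8)    # True

# A plain UTF-8 decode turns the three BOM bytes into the single
# character U+FEFF, so the decoded text is 4 characters long.
plain = raw.decode('utf-8')
bom_in_text = plain.startswith(u'\ufeff')         # True

# The 'utf-8-sig' codec drops a leading BOM while decoding...
stripped = raw.decode('utf-8-sig')                # u'foo'

# ...and also decodes input that has no BOM at all.
no_bom = b'foo'.decode('utf-8-sig')               # u'foo'
```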
>
> Michael
>
>   
>> Thanks!
>>
>> -- Leo
>>
>> -----Original Message-----
>> From: users-boun...@lists.ironpython.com
>> [mailto:users-boun...@lists.ironpython.com] On Behalf Of Michael 
>> Foord
>> Sent: 2009-11-10 13:17
>> To: Discussion of IronPython
>> Subject: Re: [IronPython] Weird issue with codecs.BOM_UTF8
>>
>> Leonides Saguisag wrote:
>>   
>>     
>>> Hi everyone,
>>>
>>> I am encountering a weird issue with getting codecs.BOM_UTF8 to work 
>>> correctly.  I am using SharpDevelop 3.1.
>>>
>>> Here is the test script that I put together:
>>>
>>>
>>> import sys
>>> sys.path.append(r'D:\Python25\Lib')
>>> import codecs
>>>
>>> print sys.version
>>> myfile = open(r'D:\Temp\text_file_with_utf8_bom.txt', 'r')
>>> lines = myfile.readlines()
>>> myfile.close()
>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>     print ('UTF-8 BOM detected!')
>>> else:
>>>     print ('UTF-8 BOM not detected!')
>>>
>>> myfile = open(r'D:\Temp\text_file_without_utf8_bom.txt', 'r')
>>> lines = myfile.readlines()
>>> myfile.close()
>>> if lines[0].startswith(codecs.BOM_UTF8):
>>>     print ('UTF-8 BOM detected!')
>>> else:
>>>     print ('UTF-8 BOM not detected!')
>>>
>>>
>>> If I run the executable that SharpDevelop builds, this is what I get:
>>> bin\Debug> Test.exe
>>> 2.5.0 ()
>>> UTF-8 BOM detected!
>>> UTF-8 BOM detected!
>>>
>>>
>>> But if I run the same script using the standard python interpreter, this is 
>>> what I get:
>>> bin\Debug> D:\Python25\python.exe ..\..\Program.py
>>> 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]
>>> UTF-8 BOM detected!
>>> UTF-8 BOM not detected!
>>>
>>>
>>> The script works correctly with the standard python interpreter but for 
>>> some reason is not working right with IronPython.
>>>
>>> Any ideas what is going wrong?
>>>   
>>>     
>>>       
>> I'm not in a position to check right now, but this could happen if 
>> codecs.BOM_UTF8 is set to the empty string.
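Why an empty constant would produce exactly the symptom reported above: every string starts with the empty string. A minimal sketch, where EMPTY_BOM is a hypothetical stand-in for a BOM constant that ended up empty and REAL_BOM is the value quoted in this thread from Python25\lib\codecs.py:

```python
# Hypothetical stand-in for a BOM constant that ended up empty.
EMPTY_BOM = ''

# Value quoted in this thread from Python25\lib\codecs.py; in Python 2
# this is a 3-byte str, in Python 3 a 3-character str.
REAL_BOM = '\xef\xbb\xbf'

first_line = 'foo\n'   # first line of a file that has no BOM

# Every string starts with '', so this "detects" a BOM in every file,
# whether or not one is present.
empty_check = first_line.startswith(EMPTY_BOM)   # True

# Against the non-empty constant the same check behaves as intended.
real_check = first_line.startswith(REAL_BOM)     # False
```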
>>
>> Michael
>>
>>   
>>     
>>> Thanks!
>>>
>>> Best regards,
>>> -- Leo
>>> _______________________________________________
>>> Users mailing list
>>> Users@lists.ironpython.com
>>> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
>>>   
>>>     
>>>       
>> --
>> http://www.ironpythoninaction.com/
>>
>>   
>>     
>
>
>   


