Re: [python-win32] String weirdness on python 2.4 / windows

Steve Holden Thu, 20 Oct 2005 00:01:52 -0700

Kinsley Turner wrote:
> 
>>>Similarly I have a string with the IBM-extended-ASCII degrees symbol
>>>(ascii 0xb0)
>>>that is read in from a network-connected field device.  Somehow this
>>>ends up with an extended 'A' (with a single dot over it.) prepended
>>>before it.
> 
> 
>>This question is inappropriate for this mailing list, which is for the
>>pywin32 extensions, which you don't appear to be using. You should send
>>the question to python-list.
> 
> 
> Really?  Ok.
> The list index (http://mail.python.org/mailman/listinfo) says
> this list is for "Python on win32".
> 
Which understandably misled you, but this list is specifically for 
issues concerning the use of the win32all extension modules. General 
Python help is best obtained from python-list@python.org, whether on 
Windows or any other platform.
> 
>>What does "spit it back down on a web request for /favicon.ico" mean???
>>What "it" comes back from "where" corrupted?
> 
> 
> When I say 'spit', you can read this as "transmits to the requesting
> web-browser using HTTP protocol over a BSD-style socket layering
> atop TCP/IP".
> 
> 
>>What does corrupted mean?
> 
> 
> No longer in its original form.
> In this case it looks like bytes have been removed & changed.
> 
> 
>>It would help greatly if you showed the actual code that causes the
>>alleged problem. Even better would be to cut that out and make it
>>into a small standalone script that demonstrates the problem. Also
>>stating the expected or preferred result would be a good idea.
> 
> 
> Yes I agree, but not all problems can be rendered down into a simple
> example easily.  You see in this case the string is rendered into
> an image and served back to a web browser.  I had hoped for a simple
> answer like "Add encoding header XYZ to your script". Alas.
> 
> 
>>AFAIK ASCII describes only characters with ordinals in range(0x80).
> 
> 
> Yes that's true, original ASCII was only 7 bit.
> Obviously I was referring to ISO-8859-1, commonly referred to as
> 'Extended ASCII' and IIRC popularised by IBM in the 80's (70's?).
> It's been the dominant latin character set for 20 years or so.
> But you knew this already.
> 
> 
>>Perhaps you mean that you have a string which contains '\xb0'.
> 
> 
> No.
> The string is encoded as a single symbol within the python string,
> not a trigraph (quadgraph?).
> 
In Python, of course, '\xb0' is a single-byte string literal containing 
only the character whose integer value is hex B0.
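A quick check of that point, as a minimal sketch. It assumes a current Python 3 interpreter, where the bytes literal b'\xb0' plays the role that the plain str literal '\xb0' plays in Python 2.4:

```python
# Python 3 sketch: b'\xb0' is the analogue of the Python 2 str literal '\xb0'.
raw = b'\xb0'

length = len(raw)   # number of bytes in the literal
value = raw[0]      # integer value of that single byte

print(length, hex(value))   # prints: 1 0xb0
```

The escape sequence is only source-code notation; the string itself holds one byte.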
> 
>>"Somehow" -- unless you have pixies at the bottom of your garden, it
>>got mangled because *YOU* did something to it. If you can't tell us
>>what you did to it, we can't help you.
> 
> 
> The string in question is a test-string.  Basically it contains a
> few words, then "1234567890 !C" where the '!' is the degrees-symbol
> as specified in Extended ASCII / ISO-8859-*.  The text is then rendered
> into an image (using a supplied GNU TTF font) via the Python Imaging
> Library (PIL).  Under UNIX this comes out as expected, with the degrees
> symbol rendered appropriately.  When run as a Win32 service the rendering
> comes out with an accented 'A' in front of it.
> 
> 
>>"Extended A with a single dot over it"?? Are you sure that's a dot? I'd
>>like to know what language uses A with dot above *and* what the pixies
>>are using to render it on your screen. Could it possibly be a circumflex
>>accent?
> 
> 
> I thought it was a 'Å' (pasted in an 'A' with a dot above it,
> well, ok, maybe I should have said 'circle above it').  Depending
> on your font & size sometimes the dot is connected to the top of
> the A, in others it hovers.  Try it: print u'\xc5'.  This is from an
> ISO-8859-1 encoding, YMMV.  Sorry, I don't know what language
> uses this character either.  It's being rendered by the PIL.
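One way to settle which accented 'A' is actually involved is to ask Python for the Unicode name of each candidate. This is a minimal sketch, assuming Python 3 (where '\xc5' is already a text string, like u'\xc5' in 2.4). The two candidates are the character pasted above and the one a circumflex would imply:

```python
import unicodedata

# U+00C5 is the character pasted above; U+00C2 is the circumflex candidate.
candidates = ['\xc5', '\xc2']
names = [unicodedata.name(ch) for ch in candidates]

for ch, name in zip(candidates, names):
    print(hex(ord(ch)), name)
# prints:
# 0xc5 LATIN CAPITAL LETTER A WITH RING ABOVE
# 0xc2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
```

At small font sizes the ring and the circumflex are easy to confuse, so comparing code points is more reliable than comparing glyphs.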
> 
> 
> 
>>Hint (1):  u'\xb0'.encode('utf-8') produces '\xc2\xb0'. If that is
>>displayed by a gadget that's expecting iso-8859-1 (or cp1252) instead of
>>utf-8, '\xc2' will show as Latin capital A with circumflex.
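The effect described in Hint (1) can be reproduced in a few lines. This is a sketch assuming Python 3; it only illustrates the mechanism and does not claim that this is what the rendering path on Windows actually does:

```python
degree = '\xb0'   # U+00B0 DEGREE SIGN as a text string

# Encoding to UTF-8 turns the one character into two bytes.
utf8_bytes = degree.encode('utf-8')

# A consumer that wrongly assumes ISO-8859-1 sees two characters.
misread = utf8_bytes.decode('iso-8859-1')

print(repr(utf8_bytes))                       # prints: b'\xc2\xb0'
print([hex(ord(ch)) for ch in misread])       # prints: ['0xc2', '0xb0']
```

The extra 0xc2 is the accented 'A' that appears in front of the degree sign.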
> 
> 
> Hmmm, I wonder if PIL is doing some kind of modification to the string
> before rendering it.  This question might better be posed to the PIL list.
> I can't really control what the device sends back, but I think this
> is not the only Extended ASCII / ISO-8859-* character it delivers.
> 
> <lightbulb>Ahhh... I think I've got it.</lightbulb>
> 
>>Hint (2): print repr(allegedly_mangled_string)
> 
> 
> This gives me '1234567890\xb0C' from a UNIX python (2.4.2 #1)
> Win32 (python 2.4.2 #64) gives the same.  So it mustn't be something
> to do with the python / string representation.
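If repr() is identical on both platforms, the bytes themselves are not being altered. What can still differ is the encoding that a later stage assumes when it turns those bytes into characters. One hedged way to take that guesswork away is to decode the bytes explicitly, once, before passing them on. The sketch below assumes Python 3 and assumes the field device sends ISO-8859-1; substitute the device's real encoding if it differs:

```python
# The byte string reported by repr() on both platforms (Python 3 bytes form).
raw = b'1234567890\xb0C'

# Assumption: the device speaks ISO-8859-1.
text = raw.decode('iso-8859-1')

# The same bytes read under other code pages, for comparison.
as_cp1252 = raw.decode('cp1252')   # Windows Western European
as_cp437 = raw.decode('cp437')     # original IBM PC code page

print(text == as_cp1252)   # prints: True
print(text == as_cp437)    # prints: False
```

Byte 0xb0 is the degree sign in ISO-8859-1 and cp1252 but a different character in cp437, so "Extended ASCII" alone does not say which character a byte stands for.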
> 
> 
>>"prepended before it" -- as opposed to "prepended after it"?
> 
> 
> That would be "'appended' after it", wouldn't it?
> (or are you just trying to pick a spoonerism)
> 
> 
> 
>>Perhaps we should avoid Westpac ATMs until you sound the all-clear :-)
> 
> 
> Last time I looked these ran on OS/2[1], so I think you'll be safe.
> 
> 
> 
> thanks for the hints,
> -kt
> 
> 
> 
> [1] At least the old ones anyway.
> 
> 
>


Anyway, it looks like you are on the track of what appears to be a 
character set or encoding issue. Good luck.

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
_______________________________________________
Python-win32 mailing list
Python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32
