Kinsley Turner wrote:
>
>>>Similarly I have a string with the IBM-extended-ASCII degrees symbol
>>>(ascii 0xb0) that is read in from a network-connected field device.
>>>Somehow this ends up with an extended 'A' (with a single dot over it.)
>>>prepended before it.
>
>>This question is inappropriate for this mailing list, which is for the
>>pywin32 extensions, which you don't appear to be using. You should send
>>the question to python-list.
>
> Really?  Ok.
> The list index (http://mail.python.org/mailman/listinfo) says
> this list is for "Python on win32".
>
Which understandably misled you, but this list is specifically for issues
concerning the use of the win32all extension modules. General Python help
is best obtained from python-list@python.org, whether on Windows or any
other platform.

>>What does "spit it back down on a web request for /favicon.ico" mean???
>>What "it" comes back from "where" corrupted?
>
> When I say 'spit', you can read this as "transmits to the requesting
> web-browser using HTTP protocol over a BSD-style socket layering
> atop TCP/IP".
>
>>What does corrupted mean?
>
> No longer in its original form.
> In this case it looks like bytes have been removed & changed.
>
>>It would help greatly if you showed the actual code that causes the
>>alleged problem. Even better would be to cut that out and make it
>>into a small standalone script that demonstrates the problem. Also
>>stating the expected or preferred result would be a good idea.
>
> Yes I agree, but not all problems can be rendered down into a simple
> example easily.  You see in this case the string is rendered into
> an image and served back to a web browser.  I had hoped for a simple
> answer like "Add encoding header XYZ to your script".  Alas.
>
>>AFAIK ASCII describes only characters with ordinals in range(0x80).
>
> Yes that's true, original ASCII was only 7 bit.
> Obviously I was referring to ISO-8859-1, commonly referred to as
> 'Extended ASCII' and IIRC popularised by IBM in the 80's (70's?).
> It's been the dominant latin character set for 20 years or so.
> But you knew this already.
>
>>Perhaps you mean that you have a string which contains '\xb0'.
>
> No.
> The string is encoded as a single symbol within the python string,
> not a trigraph (quadgraph?).
>
In Python, of course, '\xb0' is a single-byte string literal containing
only the character whose integer value is hex B0.

>>"Somehow" -- unless you have pixies at the bottom of your garden, it
>>got mangled because *YOU* did something to it. If you can't tell us
>>what you did to it, we can't help you.
>
> The string in question is a test-string.  Basically it contains a
> few words, then "1234567890 !C" where the '!' is the degrees-symbol
> as specified in Extended ASCII / ISO-8859-*.  The text is then rendered
> into an image (using a supplied GNU TTF font) via the Python Imaging
> Library (PIL).  Under UNIX this comes out as expected, with the degrees
> symbol rendered appropriately.  When run as a Win32 service the
> rendering comes out with an accented 'A' in front of it.
>
>>"Extended A with a single dot over it"?? Are you sure that's a dot? I'd
>>like to know what language uses A with dot above *and* what the pixies
>>are using to render it on your screen. Could it possibly be a circumflex
>>accent?
>
> I thought it was a 'Å' (pasted in an 'A' with a dot above it,
> well, ok, maybe I should have said 'circle above it').  Depending
> on your font & size sometimes the dot is connected to the top of
> the A, in others it hovers.  Try it: print u'\xc5'.  This is from an
> ISO-8859-1 encoding, YMMV.  Sorry, I don't know what language
> uses this character either.  It's being rendered by the PIL.
>
>>Hint (1): u'\xb0'.encode('utf-8') produces '\xc2\xb0'.
>>If that is displayed by a gadget that's expecting iso-8859-1 (or
>>cp1252) instead of utf-8, '\xc2' will show as Latin capital A with
>>circumflex.
>
> Hmmm, I wonder if PIL is doing some kind of modification to the string
> before rendering it.  This question might better be posed to the PIL
> list.  I can't really control what the device sends back, but I think
> this is not the only Extended ASCII / ISO-8859-* character it delivers.
>
> <lightbulb>Ahhh... I think I've got it.</lightbulb>
>
>>Hint (2): print repr(allegedly_mangled_string)
>
> This gives me '1234567890\xb0C' from a UNIX python (2.4.2 #1)
> Win32 (python 2.4.2 #64) gives the same.  So it mustn't be something
> to do with the python / string representation.
>
>>"prepended before it" -- as opposed to "prepended after it"?
>
> That would be "'appended' after it", wouldn't it?
> (or are you just trying to pick a spoonerism)
>
>>Perhaps we should avoid Westpac ATMs until you sound the all-clear :-)
>
> Last time I looked these ran on OS/2[1], so I think you'll be safe.
>
> thanks for the hints,
> -kt
>
> [1] At least the old ones anyway.
>
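
For what it's worth, Hint (1) above is easy to reproduce in isolation.
The following is only a sketch: it is written in Python 3 syntax rather
than the 2.4 used in this thread, the variable names are made up for
illustration, and it shows just one way a stray accented 'A' can end up
in front of the degree sign. Nothing here establishes that PIL or the
Win32 service is actually doing this.

```python
# Python 3 sketch of a UTF-8 / ISO-8859-1 mix-up (names are illustrative).
DEGREE = "\u00b0"  # DEGREE SIGN, a single byte 0xB0 in ISO-8859-1

# Assumption: the field device sends ISO-8859-1 bytes like these.
raw = ("1234567890" + DEGREE + "C").encode("iso-8859-1")

# Decoding with the encoding the bytes were written in gives clean text.
text = raw.decode("iso-8859-1")

# If some stage re-encodes that text as UTF-8 ...
utf8_bytes = text.encode("utf-8")

# ... and a later stage still assumes ISO-8859-1, byte 0xC2 becomes its
# own character: LATIN CAPITAL LETTER A WITH CIRCUMFLEX.
mangled = utf8_bytes.decode("iso-8859-1")

print(repr(raw))
print(repr(utf8_bytes))
print(repr(text))
print(repr(mangled))
```

Comparing repr() of the data at each stage of the real program (device
read, string handed to PIL, bytes sent to the browser) in the same way
should show where an extra byte first appears, if it does at all.
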
Anyway, it looks like you are on the track of what appears to be a
character set or encoding issue. Good luck.

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
_______________________________________________
Python-win32 mailing list
Python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32