Liam Clarke wrote:
> I have a base object, which reads the unicode string as bytes like so,
> this ignores all but important bits.
>
> class Mhod:
>     def __init__(self, f):
>         self.payload = struct.unpack("36s", f.read(36))
>
> Which in turn, is utilised in a Song object, which works like this -
>
> class Song:
>     def __init__(self, mhod):
>         self.location = unicode(mhod.payload, "UTF-16")
>         self.mhod = mhod
>     def gLoc(self):
>         return self.location
>     def sLoc(self, value):
>         #Need to coerce data into UTF-16 here
>         self.mhod.payload = value.encode("UTF-16")

I'm confused about what sLoc is supposed to do. Shouldn't it be setting
self.location? ISTM sLoc should parallel what __init__ does. What is
value here? If it is an Mhod then you should just do
    self.location = unicode(mhod.payload, "UTF-16")
again.

OTOH if you are trying to modify Mhod.payload then you should make a
method in Mhod and (assuming value is ASCII text) it should be something
like
    self.payload = unicode(value, 'ascii').encode('UTF-16')
(though see my previous reply about utf-16 vs utf-16be and utf-16le)

>
>     location = property(gLoc, sLoc)
>
> If I were to do a
>
> >>> x = Mhod(open("test", "rb"))
> >>> y = Song(x)
>
> I get
>
> >>> x.payload
>
> ':\x00i\x00P\x00o\x00d\x00_\x00C\x00o\x00n\x00t\x00r\x00o\x00l
> \x00:\x00M\x00u\x00s\x00i\x00c\x00:\x00F\x004\x004\x00:\x00L
> \x00W\x00B\x00R\x00.\x00m\x00p\x003\x00' #Line breaks added.

This is utf-16le

>
> >>> y.location
>
> u':iPod_Control:Music:F44:LWBR.mp3'
>
> Which is what I'm after. What I'm struggling with is coercing the
> string that's being passed to sLoc() into UTF-16, and actually
> creating any form of unicode string at all without using
>
> >>> foo = u'Monkies!'
>
> Which I'm sure is going to be in UTF-8, just to spite me.

No, it will be a unicode string, and what's wrong with that as a way to
create a unicode string anyway?

>
> So far, the best I've come up with is -
>
> >>> foo = unicode("Hi Bob!".encode("UTF-16"), "UTF-16")

You are still confused about when to use encode vs decode:
    encode goes *away* from unicode
    decode goes *towards* unicode

So any of these will work:
    foo = u'Hi Bob!'
    foo = 'Hi Bob!'.decode('ascii')
    foo = unicode('Hi Bob!', 'ascii')

and, assuming sys.getdefaultencoding() returns 'ascii', the last can be
written
    foo = unicode('Hi Bob!')

> Which, as you mention above, is likely to cause me errors. And
> apparently "Hi Bob!" is an 8 bit string encoded in UTF-16...

No, what gives you that idea? It is an 8-bit string encoded in ASCII.
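
To make the encode/decode direction concrete, here is a minimal sketch,
not your actual classes. It assumes the payload is UTF-16LE with no byte
order mark (as your sample x.payload appears to be) and that any 8-bit
input is ASCII. The helper names decode_location and encode_location are
made up for illustration. I have written it with b'...' literals so it
should run unchanged on Python 2.6+ and on Python 3, where the unicode
type is spelled str.

```python
# Sketch only: decode_location and encode_location are hypothetical
# helpers, not part of the original Mhod/Song classes.

# decode goes *towards* unicode: bytes in, unicode text out.
raw = b'Hi Bob!'                    # 8-bit string, ASCII encoded
text = raw.decode('ascii')          # unicode text
assert text == u'Hi Bob!'

# encode goes *away* from unicode: unicode text in, bytes out.
with_bom = text.encode('utf-16')    # the plain codec prepends a BOM
little = text.encode('utf-16-le')   # no BOM, low byte first
assert with_bom[:2] in (b'\xff\xfe', b'\xfe\xff')
assert little == b'H\x00i\x00 \x00B\x00o\x00b\x00!\x00'


def decode_location(payload):
    """Turn UTF-16LE payload bytes into unicode text."""
    return payload.decode('utf-16-le')


def encode_location(value, source_encoding='ascii'):
    """Return UTF-16LE bytes for value.

    value may be unicode text or an 8-bit string; an 8-bit string is
    first decoded using source_encoding (assumed to be ASCII here).
    """
    if isinstance(value, bytes):
        value = value.decode(source_encoding)
    return value.encode('utf-16-le')
```

The point of naming 'utf-16-le' explicitly is that the plain 'utf-16'
codec adds a two-byte byte order mark when encoding, which would change
the length of the payload you write back.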

> *sigh* I suppose I could go the XP route and expect any further users
> to just deal with it and pass in a UTF-16 string, but there's got to
> be a simple way to handle it., and I'm not having too much luck with
> this.
>
> I've been working from the below document, if anyone can recommend
> something further, I'd much appreciate it.
>
> http://www.amk.ca/python/howto/unicode

The references are good too, particularly:

    Roman Czyborra wrote another explanation of Unicode's basic
    principles; it's at <http://czyborra.com/unicode/characters.html>.
    Czyborra has written a number of other Unicode-related
    documentation, available from <http://www.cyzborra.com>.

    Two other good introductory articles were written by Joel Spolsky
    <http://www.joelonsoftware.com/articles/Unicode.html> and Jason
    Orendorff <http://www.jorendorff.com/articles/unicode/>. If this
    introduction didn't make things clear to you, you should try
    reading one of these alternate articles before continuing.

And my own essay has more references at the end:
http://personalpages.tds.net/~kent37/blog/stories/14.html

Keep trying, eventually the mists will clear...this is confusing stuff.

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor