Re: Python parsing iTunes XML/COM

2008-08-01 Thread william tanksley
John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Cool. Sorry for the misunderstanding. Thank you for helping again! Postscript: your request to print the actual data did the trick. I'd back inspecting actual data against armchair philosophy any time :-) Heh.

Re: Python parsing iTunes XML/COM

2008-08-01 Thread John Machin
On Aug 2, 10:02 am, william tanksley [EMAIL PROTECTED] wrote: Given that the input file was Unicode, You mean something like encoded in UTF-8. Here's another reference for you to read: http://www.amk.ca/python/howto/unicode

Re: Python parsing iTunes XML/COM

2008-07-31 Thread Stefan Behnel
william tanksley wrote: william tanksley [EMAIL PROTECTED] wrote: I'm still puzzled why I'm getting some non-Unicode out of an ElementTree's text, though. Now I know. Okay, my answer is that cElementTree (in Python 2.5) is simply deranged when it comes to Unicode. It assumes everything's

Re: Python parsing iTunes XML/COM

2008-07-31 Thread John Machin
On Jul 31, 12:58 am, william tanksley [EMAIL PROTECTED] wrote: Thank you for the response. Here's some more info, including a little that you didn't ask me for but which might be useful. John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: To ask another way: how

Re: Python parsing iTunes XML/COM

2008-07-31 Thread william tanksley
Stefan Behnel [EMAIL PROTECTED] wrote: william tanksley wrote: Okay, my answer is that ElementTree (in Python 2.5) is simply deranged when it comes to Unicode. It assumes everything's ASCII. It does not assume that. It *requires* byte strings to be ASCII. You can't encode Unicode into an

Re: Python parsing iTunes XML/COM

2008-07-31 Thread william tanksley
John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Buffett Time - Annual Shareholders\xc2\xa0L.mp3 1. This isn't Unicode; it's missing the u (I printed using repr). 2. It's got the UTF-8 bytes there in the middle. In addition to the above results, *WHAT*
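For readers following the byte-level detail in this message: a minimal sketch, written for Python 3 rather than the Python 2.5 used in the thread, of how the two bytes \xc2\xa0 in the quoted repr can be identified. The variable names are illustrative only.

```python
import unicodedata

# Byte string as it appears in the repr output quoted in the message.
raw = b"Buffett Time - Annual Shareholders\xc2\xa0L.mp3"

# Decoding as UTF-8 turns the byte pair 0xC2 0xA0 into a single code point.
text = raw.decode("utf-8")

# List every non-ASCII character with its code point and Unicode name.
odd_chars = [(hex(ord(ch)), unicodedata.name(ch)) for ch in text if ord(ch) > 127]
print(odd_chars)
```

The decoded text is one character shorter than the byte string, because the two-byte sequence represents one no-break space (U+00A0).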

Re: Python parsing iTunes XML/COM

2008-07-31 Thread Stefan Behnel
william tanksley wrote: I didn't pass a string. I passed a file. It didn't error out; instead, it produced bytestring-encoded output (not Unicode). From my experience (and from the source code I have seen so far), ElementTree does not return UTF-8 encoded strings at the API level. Can you

Re: Python parsing iTunes XML/COM

2008-07-31 Thread John Machin
On Jul 31, 11:54 pm, william tanksley [EMAIL PROTECTED] wrote: John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Buffett Time - Annual Shareholders\xc2\xa0L.mp3 1. This isn't Unicode; it's missing the u (I printed using repr). 2. It's got the UTF-8 bytes

Re: Python parsing iTunes XML/COM

2008-07-31 Thread william tanksley
John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Let's try again: Cool. Sorry for the misunderstanding. Thank you for helping again! Postscript: your request to print the actual data did the trick. I'm including the rest of my reply just to provide context, but

Re: Python parsing iTunes XML/COM

2008-07-31 Thread Jerry Hill
On Thu, Jul 31, 2008 at 9:44 AM, william tanksley [EMAIL PROTECTED] wrote: I'm using a file, a file that's correctly encoded as UTF-8, and it returns some text elements that are raw bytes (undecoded). I have to manually decode them. I can't reproduce this behavior. Here's a simple test case:
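The test case in this message is cut off in the archive. The following is not that test case, only a comparable sketch under stated assumptions: it uses Python 3's xml.etree.ElementTree (not cElementTree from Python 2.5) and a hypothetical one-element document, and checks what type the parser hands back for UTF-8 input.

```python
import xml.etree.ElementTree as ET

# Hypothetical UTF-8 encoded document containing a no-break space (bytes C2 A0).
xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b"<string>Shareholders\xc2\xa0L.mp3</string>"
)

root = ET.fromstring(xml_bytes)

# On Python 3 the element text is a decoded str, not raw UTF-8 bytes.
print(type(root.text), repr(root.text))
```

Behaviour on Python 2.5 may differ in detail, so treat this as a way to inspect the actual data rather than as a statement about that release.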

Re: Python parsing iTunes XML/COM

2008-07-31 Thread John Machin
On Aug 1, 7:44 am, william tanksley [EMAIL PROTECTED] wrote: John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Let's try again: Cool. Sorry for the misunderstanding. Thank you for helping again! Postscript: your request to print the actual data did the trick.

Re: Python parsing iTunes XML/COM

2008-07-30 Thread pyshib
If you want to convert the file names which use standard URL encoding (with %20 for space, etc.), use: from urllib import unquote; new_filename = unquote(filename). I have found this does not convert encoded characters of the form '#CC;' so you may have to do that manually. I think these are just
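A minimal sketch of the unquote suggestion, assuming Python 3, where the function lives in urllib.parse instead of urllib. The file name below is hypothetical and chosen to include both a %20 space and a percent-encoded UTF-8 sequence.

```python
from urllib.parse import unquote  # Python 2 spelling: from urllib import unquote

# Hypothetical percent-encoded file name.
filename = "Annual%20Shareholders%C2%A0L.mp3"

# unquote replaces %XX escapes; on Python 3 the resulting bytes are
# decoded as UTF-8 by default, so %C2%A0 becomes one no-break space.
new_filename = unquote(filename)
print(repr(new_filename))
```

On Python 2 the same call on a byte string leaves the UTF-8 bytes undecoded, so a separate decode step would be needed there.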

Re: Python parsing iTunes XML/COM

2008-07-30 Thread william tanksley
Thank you for the response. Here's some more info, including a little that you didn't ask me for but which might be useful. John Machin [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: To ask another way: how do I convert from a file:// URL to a local path in a standard

Re: Python parsing iTunes XML/COM

2008-07-30 Thread Jerry Hill
On Wed, Jul 30, 2008 at 10:58 AM, william tanksley [EMAIL PROTECTED] wrote: Here's one example. The others are similar -- they have the same things that look like problems to me. Buffett Time - Annual Shareholders\xc2\xa0L.mp3 Note some problems here: 1. This isn't Unicode; it's missing

Re: Python parsing iTunes XML/COM

2008-07-30 Thread william tanksley
Jerry Hill [EMAIL PROTECTED] wrote: william tanksley [EMAIL PROTECTED] wrote: Here's one example. The others are similar -- they have the same things that look like problems to me. Buffett Time - Annual Shareholders\xc2\xa0L.mp3 I tried doing track_id.encode("utf-8"), but it doesn't seem

Re: Python parsing iTunes XML/COM

2008-07-30 Thread Stefan Behnel
william tanksley wrote: Okay, so you decode to go from raw bytes into a given encoding, and you encode to go from a given encoding to raw bytes. No, decoding goes from a byte sequence to a Unicode string and encoding goes from a Unicode string to a byte sequence. Unicode is not an encoding. A
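The direction of encode and decode described in this reply can be sketched as follows. This assumes Python 3, where the Unicode string type is str and the byte sequence type is bytes (in Python 2.5 they were unicode and str); the sample word is arbitrary.

```python
# A Unicode string: a sequence of code points, not tied to any encoding.
text = "caf\u00e9"

# Encoding: Unicode string -> byte sequence, using a named encoding.
data = text.encode("utf-8")
print(type(data), data)

# Decoding: byte sequence -> Unicode string, using the same encoding.
round_trip = data.decode("utf-8")
print(type(round_trip), round_trip == text)
```

UTF-8 is the encoding here; Unicode itself is only the character set being encoded.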

Re: Python parsing iTunes XML/COM

2008-07-30 Thread Jerry Hill
On Wed, Jul 30, 2008 at 2:27 PM, william tanksley [EMAIL PROTECTED] wrote: Awesome... Thank you! I had my mental model of Python turned around backwards. That's an odd feeling. Okay, so you decode to go from raw bytes into a given encoding, and you encode to go from a given encoding to raw

Re: Python parsing iTunes XML/COM

2008-07-30 Thread william tanksley
Jerry Hill [EMAIL PROTECTED] wrote: On Wed, Jul 30, 2008 at 2:27 PM, william tanksley [EMAIL PROTECTED] wrote: Awesome... Thank you! I had my mental model of Python turned around backwards. That's an odd feeling. Okay, so you decode to go from raw bytes into a given encoding, and you encode

Re: Python parsing iTunes XML/COM

2008-07-30 Thread william tanksley
william tanksley [EMAIL PROTECTED] wrote: I'm still puzzled why I'm getting some non-Unicode out of an ElementTree's text, though. Now I know. Okay, my answer is that cElementTree (in Python 2.5) is simply deranged when it comes to Unicode. It assumes everything's ASCII. Reference:

Re: Python parsing iTunes XML/COM

2008-07-29 Thread william tanksley
To ask another way: how do I convert from a file:// URL to a local path in a standard way, so that filepaths from two different sources will work the same way in a dictionary? Right now I'm using the following source: track_id = url2pathname(urlparse(track_id).path) url2pathname is from urllib;
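A minimal sketch of the conversion quoted in this message, assuming Python 3, where urlparse is in urllib.parse and url2pathname is in urllib.request. The helper name url_to_key, the sample URL, and the extra normpath/normcase step are illustrative assumptions, not part of the original code.

```python
import os.path
from urllib.parse import urlparse
from urllib.request import url2pathname


def url_to_key(file_url):
    """Turn a file:// URL into a normalized local path usable as a dict key."""
    # Same two calls as in the message: take the URL's path component,
    # then convert it to the local platform's path syntax (this also
    # undoes %XX escapes).
    path = url2pathname(urlparse(file_url).path)
    # Assumption: also normalize separators and, on Windows, letter case,
    # so that paths from different sources compare equal.
    return os.path.normcase(os.path.normpath(path))


# Hypothetical URL, for illustration only.
key = url_to_key("file:///music/My%20Podcasts/episode.mp3")
print(repr(key))
```

The exact result depends on the platform (separators, and case on Windows), so it is worth printing the keys from both sources before comparing them.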

Re: Python parsing iTunes XML/COM

2008-07-29 Thread John Machin
On Jul 30, 3:53 am, william tanksley [EMAIL PROTECTED] wrote: To ask another way: how do I convert from a file:// URL to a local path in a standard way, so that filepaths from two different sources will work the same way in a dictionary? Right now I'm using the following source: track_id =

Python parsing iTunes XML/COM

2008-07-28 Thread william tanksley
I'm trying to convert the URLs contained in iTunes' XML file into a form comparable with the filenames returned by iTunes' COM interface. I'm writing a podcast sorter in Python; I'm using iTunes under Windows right now. iTunes' COM provides most of my data input and all of my mp3/aac editing