On 04/29/2013 05:47 AM, c...@isbd.net wrote:
A couple of generic comments: your email program made a mess of the
traceback by appending each source line to the location information.
Please mention your Python version & OS. Apparently you're running 2.7
on Linux or similar.
I am debugging some code that creates a static HTML gallery from a
directory hierarchy full of images. It's this package:-
https://pypi.python.org/pypi/Gallery2.py/2.0
It's basically working and does pretty much what I want so I'm happy to
put some effort into it and fix things.
The problem I'm currently chasing is that it can't cope with directory
names that have accented characters in them; it fails when it tries to
write the HTML that creates the page with the thumbnails on it.
The code that's failing is:-
raw = os.path.join(directory, self.getNameNoExtension()) + ".html"
file = open(raw, "w")
file.write("".join(html).encode('utf-8'))
You can't encode byte data; it's already encoded. So you're forcing
Python to decode it implicitly (using the ASCII codec) before letting
you encode it to UTF-8. If you think it's already in UTF-8, then omit
the encode() call there.
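To make that hidden step visible, here is a minimal sketch written for Python 3, where `bytes` plays the role of a 2.7 byte string. The helper name `py2_style_encode` and the sample HTML are hypothetical, chosen only to reproduce the same error explicitly:

```python
def py2_style_encode(data, encoding="utf-8"):
    """Hypothetical helper mimicking Python 2.7's str.encode() on byte data.

    Python 2.7 first decodes the bytes with the default codec (ASCII),
    then encodes the resulting unicode object with the requested codec.
    """
    return data.decode("ascii").encode(encoding)


# Already-UTF-8 bytes; "\u00e9" is e-acute, which UTF-8 stores as 0xc3 0xa9.
already_utf8 = "<p>Caf\u00e9</p>".encode("utf-8")

try:
    py2_style_encode(already_utf8)
    failure = None
except UnicodeDecodeError as err:
    # Same kind of failure as the traceback: ASCII rejects any byte >= 0x80.
    failure = err
    print(err)

# The data is already UTF-8 bytes, so it can be written unchanged.
to_write = already_utf8
```

Pure-ASCII input passes through the implicit decode without complaint, which is why the code only breaks once an accented directory name shows up.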
Additionally, you can debug this with some simple print statements, at
least if you decompose your three-function line so you can get at the
intermediate data. Split the line into three parts:
temp1 = "".join(html) #temp1 is byte data
temp2 = temp1.decode() #temp2 is unicode data
temp3 = temp2.encode("utf-8") #temp3 is byte data again
file.write(temp3)
Now, you'll presumably get the error on the second line, so examine the
bytes around byte 783. Make sure it's really in utf-8, and if it is,
then skip the decode and the encode. If it's not, then Andrew's advice
is pertinent.
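One way to do that examination is sketched below in Python 3. The function `inspect_decode_failure` is hypothetical; it relies only on the standard `start`/`end` attributes of `UnicodeDecodeError` to show the bytes around the failing position and to check whether the whole buffer is valid UTF-8:

```python
def inspect_decode_failure(data, encoding="ascii", context=10):
    """Hypothetical debugging helper: try to decode `data`.

    Returns None if decoding succeeds; otherwise a dict with the failing
    position, the offending byte, the surrounding bytes, and whether the
    whole buffer would decode cleanly as UTF-8.
    """
    try:
        data.decode(encoding)
        return None
    except UnicodeDecodeError as err:
        lo = max(0, err.start - context)
        hi = min(len(data), err.end + context)
        try:
            data.decode("utf-8")
            valid_utf8 = True
        except UnicodeDecodeError:
            valid_utf8 = False
        return {
            "position": err.start,
            "bad_byte": hex(data[err.start]),  # in Python 3, indexing bytes gives an int
            "context": data[lo:hi],
            "valid_utf8": valid_utf8,
        }


page = "<h1>Caf\u00e9 photos</h1>".encode("utf-8")
report = inspect_decode_failure(page)
print(report)
```

If `valid_utf8` comes back True, the advice above applies: skip both the decode and the encode. If it is False, the data is in some other encoding (Latin-1 input, for example, fails the UTF-8 check).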
I would also look at the variable html. It's a list, but what are the
types of the elements in it?
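This matters because in Python 2.7, if any element is `unicode` and another is a byte string containing non-ASCII bytes, `"".join` itself performs an implicit ASCII decode and raises the same kind of error. A small Python 3 sketch for tallying the element types and normalising them; `element_types`, `to_text`, and the sample list are hypothetical:

```python
from collections import Counter


def element_types(lines):
    """Tally the type names of the elements in a list such as `html`."""
    return Counter(type(line).__name__ for line in lines)


def to_text(lines, encoding="utf-8"):
    """Hypothetical normaliser: decode any byte elements so every element is text."""
    return [line.decode(encoding) if isinstance(line, bytes) else line for line in lines]


# A mixed list; in Python 3, "".join(html) would raise TypeError here.
html = ["<html>", "Caf\u00e9".encode("utf-8"), "</html>"]
print(element_types(html))

# Join as text, then encode exactly once when producing output bytes.
page_bytes = "".join(to_text(html)).encode("utf-8")
```

Keeping everything as text internally and encoding once at the point of writing avoids having to reason about which elements are already encoded.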
file.close()
The variable html is a list containing the lines of HTML to write to the
file. It fails when it contains accented characters (an é in this
case). Here's the traceback:-
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gallery/galleries.py", line 41, in run
    self._recurse()
  File "/usr/local/lib/python2.7/dist-packages/gallery/galleries.py", line 272, in _recurse
    os.path.walk(self.props["sourcedir"], self.processDir, None)
  File "/usr/lib/python2.7/posixpath.py", line 246, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.7/posixpath.py", line 246, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.7/posixpath.py", line 246, in walk
    walk(name, func, arg)
  File "/usr/lib/python2.7/posixpath.py", line 238, in walk
    func(arg, top, names)
  File "/usr/local/lib/python2.7/dist-packages/gallery/galleries.py", line 263, in processDir
    self.createGallery()
  File "/usr/local/lib/python2.7/dist-packages/gallery/galleries.py", line 215, in createGallery
    self.picturemanager.createPictureHTMLs(self.footer)
  File "/usr/local/lib/python2.7/dist-packages/gallery/picturemanager.py", line 84, in createPictureHTMLs
    curPic.createPictureHTML(self.galleryDirectory, self.getStylesheet(), self.fullsize, footer)
  File "/usr/local/lib/python2.7/dist-packages/gallery/picture.py", line 361, in createPictureHTML
    file.write("".join(html).encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 783: ordinal not in range(128)
If I understand correctly, the encode() is saying that it can't
understand the data in the html because there's a character 0xc3 in it.
I *think* this means that the é is already encoded in UTF-8 in the
incoming data stream (it should be, as my system is wholly UTF-8 as far
as I know, and I created the directory name).
So how do I change the code so I don't get the error? Do I just
decode() the data first and then encode() it?
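Decoding and then re-encoding would work only if the codec is named explicitly. In Python 2.7 a bare `.decode()` uses the default ASCII codec and fails again. For data that is already UTF-8 the pair is a no-op, as this Python 3 sketch shows. The file name `example.html` and the temporary directory are hypothetical stand-ins for the gallery's output path:

```python
import io
import os
import tempfile

# Stands in for "".join(html) when the pieces are already UTF-8 bytes.
raw_html = "<p>R\u00e9sum\u00e9</p>".encode("utf-8")

# Decoding as UTF-8 and re-encoding as UTF-8 gives back identical bytes.
round_tripped = raw_html.decode("utf-8").encode("utf-8")

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "example.html")  # hypothetical output file

# Option A: write the existing UTF-8 bytes unchanged, in binary mode.
with open(path, "wb") as f:
    f.write(raw_html)

# Option B: keep text as text and let io.open encode it once on the way out.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(raw_html.decode("utf-8"))

with open(path, "rb") as f:
    on_disk = f.read()

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(out_dir)
```

Both options leave the same bytes on disk. Option B is the more robust choice if the `html` list may contain a mix of text and bytes.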
--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list