Re: [Zope3-Users] TAL expression and ascii
Top-replying to my own post...

I believe I figured out the problem. In my latest tests I was using
unicode(variable, encoding="utf-8") for all of the variables _except_
the return of BeautifulSoup... In my haste I overlooked it... which is
silly because I know BeautifulSoup returns UTF-8. After fixing the
conversion of that data, the accented characters are appearing correctly
in the browser and all is well.

Thanks again for the help! Character encoding was something I had only
rarely dealt with, so the pointers provided offered a great learning
opportunity.

-Justin

On Fri, 2007-12-14 at 14:12 +0100, Justin Fletcher wrote:
> On Fri, 2007-12-14 at 12:23 +0200, Marius Gedminas wrote:
> > On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:
> >
> > There's no magic. "Zope 3 strings are Unicode" is a convention, and
> > Zope makes it easy to follow by decoding all HTTP request strings into
> > Unicode objects. If you're migrating existing non-Unicode data into
> > ZODB with a simple Python script, you'll have to take care of converting
> > your binary strings to Unicode yourself.
> >
> > > Am I misunderstanding something or doing something wrong?
> >
> > I think so. If you could show us how you're migrating "some data" into
> > the ZODB, we could give you more advice.
> >
> > Regards,
> > Marius Gedminas
>
> Thanks for your help so far. It has already given me some things to
> try. Unfortunately I am still unable to get it to completely work,
> though I have made progress.
>

___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users
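A check along the following lines might catch an overlooked variable like the BeautifulSoup result before it is stored. This is only a sketch: find_undecoded is a hypothetical helper (not part of Zope or BeautifulSoup), the sample field values are invented, and the code is written so that it runs on Python 2.6+ as well as Python 3.

```python
def find_undecoded(fields):
    """Return the names of values that are still byte strings.

    `fields` maps attribute names to the values about to be stored.
    On Python 2, `bytes` is an alias for `str`, so plain str values
    are reported there as well.
    """
    return sorted(name for name, value in fields.items()
                  if isinstance(value, bytes))


# Invented sample data: the title was decoded, the content was not.
page_fields = {
    "title": u"Caf\u00e9",
    "content": b"<p>caf\xc3\xa9</p>",
}
leftover = find_undecoded(page_fields)  # ['content']
```

Calling something like this just before assigning the attributes would point at the one value that still needs decoding, instead of the error surfacing later when the page is rendered.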
Re: [Zope3-Users] TAL expression and ascii
On Fri, 2007-12-14 at 12:23 +0200, Marius Gedminas wrote:
> On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:
>
> There's no magic. "Zope 3 strings are Unicode" is a convention, and
> Zope makes it easy to follow by decoding all HTTP request strings into
> Unicode objects. If you're migrating existing non-Unicode data into
> ZODB with a simple Python script, you'll have to take care of converting
> your binary strings to Unicode yourself.
>
> > Am I misunderstanding something or doing something wrong?
>
> I think so. If you could show us how you're migrating "some data" into
> the ZODB, we could give you more advice.
>
> Regards,
> Marius Gedminas

Thanks for your help so far. It has already given me some things to
try. Unfortunately I am still unable to get it to completely work,
though I have made progress.

I started to write a long description of how I am retrieving, parsing,
and storing the data, but instead I'll just post code. I just added the
unicode() conversions around the variables, so the code as it is below
will throw an exception because the strings contain characters outside
of the ascii range. Before converting to Unicode, running
'type(variable)' on all the variables says that they are <type 'str'>.

I have tried passing different 'encoding=' settings to unicode(). That
allows the code to run and the ZPT pages to display, but the accented
characters are not displayed correctly in the browser. The encoding=
settings I have tried so far are 'utf-8' and 'latin-1'.

One last piece of the puzzle is that if I use the 'mdb-export' command
and dump the results to a file, the unix 'file' command says that the
file is encoded UTF-8.
Lastly, I run the code below like this:

$ zopectl debug
>>> from mysite.migrateFromMDB import migrateFromMDB
>>> migrateFromMDB(root['mysite'])

Thanks again for any help or pointers you might be able to provide,

-Justin

mysite/migrateFromMDB.py:
===
from BeautifulSoup import BeautifulSoup
from os import popen4, walk
import csv
from mysite.app import Customer, Page

def migrateFromMDB(mysite):
    mdb_export = '/usr/bin/mdb-export -H'
    tables = ['setings', 'pages']
    mysite_data = '/home/justin/web/mysite/'
    for root, dirs, files in walk(mysite_data):
        data_file_name = None
        if 'DATA.mdb' in files:
            # We have a database
            data_file_name = 'DATA.mdb'
            cur_file = "%s/%s" % (root, data_file_name)
            infile, outfile = popen4("%s %s %s" % (mdb_export, cur_file,
                                                   'pages'))
            infile.close()
            print "Reading: %s/%s" % (root, data_file_name)
            reader = csv.reader(outfile)
            for row in reader:
                if row[0] == "Can't alloc filename":
                    continue
                site = root.split('/')[-1]
                if site not in mysite.keys():
                    mysite[site] = Customer()
                page_id = unicode(row[0])
                name = unicode(row[1])
                content = unicode(row[2])
                order_by = unicode(row[3])
                hidden = unicode(row[18])
                title = unicode(row[19])
                keywords = unicode(row[20])
                description = unicode(row[21])
                mysite[site][page_id] = Page()
                cursite = mysite[site][page_id]
                cursite.keywords = keywords
                cursite.description = description
                cursite.order = order_by
                cursite.title = title
                cursite.name = name
                soup = BeautifulSoup(content)
                cursite.content = unicode(soup.prettify())
            outfile.close()
        elif 'data.mdb' in files:
            data_file_name = 'data.mdb'
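One way to make the conversion explicit is to decode every CSV field in one place, right after the row is read. This is a minimal sketch, assuming the mdb-export output really is UTF-8 as the 'file' command reported. decode_row is a hypothetical helper (not part of the csv module or Zope), the sample row is invented, and the code is written to run on Python 2.6+ as well as Python 3.

```python
def decode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every byte string decoded to text.

    Fields that are already text are passed through unchanged, so the
    helper is safe to call more than once on the same row.
    """
    return [field.decode(encoding) if isinstance(field, bytes) else field
            for field in row]


# Invented sample row, shaped like what csv.reader yields on Python 2.
sample_row = [b"1", b"caf\xc3\xa9", b"<p>caf\xc3\xa9</p>"]
decoded = decode_row(sample_row)
```

With the fields decoded up front, the individual unicode(row[n]) calls would no longer need an encoding argument, since unicode() leaves values that are already Unicode unchanged.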
Re: [Zope3-Users] TAL expression and ascii
On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:
> I am trying to migrate some data into the ZODB. I am storing this data
> in a string, and later presenting the data in a ZPT similar to this:
>
>
>
> This works fine for text that does not have any characters outside of
> the ascii range, but for text that does I receive this error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 3283: ordinal not in range(128)

You are mixing non-ASCII str objects with Unicode strings somewhere.

> The characters are typically things like what the HTML code é
> would generate, but it is a fairly significant amount of text so I am
> unsure specifically what characters are causing the problem.

Any that are non-ASCII.

> My understanding is that Zope3 strings are Unicode so I don't understand
> why the ascii range is a restriction. Shouldn't these characters be
> stored and presented transparently?

There's no magic. "Zope 3 strings are Unicode" is a convention, and
Zope makes it easy to follow by decoding all HTTP request strings into
Unicode objects. If you're migrating existing non-Unicode data into
ZODB with a simple Python script, you'll have to take care of converting
your binary strings to Unicode yourself.

> Am I misunderstanding something or doing something wrong?

I think so. If you could show us how you're migrating "some data" into
the ZODB, we could give you more advice.

Regards,
Marius Gedminas
--
lg_PC.gigacharset (lg = little green men language, PC = proxima centauri)
  -- Markus Kuhn provides an example of a locale
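To make the point concrete, the sketch below decodes the same bytes three ways. The sample bytes are invented for illustration: 0xc3 0xa9 is the UTF-8 encoding of é, which is consistent with the 0xc3 byte named in the error message. The code uses only bytes.decode and is written to run on Python 2.6+ as well as Python 3.

```python
raw = b"caf\xc3\xa9"  # UTF-8 bytes for "café" (invented sample data)

# Decoding as ASCII fails on the first byte above 0x7f. This is the
# same failure Python 2 hits when it implicitly converts a str to
# Unicode using the default 'ascii' codec.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the correct codec gives four characters, ending in é.
as_utf8 = raw.decode("utf-8")

# Decoding with the wrong codec does not fail, but each byte becomes
# its own character, so é turns into the two characters Ã and ©.
as_latin1 = raw.decode("latin-1")
```

The Latin-1 case is worth noting because it runs without an exception and only shows up later as wrong characters in the browser.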