Re: [Zope3-Users] TAL expression and ascii

2007-12-14 Thread Justin Fletcher
Top replying to my own post...  I believe I figured out the problem.

In my latest tests I was using unicode(variable, encoding="utf-8") for
all of the variables _except_ the return of BeautifulSoup...  In my haste
I overlooked it... which is silly because I know BeautifulSoup returns
UTF-8.  After fixing the conversion of that data the accented characters
are appearing correctly in the browser and all is well.
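
For anyone hitting the same thing, here is a rough sketch of how an
overlooked field like this could be caught before storing.  It is
written for Python 3, where bytes plays the role of the Python 2 str
and str the role of unicode.  The find_undecoded() helper and the
sample page dict are made up for illustration; they are not part of
the migration script.

```python
def find_undecoded(fields):
    """Return the names of fields that still hold raw bytes."""
    return sorted(name for name, value in fields.items()
                  if isinstance(value, bytes))


# Hypothetical page data: two fields were decoded, one was overlooked.
page = {
    "title": "Caf\u00e9",
    "keywords": "menu",
    "content": "<p>Caf\u00e9</p>".encode("utf-8"),  # still UTF-8 bytes
}

missed = find_undecoded(page)
print(missed)

# Decode the overlooked field with the encoding it is known to use.
page["content"] = page["content"].decode("utf-8")
print(find_undecoded(page))
```

Running a check like this just before assigning attributes would have
pointed at the one value that was still a byte string.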

Thanks again for the help!  Character encoding was something I had only
rarely dealt with so the pointers provided offered a great learning
opportunity.

-Justin


On Fri, 2007-12-14 at 14:12 +0100, Justin Fletcher wrote:
> On Fri, 2007-12-14 at 12:23 +0200, Marius Gedminas wrote:
> > On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:
> 
> 
> 
> > 
> > There's no magic.  "Zope 3 strings are Unicode" is a convention, and
> > Zope makes it easy to follow by decoding all HTTP request strings into
> > Unicode objects.  If you're migrating existing non-Unicode data into
> > ZODB with a simple Python script, you'll have to take care of converting
> > your binary strings to Unicode yourself.
> > 
> > > Am I misunderstanding something or doing something wrong?
> > 
> > I think so.  If you could show us how you're migrating "some data" into
> > the ZODB, we could give you more advice.
> > 
> > Regards,
> > Marius Gedminas
> 
> Thanks for your help so far.  It has already given me some things to
> try.  Unfortunately I am still unable to get it to completely work,
> though I have made progress.
> 


___
Zope3-users mailing list
Zope3-users@zope.org
http://mail.zope.org/mailman/listinfo/zope3-users


Re: [Zope3-Users] TAL expression and ascii

2007-12-14 Thread Justin Fletcher

On Fri, 2007-12-14 at 12:23 +0200, Marius Gedminas wrote:
> On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:



> 
> There's no magic.  "Zope 3 strings are Unicode" is a convention, and
> Zope makes it easy to follow by decoding all HTTP request strings into
> Unicode objects.  If you're migrating existing non-Unicode data into
> ZODB with a simple Python script, you'll have to take care of converting
> your binary strings to Unicode yourself.
> 
> > Am I misunderstanding something or doing something wrong?
> 
> I think so.  If you could show us how you're migrating "some data" into
> the ZODB, we could give you more advice.
> 
> Regards,
> Marius Gedminas

Thanks for your help so far.  It has already given me some things to
try.  Unfortunately I am still unable to get it to completely work,
though I have made progress.

I started to write a long description of how I am retrieving, parsing,
and storing the data, but instead I'll just post code.

I just added the unicode() conversions around the variables, so the code
as it is below will throw an exception because the strings contain
characters outside of the ASCII range.  Before converting to Unicode,
running 'type(variable)' on all the variables says that they are
<type 'str'>.  I have tried passing different 'encoding=' settings to
unicode(); that allows the code to run and the ZPT pages to display,
but the accented characters are not displayed correctly in the
browser.  The encoding= settings I have tried so far are 'utf-8' and
'latin-1'.

One last piece of the puzzle is that if I use the 'mdb-export' command
and dump the results to a file, the unix 'file' command says that the
file is encoded UTF-8.
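
As a side note, a small sketch of why the choice of encoding matters
when the bytes really are UTF-8.  It is Python 3 (bytes here is what
Python 2 calls str), the sample text is invented, and is_valid_utf8()
is a hypothetical helper, not something from the script below.

```python
raw = "\u00e9t\u00e9".encode("utf-8")   # b'\xc3\xa9t\xc3\xa9'

as_utf8 = raw.decode("utf-8")      # three characters, as intended
as_latin1 = raw.decode("latin-1")  # five characters: each byte becomes one


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(len(as_utf8), len(as_latin1))
print(is_valid_utf8(raw))
print(is_valid_utf8("\u00e9t\u00e9".encode("latin-1")))
```

Decoding as latin-1 never fails, because every byte maps to some
character, so it can hide a wrong guess.  A strict UTF-8 decode either
succeeds or raises, which makes it a more useful check.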

Lastly, I run the code below like this:
$ zopectl debug
>>> from mysite.migrateFromMDB import migrateFromMDB
>>> migrateFromMDB(root['mysite'])


Thanks again for any help or pointers you might be able to provide,
-Justin

mysite/migrateFromMDB.py:
===
from BeautifulSoup import BeautifulSoup
from os import popen4, walk
import csv
from mysite.app import Customer, Page


def migrateFromMDB(mysite):
    mdb_export = '/usr/bin/mdb-export -H'
    tables = ['setings', 'pages']
    mysite_data = '/home/justin/web/mysite/'

    for root, dirs, files in walk(mysite_data):
        data_file_name = None
        if 'DATA.mdb' in files:
            # We have a database
            data_file_name = 'DATA.mdb'

            cur_file = "%s/%s" % (root, data_file_name)
            infile, outfile = popen4("%s %s %s" % (mdb_export, cur_file,
                                                   'pages'))
            infile.close()
            print "Reading: %s/%s" % (root, data_file_name)
            reader = csv.reader(outfile)
            for row in reader:
                if row[0] == "Can't alloc filename": continue
                site = root.split('/')[-1]
                if site not in mysite.keys():
                    mysite[site] = Customer()
                page_id = unicode(row[0])
                name = unicode(row[1])
                content = unicode(row[2])
                order_by = unicode(row[3])
                hidden = unicode(row[18])
                title = unicode(row[19])
                keywords = unicode(row[20])
                description = unicode(row[21])

                mysite[site][page_id] = Page()
                cursite = mysite[site][page_id]
                cursite.keywords = keywords
                cursite.description = description
                cursite.order = order_by
                cursite.title = title
                cursite.name = name

                soup = BeautifulSoup(content)
                cursite.content = unicode(soup.prettify())

            outfile.close()

        elif 'data.mdb' in files:
            data_file_name = 'data.mdb'




Re: [Zope3-Users] TAL expression and ascii

2007-12-14 Thread Marius Gedminas
On Fri, Dec 14, 2007 at 01:20:52AM +0100, Justin Fletcher wrote:
> I am trying to migrate some data into the ZODB.  I am storing this data
> in a string, and later presenting the data in a ZPT similar to this:
> 
> 
> 
> This works fine for text that does not have any characters outside of
> the ascii range, but for text that does I receive this error:
> 
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 3283: ordinal not in range(128)

You are mixing non-ASCII str objects with Unicode strings somewhere.
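
A minimal sketch of what that mixing amounts to, written for Python 3
so the hidden step is explicit.  In Python 2, combining a str with a
unicode object decodes the str as ASCII behind the scenes; the
implicit_ascii_decode() function below is a made-up name that just
spells that step out, and the sample text is invented.

```python
# bytes here plays the role of a Python 2 str.
raw = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'


def implicit_ascii_decode(data):
    """Mimic the ASCII decode Python 2 applies when a byte string
    is combined with a Unicode string."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(error_message)
```

The byte 0xc3 is the first byte of the UTF-8 encoding of many accented
Latin letters, which fits the traceback quoted above.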

> The characters are typically things like what the HTML code &eacute;
> would generate, but it is a fairly significant amount of text so I am
> unsure specifically what characters are causing the problem.

Any that are non-ASCII.

> My understanding is that Zope3 strings are Unicode so I don't understand
> why the ascii range is a restriction.  Shouldn't these characters be
> stored and presented transparently?

There's no magic.  "Zope 3 strings are Unicode" is a convention, and
Zope makes it easy to follow by decoding all HTTP request strings into
Unicode objects.  If you're migrating existing non-Unicode data into
ZODB with a simple Python script, you'll have to take care of converting
your binary strings to Unicode yourself.
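
Roughly, "converting yourself" means decoding every value once, at the
point where it enters your script, with the encoding the source really
uses.  A sketch in Python 3 terms (bytes/str instead of Python 2
str/unicode); to_text() and the sample row are hypothetical, and UTF-8
is only an assumed default:

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical row as it might come out of a CSV export, as raw bytes.
row = [b"1", b"Accueil", "Pr\u00e9sentation".encode("utf-8")]

decoded = [to_text(cell) for cell in row]
print(decoded)
```

Once everything stored in the ZODB is text, nothing downstream has to
guess at an encoding.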

> Am I misunderstanding something or doing something wrong?

I think so.  If you could show us how you're migrating "some data" into
the ZODB, we could give you more advice.

Regards,
Marius Gedminas
-- 
lg_PC.gigacharset (lg = little green men language, PC = proxima centauri)
-- Markus Kuhn provides an example of a locale

