Re: [Sqlalchemy-users] question about convert_unicode

Qvx Wed, 19 Apr 2006 03:19:02 -0700

I didn't look at your patch. I just gave a few general observations.

I'm not sure that I would set 'ascii' as the default value. I would set it to None (meaning "avoid using it" or "inherit the value from the encoding param").

I guess that a flag called "client_encoding" could make things work more explicitly in SA if you *must* use plain strings instead of unicode. But after looking at types.py I'm not sure that the String class is correct, and adding client_encoding into the mix makes it even more obscure. Although, it has the potential to actually make it better.

My observations of types.py by looking at code:

Unicode:
- good
- unicode on client side (bind params and column values),
- explicit conversion to encoded string when talking to engine

String:
- strange beast
- it can be unicode as well as string on client side (bind params and column values) depending on convert_unicode param
- it uses both unicode and strings when talking to engine depending on convert_unicode param
- or, in other words: pass unchanged data (be it unicode or string) if there is no convert_unicode param
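
The pass-through behaviour listed above can be tried outside SQLAlchemy. What follows is only a rough sketch in Python 3 terms, where `str` stands in for the old `unicode` type and `bytes` for the old plain string. The function names and keyword arguments are hypothetical stand-ins that mirror the logic of the current String methods in types.py; they are not SQLAlchemy's API.

```python
def current_bind_param(value, encoding="utf-8", convert_unicode=False):
    """Hypothetical stand-in for the current String.convert_bind_param logic."""
    if not convert_unicode or value is None or not isinstance(value, str):
        # Passed through unchanged, whatever type it is.
        return value
    return value.encode(encoding)


def current_result_value(value, encoding="utf-8", convert_unicode=False):
    """Hypothetical stand-in for the current String.convert_result_value logic."""
    if not convert_unicode or value is None or isinstance(value, str):
        return value
    return value.decode(encoding)


text = "Какой-то текст"
cp1251_bytes = text.encode("cp1251")

# Without convert_unicode, both types reach the engine untouched.
assert current_bind_param(text) == text
assert current_bind_param(cp1251_bytes) == cp1251_bytes

# With convert_unicode, unicode text is encoded for the database...
assert current_bind_param(text, convert_unicode=True) == text.encode("utf-8")

# ...but a plain byte string still goes through as-is, in whatever
# encoding the client happened to use.
assert current_bind_param(cp1251_bytes, convert_unicode=True) == cp1251_bytes
```

The last assertion is the case that makes String a "strange beast": the cp1251 bytes land in a utf-8 database without any conversion.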

Your additions could make it into a better thing if done differently:

String:
- string on client side (bind params and column values), no unicode in sight
- talk to database in expected encoding
- use encoding / client_encoding pair to do conversions between client / db side
- remove convert_unicode param (If you want to use unicode there is Unicode class)

I'm not sure what else would break, or what other use case I'm breaking with this proposal, but the current String (with or without your additions) leaves a bad taste in my mouth.

I would do it like this (not tested):

Index: lib/sqlalchemy/types.py
===================================================================
--- lib/sqlalchemy/types.py    (revision 1294)
+++ lib/sqlalchemy/types.py    (working copy)
@@ -96,15 +96,24 @@
     def get_constructor_args(self):
         return {'length':self.length}
     def convert_bind_param(self, value, engine):
-        if not engine.convert_unicode or value is None or not isinstance(value, unicode):
+        if value is None:
+            return None
+        elif isinstance(value, unicode):
+            return value.encode(engine.encoding)
+            # or even raise exception (but I wouldn't go that far)
+        elif engine.client_encoding != engine.encoding:
+            return unicode(value, engine.client_encoding).encode(engine.encoding)
+        else:
             return value
+    def convert_result_value(self, value, engine):
+        if value is None:
+            return None
+        elif isinstance(value, unicode):
+            return value.encode(engine.client_encoding)
+        elif engine.client_encoding != engine.encoding:
+            return unicode(value, engine.encoding).encode(engine.client_encoding)
         else:
-            return value.encode(engine.encoding)
-    def convert_result_value(self, value, engine):
-        if not engine.convert_unicode or value is None or isinstance(value, unicode):
             return value
-        else:
-            return value.decode(engine.encoding)
     def adapt_args(self):
         if self.length is None:
             return TEXT()
Index: lib/sqlalchemy/engine.py
===================================================================
--- lib/sqlalchemy/engine.py    (revision 1294)
+++ lib/sqlalchemy/engine.py    (working copy)
@@ -227,7 +227,7 @@
     SQLEngines are constructed via the create_engine() function inside this package.
     """

-    def __init__(self, pool=None, echo=False, logger=None, default_ordering=False, echo_pool=False, echo_uow=False, convert_unicode=False, encoding='utf-8', **params):
+    def __init__(self, pool=None, echo=False, logger=None, default_ordering=False, echo_pool=False, echo_uow=False, encoding='utf-8', client_encoding=None, **params):
         """constructs a new SQLEngine.   SQLEngines should be constructed via the create_engine()
         function which will construct the appropriate subclass of SQLEngine."""
         # get a handle on the connection pool via the connect arguments
@@ -246,8 +246,8 @@
         self.default_ordering=default_ordering
         self.echo = echo
         self.echo_uow = echo_uow
-        self.convert_unicode = convert_unicode
         self.encoding = encoding
+        self.client_encoding = client_encoding or encoding
         self.context = util.ThreadLocal()
         self._ischema = None
         self._figure_paramstyle()
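
The String conversions in the patch above can be exercised on their own. This is a minimal sketch, assuming Python 3 (where `bytes` plays the role of the old plain string and `str` the role of `unicode`). The free functions `bind_param` and `result_value` are hypothetical stand-ins for the patched methods, with the engine attributes passed in as plain arguments; like the patch itself, this is an illustration rather than tested SQLAlchemy code.

```python
def bind_param(value, encoding, client_encoding=None):
    """Client side -> database side (hypothetical stand-in for convert_bind_param)."""
    client_encoding = client_encoding or encoding  # same fallback as in the patch
    if value is None:
        return None
    if isinstance(value, str):
        # Unicode text slipped in anyway: encode it for the database.
        return value.encode(encoding)
    if client_encoding != encoding:
        # Plain string in the client encoding: transcode to the database encoding.
        return value.decode(client_encoding).encode(encoding)
    return value


def result_value(value, encoding, client_encoding=None):
    """Database side -> client side (hypothetical stand-in for convert_result_value)."""
    client_encoding = client_encoding or encoding
    if value is None:
        return None
    if isinstance(value, str):
        # The driver returned unicode text: hand back a plain client-side string.
        return value.encode(client_encoding)
    if client_encoding != encoding:
        return value.decode(encoding).encode(client_encoding)
    return value


# Round trip: cp1251 on the client, utf-8 in the database.
client_value = "Какой-то текст".encode("cp1251")
stored = bind_param(client_value, encoding="utf-8", client_encoding="cp1251")
assert stored == "Какой-то текст".encode("utf-8")
assert result_value(stored, encoding="utf-8", client_encoding="cp1251") == client_value
```

Note that the encodings are compared as plain names, so 'utf8' and 'utf-8' would count as different and trigger a harmless but redundant transcode.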

Kind regards,
Tvrtko

On 4/19/06, Vasily Sulatskov <[EMAIL PROTECTED]> wrote:

Hello Qvx,

Well, perhaps you are right. But let's then define what the "right way" is.

The second version of the patch that I submitted included a default value of
"ascii" for the new engine parameter "client_encoding". It works in the
following way: if the user specifies convert_unicode=True and doesn't specify
client_encoding, it will be ascii, and the new types.String will try to convert
regular strings to unicode using the specified client_encoding. If it is unable
to convert to unicode, it will raise an exception during construction of the
unicode object.

That guarantees that any string going to the database will get converted to
the proper encoding. But I don't say that it's the best or even the "right way".
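
The effect of an "ascii" default can be seen with a small sketch. It assumes Python 3, where `bytes` stands in for the regular string, and `to_unicode` is a hypothetical helper written for this illustration, not code taken from the patch.

```python
def to_unicode(value, client_encoding="ascii"):
    """Decode a plain byte string using the client encoding (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(client_encoding)
    return value


# Pure ASCII input converts fine under the default.
assert to_unicode(b"Some text in ascii") == "Some text in ascii"

# A national-encoding string fails loudly under the default...
national = "Какой-то текст".encode("cp1251")
try:
    to_unicode(national)
except UnicodeDecodeError:
    failed = True
else:
    failed = False
assert failed

# ...and converts once the real client encoding is given.
assert to_unicode(national, client_encoding="cp1251") == "Какой-то текст"
```

So with the default, a non-ASCII string never reaches the database in an unknown encoding: either it decodes, or the error appears at the point of assignment.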

I also think that the more strictly you enforce unicode usage the better, but
unfortunately there are many places in Python where regular strings are used
(like the str() function, etc.), so for some time we have to live with regular
strings.

How do you think it should work in SQLAlchemy?

> I'm also the unfortunate one who has to use encodings other than ascii. I'm
> sure that your patch helps, but I'm not sure that this is the "right way".
>
> The thing that I learned from my dealing with unicode and string encodings
> is: always use unicode. What I mean is when you write your source:
> * make all your data (variables, literals) as unicode
> * put the -*- coding: -*- directive so that the interpreter knows how to
> convert your u"" strings
>
> Those two rules lead to the following:
>
> # -*- coding: cp1251 -*-
>
> import sqlalchemy
>
> # note that there is no convert_unicode flag, but there is an encoding flag
> db = sqlalchemy.create_engine('sqlite://', encoding='cp1251')
>
> # note a change in type of "name" column from String to Unicode
> companies = sqlalchemy.Table('companies', db,
>    sqlalchemy.Column('company_id', sqlalchemy.Integer, primary_key=True),
>    sqlalchemy.Column('name', sqlalchemy.Unicode(50)))
>
> # ....
>
> # OK, unicode
> Company(name=u'Какой-то текст в кодировке cp1251')
>
> # Avoid plain strings
> Company(name='Some text in ascii')
>
>
> This becomes a necessity if you have, for example, more than one database
> driver using different encodings. You get back unicode strings which you can
> combine and copy from one database to another without worrying.
>
> db1 = sqlalchemy.create_engine('mysql://', encoding='latin2')
> db2 = sqlalchemy.create_engine('oracle://', encoding='windows-1250')
>
> ob1 = db1_mapper.select(...)
> ob2 = db2_mapper.select(...)
>
> ob1.name = ob1.name + ob2.name # All unicode, no problems
>
> On 4/17/06, Vasily Sulatskov <[EMAIL PROTECTED]> wrote:
> > Hello Michael,
> >
> > I know there's a database engine parameter "encoding". It tells
> > sqlalchemy in which encoding Unicode objects should be saved to the
> > database.
> >
> > I suggest adding another encoding, let's say "client_encoding", which
> > will be used when convert_unicode is True and the user assigns a string
> > object to an object attribute. Currently, even if convert_unicode is set
> > to True, strings go to the database as-is, bypassing conversion to unicode.
> >
> > This option will allow assigning strings in national/platform-specific
> > encodings, like cp1251, straight to object attributes, and they
> > will be properly converted to the database encoding (engine.encoding).
> >
> >
> > See, the encoding on the client machine may be different from the encoding
> > in the database. You can see the changes that I suggest in the attached diff.
> >
> > The suggested changes can make life for users of
> > multilingual/multi-encoding environments a little easier while not
> > affecting all other users of SQLAlchemy.
> >
> > MB> On Apr 17, 2006, at 5:47 AM, Vasily Sulatskov wrote:
> > >> In my opinion that's a bug and that behaviour should be changed to
> > >> something like this:
> > >> 1. If the object is unicode, then convert it to the engine-specified
> > >> encoding (like utf8), as happens now.
> > >> 2. If it's a string, then convert it to unicode using another specified
> > >> encoding (it should be added to the engine parameters). This encoding
> > >> specifies the client-side encoding. It's often handy to have different
> > >> encodings in the database and on client machines (at least for people
> > >> with "alternate languages" :-)
> >
> > MB> there already is an encoding parameter for the engine.
> >
> > MB> http://www.sqlalchemy.org/docs/dbengine.myt#database_options
> >
> > MB> does that solve your problem ?
> >
> > --
> > Best regards,
> > Vasily                            mailto:[EMAIL PROTECTED]

encodings3.diff
Description: Binary data
