Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-06-12 Thread ysamlan

@mike orr

$ sudo apt-get install command-not-found
$ mysql_config
The program 'mysql_config' is currently not installed.  You can
install it by typing:
apt-get install libmysqlclient15-dev

$ sudo apt-get install libmysqlclient15-dev

or
$ sudo apt-get install apt-file
$ apt-file update
$ apt-file search mysql_config
libmysqlclient15-dev: usr/bin/mysql_config
...

The problem is that a default repository install of mysql doesn't
include the development files, and mysql_config is really just a
program to point to the right configuration options and development
headers for things that need to link to them (like the compilation of
mysqldb).

Would be nicer if it said couldn't find mysql_config; do you have the
mysql client development headers installed? but I'm not sure it's
worth submitting a bugreport/patch just for that.

...
 I have MySQLdb 1.2.1 because Kubuntu is still on the older version and
 I can't get the newer one to compile:

 $ easy_install MySQL-python
   File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py,
 line 24, in mysql_config
 EnvironmentError: mysql_config not found

 Thank you, now what is mysql_config?
 ...
 Mike Orr [EMAIL PROTECTED]


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-31 Thread Mike Orr

On 5/29/07, David Smith [EMAIL PROTECTED] wrote:
 There was one point you made that I believe is inaccurate and so did not
 include it in the above. Where you said that if you want to use the
 convert_unicode option, you must know what encoding the database is in, but
 my tests don't suggest that.

Oh, you're right.  SQLAlchemy checks the character set of the
connection.  I thought of that after I sent my email.

2007-05-31 14:11:35,356 INFO sqlalchemy.engine.base.Engine.0x..8c show
variables like 'character_set_results'

 1) The MySQL specific option of use_unicode is not needed anymore. The MySQL
 specific charset option also is deprecated.

I suppose.  I'm still holding on to use_unicode for now though.  The
important thing is to use either convert_unicode or use_unicode.

The MySQL charset option wouldn't work for me at all; it didn't
recognize utf8 as a valid choice.

-- 
Mike Orr [EMAIL PROTECTED]

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread David Smith

jj, I actually agree strongly with you regarding the potential dangers. My
counter argument is that if you specify the encoding of the database in the
config file to a different encoding than is in the mysql config file, don't
you risk double-encoding anyway? Now that the bug is fixed in MySQLdb, and
no other database needs such options, I'd prefer to promote the Unicode
field as the proper way forward. SQLAlchemy should, via the MySQLdb module,
determine at runtime the encoding of the database and translate from that
encoding to unicode objects as necessary (as should any other ORM).

Cheers,
David

Shannon -jj Behrens wrote:

 
 I'm not necessarily disagreeing with you, but I do think it's really
 bad practice to have your database confused about the encoding of the
 data that it's storing.  At one point, I had things messed up and the
 data in the database was double encoded.  Sure, it worked as long as I
 was using SQLAlchemy, but if I ever decided to use something else, I
 would have been in really bad shape.  Fortunately, they fixed that bug
 in MySQLdb.
 
 Happy Hacking!
 -jj




--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread Max Ischenko
Hi David,

On 5/29/07, David Smith [EMAIL PROTECTED] wrote:


 Max, I have no problems by only using Unicode field-types in my model;
 that
 is, my sqlalchemy connection string does not contain anything related to
 UTF-8. I haven't investigated my MySQL setup so thoroughly but it seems to
 work with MySQL 4 and 5 and stable and development versions of mysqldb. It
 is quite possible that the data being stored in the database is in
 iso-8859-1 or whatever MySQL uses by default but SQLAlchemy is
 automatically manipulating the binary stirngs into Unicode objects based
 on
 the encoding of the database.


My exprience shows that it may be necessarily to specify encoding  for mysql
in dburi.

The reason is that mysql server may be configured with default encoding that
doesn't match the database you're connecting to. E.g. if you open mysql
console and type \s you can see something like latin1 while database is
utf8-encoded. In a situation like this you do need to tell db encoding
explicitly. So I guess it makes sense to always be explicit here.

Ideally, I'd like to see a best practice of, basically, stop worrying
 specifically about MySQL database encoding and all of its nastiness and
 simply use Unicode fields instead of String fields for data that holds
 natural language characters.


You surely want to use Unicode instead of String. It will do the right thing
(converting to and from unicode objects). Also, if the setup is wrong,
you'll discover it sooner when it'll try to decode utf8 data assuming latin1
charset.

Max.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread David Smith

Hi Max, thanks for the quick response,

Max Ischenko wrote:


 My exprience shows that it may be necessarily to specify encoding  for
 mysql in dburi.
 
 The reason is that mysql server may be configured with default encoding
 that doesn't match the database you're connecting to. E.g. if you open
 mysql console and type \s you can see something like latin1 while database
 is utf8-encoded. In a situation like this you do need to tell db encoding
 explicitly. So I guess it makes sense to always be explicit here.

I guess I should have been more explicit. The point is that it shouldn't
matter if your database uses e.g. iso-8859-1 internally, as long as it can
hold the bytes. UTF-8 is just another encoding, and the point is that it
should be the job of SQLAlchemy (or insert your other platform's ORM) to
convert into the database's encoding and back out to unencoded Python
unicode objects. Thus, if you want the DB internal data to be stored in
UTF-8, you should configure that in your database server settings, not in
your application.

 
 Ideally, I'd like to see a best practice of, basically, stop worrying
 specifically about MySQL database encoding and all of its nastiness and
 simply use Unicode fields instead of String fields for data that holds
 natural language characters.
 
 
 You surely want to use Unicode instead of String. It will do the right
 thing (converting to and from unicode objects). Also, if the setup is
 wrong, you'll discover it sooner when it'll try to decode utf8 data
 assuming latin1 charset.

This again is the same point as above; SQLAlchemy should determine that the
DB is not utf-8 and is indeed latin1 or iso-8859-1 or whatever.

Cheers,
David



--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread Shannon -jj Behrens

On 5/29/07, David Smith [EMAIL PROTECTED] wrote:
 jj, I actually agree strongly with you regarding the potential dangers. My
 counter argument is that if you specify the encoding of the database in the
 config file to a different encoding than is in the mysql config file, don't
 you risk double-encoding anyway?

Double encoding should *theoretically* never happen because MySQLdb
knows if it has a unicode object or a UTF-8 string.

 Now that the bug is fixed in MySQLdb, and
 no other database needs such options, I'd prefer to promote the Unicode
 field as the proper way forward. SQLAlchemy should, via the MySQLdb module,
 determine at runtime the encoding of the database and translate from that
 encoding to unicode objects as necessary (as should any other ORM).

To tell you the truth, I don't feel strong either way.  As long as *I*
the application developer don't need to do the encoding, and as long
as the database and I agree about the encoding, it doesn't matter to
me who does it.

All the database drivers SHOULD support encoding for you.  After all,
not everyone uses SQLAlchemy.  Personally, I think this should have
gone in the DBAPI spec.  However, if someone wants to let SQLAlchemy
do the work, I don't have any objections.

Happy Hacking!
-jj

 Shannon -jj Behrens wrote:

 
  I'm not necessarily disagreeing with you, but I do think it's really
  bad practice to have your database confused about the encoding of the
  data that it's storing.  At one point, I had things messed up and the
  data in the database was double encoded.  Sure, it worked as long as I
  was using SQLAlchemy, but if I ever decided to use something else, I
  would have been in really bad shape.  Fortunately, they fixed that bug
  in MySQLdb.
 
  Happy Hacking!
  -jj




 



-- 
http://jjinux.blogspot.com/

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread Max Ischenko
On 5/29/07, David Smith [EMAIL PROTECTED] wrote:


  My exprience shows that it may be necessarily to specify encoding  for
  mysql in dburi.
 
  The reason is that mysql server may be configured with default encoding
  that doesn't match the database you're connecting to. E.g. if you open
  mysql console and type \s you can see something like latin1 while
 database
  is utf8-encoded. In a situation like this you do need to tell db
 encoding
  explicitly. So I guess it makes sense to always be explicit here.

 I guess I should have been more explicit. The point is that it shouldn't
 matter if your database uses e.g. iso-8859-1 internally, as long as it can
 hold the bytes. UTF-8 is just another encoding, and the point is that it
 should be the job of SQLAlchemy (or insert your other platform's ORM) to
 convert into the database's encoding and back out to unencoded Python
 unicode objects. Thus, if you want the DB internal data to be stored in
 UTF-8, you should configure that in your database server settings, not in
 your application.


Hmm. Not sure I agree completely.

One database server supports multiple clients / databases which means every
one may have different settings. Encoding spec is part of dburi syntax in
sqlalchemy, along with other connections settings, such as username. This
seems right to me.

OTOH, the application doesn't need to know database encoding, it's just an
implementation detail. UnicodeString gives you precisely this.


 You surely want to use Unicode instead of String. It will do the right
  thing (converting to and from unicode objects). Also, if the setup is
  wrong, you'll discover it sooner when it'll try to decode utf8 data
  assuming latin1 charset.

 This again is the same point as above; SQLAlchemy should determine that
 the
 DB is not utf-8 and is indeed latin1 or iso-8859-1 or whatever.


I don't want the tool to outsmart me. ;)

Max.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread Mike Orr

The first thing is to see what MySQL thinks its server and client
character sets are:

$ paster shell development.ini
 import pylons.database
 engine = pylons.database.create_engine()
 engine.execute(SHOW VARIABLES).fetchall()

Look for the variables that start with character_set_ and
collation_.  On my workstation they show:

 (u'character_set_client', u'utf8'),
 (u'character_set_connection', u'utf8'),
 (u'character_set_database', u'utf8'),
 (u'character_set_filesystem', u'binary'),
 (u'character_set_results', u'utf8'),
 (u'character_set_server', u'utf8'),
 (u'character_set_system', u'utf8'),
 (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
 (u'collation_connection', u'utf8_general_ci'),
 (u'collation_database', u'utf8_general_ci'),
 (u'collation_server', u'utf8_general_ci'),

Then you have a choice of ?use_unicode=1 (a MySQLdb option in the DSN) or:

create_engine(dsn, convert_unicode=True, encoding=utf8)

SQLAlchemy wants to standardize on 'convert_unicode' because it's the
same for all engines, but currently there are some problems with that:

1) You have to know what character set the MySQL connection is.

2) If you're using pylons.database.session_context, you can't add
arguments to the create_engine call without hacking pylons.database or
reimplementing the functions yourself.  Hopefully this will be
improved in a future version of Pylons -- as soon as we figure out
what to do.

3) convert_unicode works at the Table level, so raw engine.execute()
queries will still be 'str' strings.

* * * *
I have MySQLdb 1.2.1 because Kubuntu is still on the older version and
I can't get the newer one to compile:

$ easy_install MySQL-python
  File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py,
line 24, in mysql_config
EnvironmentError: mysql_config not found

Thank you, now what is mysql_config?

* * * *
With autoloaded tables you'll get unicode values if you use either
use_unicode or convert_unicode, or str values if you don't.  With
declared columns you'll also have to use the Unicode column type.

I had to use a my.cnf file to get the client and server charsets to
match.  Otherwise I was getting:

 (u'character_set_client', u'latin1'),
 (u'character_set_connection', u'latin1'),
 (u'character_set_database', u'utf8'),
 (u'character_set_filesystem', u'binary'),
 (u'character_set_results', u'latin1'),
 (u'character_set_server', u'utf8'),
 (u'character_set_system', u'utf8'),
 (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
 (u'collation_connection', u'latin1_swedish_ci'),
 (u'collation_database', u'utf8_general_ci'),
 (u'collation_server', u'utf8_general_ci'),

The my.cnf is a symlink to /etc/mysql/my.cnf and contains among other things:

[client]
default-character-set  = utf8

[mysqld]
character-set-server = utf8
default-character-set = utf8

The DSN looks like this:

sqlalchemy.dbapi =
mysql://...?read_default_file=%(here)s/my.cnfuse_unicode=1

Unfortunately MySQLdb does not raise an exception if the file is
missing, so it's worth creating a unit test that checks whether all
the character sets are correct.  I haven't tried that yet.

* * * *
If you have existing records in the database which are in a different
charset than what the column purports to be, and you want to keep the
actual values as is, change the column type to BINARY in MySQL, then
change it to the desired charset:

http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html

If you want to change the data to the charset the column purports to
be, write a Python program that connects WITHOUT Unicode support, read
the values, decode them from the source charset, encode them into the
destination charset, and write them back to the database.

With mysql://...?use_unicode=1 and reflected tables, I get Unicode
values.  Without use_unicode, I get string values.  If I hack
pylons.database to pass 'convert_unicode=True' to create_engine() in a
way that it'll be propagated to the session_context,  I get Unicode
values again.  Curiously, in all three cases the column instances are
MSString with convert_unicode=False:

Column(u'name',MSString(length=255,convert_unicode=False),nullable=False)

* * * *
The following didn't work:

mysql://...?use_unicode=1charset=utf8

sqlalchemy.exceptions.DBAPIError: (Connection failed)
(OperationalError) (2019, 'Can\'t initialize character set utf8
(path: /usr/share/mysql/charsets/)')

This may be because I'm using the older MySQLdb, or maybe it's a bug
Ubuntu's packaging of MySQL.

-- 
Mike Orr [EMAIL PROTECTED]

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-29 Thread David Smith

Hi Mike,

Thank you for all the details. I've seen the merits of the arguments and I
guess the conclusion is these points:

1) The MySQL specific option of use_unicode is not needed anymore. The MySQL
specific charset option also is deprecated.

2) The SQLAlchemy convert_unicode option is recommended, and is needed when
you're autoloading tables and you want to get unicode objects. It obsoletes
the use_unicode option. Correspondingly, the SQLAlchemy encoding option
obsoletes the MySQL specific charset option.

3) No options will help you when doing raw engine.execute operations; you
must convert for yourself.

4) You should use Unicode fields inside of your application in order for
your internal encoding/decoding logic to not depend on the database
configuration. I.e., whether you're using convert_unicode or not, when
using Unicode field you will always get a unicode object from SQLAlchemy.

5) Pylons does not properly pass the convert_unicode option to
create_engine. This has been discussed on this list in many forms, most
recently by Mike himself, and everyone seems to be agreed it needs fixing
and soon. I've created ticket #250 for this.
 
There was one point you made that I believe is inaccurate and so did not
include it in the above. Where you said that if you want to use the
convert_unicode option, you must know what encoding the database is in, but
my tests don't suggest that. My MySQL settings are currently:

character_set_client latin1
character_set_connection latin1
character_set_database latin1
character_set_filesystem binary
character_set_results latin1
character_set_server latin1
character_set_system utf8
character_sets_dir /usr/share/mysql/charsets/
collation_connection latin1_swedish_ci
collation_database latin1_swedish_ci
collation_server latin1_swedish_ci

which are the defaults on Debian, and I'm happy in my ignorance. AFAIK as
long as the character_set_system is utf-8, the data itself is stored in
utf-8, and other variables like client and server encoding can be changed
at will to force MySQL (client or server respectively) to convert from
utf-8 to whatever else.

I've verified that with the above MySQL settings and only using Unicode
field types, no covert_unicode or use_unicode or any other settings, the
data in the DB is utf-8 and I get unicode objects and everything is groovy.

That suggests to me that if you really care, you should set the backend
encoding to utf-8 or whatever with the SQLAlchemy encoding settings, but it
is not necessary nor especially recommended.

Does that tie up the loose ends on this discussion?

Thanks again,
David

Mike Orr wrote:

 
 The first thing is to see what MySQL thinks its server and client
 character sets are:
 
 $ paster shell development.ini
 import pylons.database
 engine = pylons.database.create_engine()
 engine.execute(SHOW VARIABLES).fetchall()
 
 Look for the variables that start with character_set_ and
 collation_.  On my workstation they show:
 
  (u'character_set_client', u'utf8'),
  (u'character_set_connection', u'utf8'),
  (u'character_set_database', u'utf8'),
  (u'character_set_filesystem', u'binary'),
  (u'character_set_results', u'utf8'),
  (u'character_set_server', u'utf8'),
  (u'character_set_system', u'utf8'),
  (u'character_sets_dir', u'/usr/share/mysql/charsets/'),
  (u'collation_connection', u'utf8_general_ci'),
  (u'collation_database', u'utf8_general_ci'),
  (u'collation_server', u'utf8_general_ci'),
 
 Then you have a choice of ?use_unicode=1 (a MySQLdb option in the DSN)
 or:
 
 create_engine(dsn, convert_unicode=True, encoding=utf8)
 
 SQLAlchemy wants to standardize on 'convert_unicode' because it's the
 same for all engines, but currently there are some problems with that:
 
 1) You have to know what character set the MySQL connection is.
 
 2) If you're using pylons.database.session_context, you can't add
 arguments to the create_engine call without hacking pylons.database or
 reimplementing the functions yourself.  Hopefully this will be
 improved in a future version of Pylons -- as soon as we figure out
 what to do.
 
 3) convert_unicode works at the Table level, so raw engine.execute()
 queries will still be 'str' strings.
 
 * * * *
 I have MySQLdb 1.2.1 because Kubuntu is still on the older version and
 I can't get the newer one to compile:
 
 $ easy_install MySQL-python
   File /tmp/easy_install-p8zEac/MySQL-python-1.2.2/setup_posix.py,
 line 24, in mysql_config
 EnvironmentError: mysql_config not found
 
 Thank you, now what is mysql_config?
 
 * * * *
 With autoloaded tables you'll get unicode values if you use either
 use_unicode or convert_unicode, or str values if you don't.  With
 declared columns you'll also have to use the Unicode column type.
 
 I had to use a my.cnf file to get the client and server charsets to
 match.  Otherwise I was getting:
 
  (u'character_set_client', u'latin1'),
  (u'character_set_connection', u'latin1'),
  (u'character_set_database', u'utf8'),
  

Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-11 Thread Max Ischenko
On 5/11/07, andyprog [EMAIL PROTECTED] wrote:


 Hi all,

 (I'm using Pylons for the first time, so pls bear with me...)

 I'm setting up a Pylons app with Mako + SQLAlchemy + MySQL + UTF-8
 support
 -- but I've had trouble finding one place with clear instructions on
 how to do this,
 so I've gathered pieces of info from here and there, but this, coupled
 with the fact
 that some instructions are outdated, and others not so clear, has left
 me quite confused... :(


It is outdated indeed.

If you have the latest stable version of Pylons, SA and MySQLdb (1.2.2) the
following should be enough:

sqlalchemy.dburi = mysql://...?charset=utf8

And you should get properly decoded unicode strings from SA.

Btw, make sure your mysql db really contains utf8-encoded data (use mysql
console in utf8 locale to poke around and show full columns to check
collation.

Max.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---



Re: SQLAlchemy + MySQL + UTF-8 support - how?

2007-05-11 Thread andyprog

Thank you for your answers, Max and BJPirt.

I'm going to set things up and I'll let you know how it went.

Andy P


--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
pylons-discuss group.
To post to this group, send email to pylons-discuss@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~--~~~~--~~--~--~---