>I was hoping that web2py could transparently communicate with
>databases that are UTF8 encoded and that I would be able to do
>operations on strings retrieved from databases without thinking about
>their encodings.
That is the goal. It will never be 100% achievable, as it is somewhat
database/version dependent. As you write yourself, web2py uses UTF-8
encoded byte strings (the Python 2.x norm; this won't change to unicode
objects at least until web2py support for Python 3.0 arrives) and stores
UTF-8 data in the database. That said, a quick glance at the
IS_LENGTH validator suggests its use of len() may not be entirely
correct, since len() on a byte string counts bytes rather than
characters. I think Massimo should take a look at it.
>>> len('a')
1
>>> len('á')
2
>>> len(u'á')
1
>>> len('á'.decode('utf-8'))
1
On Mar 21, 1:59 pm, Alexei Vinidiktov <[email protected]>
wrote:
> Unfortunately, due to the nature of the web application I'm planning
> on using web2py for, I can't use a single-byte encoding for the
> database or most tables.
>
> The tables are going to store strings in many different languages of the
> world.
>
> I was hoping that web2py could transparently communicate with
> databases that are UTF8 encoded and that I would be able to do
> operations on strings retrieved from databases without thinking about
> their encodings.
>
> Does web2py retrieve strings from databases as unicode Python objects
> or single-byte strings? I assume that it's the latter and the
> single-byte strings are UTF-8 encoded. Is that so?
>
> I'll have to look into that much more closely.
>
> On March 21, 2009 at 18:01, AchipA <[email protected]> wrote:
>
> > Counting characters rather than bytes is possible (see unicode
> > objects in Python), but characters are problematic in databases
> > (think record sizes, index structures, collation, etc). That's why
> > most databases either 'cheat' by using byte counts in some places or
> > pay for it in features or performance. Also, there might be
> > encodings that do not have a predefined maximum number of bytes per
> > character, so you cannot predict the number of required bytes (a
> > special case, I admit, but once you go down the multibyte char path
> > it's all or nothing).
>
> > These are also the reasons why a lot of people with large databases
> > prefer single-byte encodings *inside* the database. So, for
> > example, if you deal with Russian, you could use code page 1251 at
> > the table level (note that you can still talk to the database in
> > Unicode, it's just a question of storage!). The important thing is
> > to have the data in the correct format in the DB and avoid any
> > conversions at all if possible (leave it to the database or the
> > browser).
>
> > On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]>
> > wrote:
> >> Hi Yarko,
>
> >> Thanks for your help.
>
> >> I've tried setting the name field length to 32, and it worked fine
> >> with a name such as Олег Зимний.
>
> >> It was to be expected though.
>
> >> The question is, in what units should the field length be measured -
> >> bytes or characters?
>
> >> I think it should be measured in characters, because you never know
> >> how many bytes a string with international characters will take. I
> >> understand it may not be possible, so I'd like to know what the
> >> practical advice is. Should I assign a string field double the number of
> >> bytes the longest name (or other information stored in the field) can
> >> have? For instance, if I want a string field to contain the maximum of
> >> 20 characters, I should set it to 40 units (bytes). Is that correct?
>
> >> I think this approach is error prone, because one can forget to do so
> >> every time one adds a string field to a db definition.
>
> >> On March 21, 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
>
> >> > Hi Alexei -
> >> > web2py uses UTF-8 internally; this means Cyrillic will encode in
> >> > 2 bytes per character (have a look at
> >> > http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design,
> >> > or http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
>
> >> > I copy/pasted "Oleg Zumniy" from your note into a development copy
> >> > (sqlite) of the PyCon2009 conference server...
> >> > >>> s=db(db.contacts.id>0).select()
> >> > >>> s[0].name
> >> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
> >> > As you can see - 2-bytes per character ...
> >> > SQLField defaults are shown on p.138 - 'string', length=32 is default.
> >> > Try that, see if that works for you.
> >> > Hope that helps.
> >> > Regards,
> >> > Yarko
> >> > 2009/3/21 Alexei Vinidiktov <[email protected]>
>
> >> >> Hello,
>
> >> >> I'm just beginning to learn web2py. I've bought the web2py manual and
> >> >> am reading Chapter 1.
>
> >> >> I've defined a model through the admin interface:
>
> >> >> db = SQLDB('sqlite://storage.db')
> >> >> db.define_table('contacts',
> >> >>     SQLField('name', 'string', length=20),
> >> >>     SQLField('phone', 'string', length=12))
>
> >> >> When I go to the admin interface to add some records, I can add names
> >> >> that are written with Latin characters just fine, but when I try to
> >> >> enter a name written with Cyrillic characters, I get an error that
> >> >> says that the name is too long, although it is not.
>
> >> >> For example, if I enter the name Олег Зимний, which is 11 characters
> >> >> long, I get that error.
>
> >> >> If I enter a short name such as Олег, the record is added fine.
>
> >> >> The maximum length is set to 20 in the table definition and names with
> >> >> Latin characters whose length is up to 20 characters can be added
> >> >> fine.
>
> >> >> Is it a web2py bug? If it is, can it be easily fixed?
>
> >> >> --
> >> >> Alexei Vinidiktov
>
> >> --
> >> Alexei Vinidiktov
>
> --
> Alexei Vinidiktov