Oops, wrong copypaste, here is the correct one:
In [1]: validator = IS_LENGTH(1)
In [2]: validator('a')
Out[2]: ('a', None)
In [3]: validator('aa')
Out[3]: ('aa', 'too long!')
In [4]: validator('á')
Out[4]: ('\xc3\xa1', 'too long!')
In [5]: validator('á'.decode('utf-8'))
Out[5]: (u'\xe1', None)
In [6]: validator('ж'.decode('utf-8'))
Out[6]: (u'\u0436', None)
I say Alexei found a bug :)
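
For what it's worth, here is a rough sketch of how the check could count characters instead of bytes. It is only an illustration: char_length and is_length are names I made up, this is not web2py's actual IS_LENGTH code, and it assumes that any byte string passed in is UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; NOT web2py's IS_LENGTH implementation.
# Assumption: byte strings are UTF-8 encoded (the web2py convention discussed here).

def char_length(value, encoding='utf-8'):
    """Return the number of characters (code points) in value.

    Byte strings are decoded first; unicode/text strings are counted as-is.
    """
    if isinstance(value, bytes):  # on Python 2, bytes is an alias of str
        value = value.decode(encoding)
    return len(value)


def is_length(maxsize, error_message='too long!'):
    """Build a validator with the same (value, error) return shape as above."""
    def validator(value):
        # Compare the character count, not the byte count, against maxsize.
        if char_length(value) <= maxsize:
            return (value, None)
        return (value, error_message)
    return validator


validator = is_length(1)
print(validator(b'\xc3\xa1'))   # UTF-8 bytes of 'á': one character, two bytes
print(validator(u'\u0436'))     # unicode 'ж': one character
print(validator('aa'))          # two characters, rejected
```

Counted this way, is_length(20) would accept the UTF-8 bytes of Олег Зимний, which is 11 characters but 21 bytes.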
On Mar 21, 3:37 pm, AchipA <[email protected]> wrote:
> >I was hoping that web2py could transparently communicate with
> >databases that are UTF8 encoded and that I would be able to do
> >operations on strings retrieved from databases without thinking about
> >their encodings.
>
> That is the goal. It will never be 100%, as it is somewhat database- and
> version-dependent. As you yourself write, it uses utf-8 encoded
> strings (which is the python 2.x norm and this won't change to unicode
> objects at least until web2py support for Python 3.0 arrives) and uses
> utf8 data in the database. That said, a quick glance at the
> IS_LENGTH validator shows that it might not be using len() entirely
> correctly; I think Massimo should take a look at it.
>
> >>> len('a')
> 1
> >>> len('á')
> 2
> >>> len(u'á')
> 1
> >>> len('á'.decode('utf-8'))
> 1
>
> On Mar 21, 1:59 pm, Alexei Vinidiktov <[email protected]>
> wrote:
>
> > Unfortunately, due to the nature of the web application I'm planning
> > on using web2py for, I can't use a single-byte encoding for the
> > database or most tables.
>
> > The tables are going to store strings in many different languages of the
> > world.
>
> > I was hoping that web2py could transparently communicate with
> > databases that are UTF8 encoded and that I would be able to do
> > operations on strings retrieved from databases without thinking about
> > their encodings.
>
> > Does web2py retrieve strings from databases as unicode Python objects
> > or single-byte strings? I assume that it's the latter and the
> > single-byte strings are UTF-8 encoded. Is that so?
>
> > I'll have to look into that much more closely.
>
> > On March 21, 2009 at 18:01, AchipA <[email protected]> wrote:
>
> > > Counting characters rather than bytes is possible (see unicode objects
> > > in python), but characters are problematic in databases (think record
> > > sizes, index structures, collation, etc). That's why most databases
> > > either 'cheat' by using byte counts in some places or take a feature or
> > > performance hit. Also, there might be encodings that do not have a
> > > predefined maximum number of bytes per character so you cannot predict
> > > the number of required bytes (a special case, I admit, but once you go
> > > down the multibyte char path it's all or nothing).
>
> > > These are also the reasons why a lot of people with large databases
> > > prefer single-byte encodings *inside* the database. So, for
> > > example if you deal with Russian, you could use code page 1251 on the
> > > table level (note that you can still talk to the database in unicode,
> > > it's just a question of storage !). The important thing is to have the
> > > data in correct format in the DB and avoid any conversions at all if
> > > possible (leave it to the database or the browser).
>
> > > On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]>
> > > wrote:
> > >> Hi Yarko,
>
> > >> Thanks for your help.
>
> > >> I've tried setting the name field length to 32, and it worked fine
> > >> with a name such as Олег Зимний.
>
> > >> It was to be expected though.
>
> > >> The question is, in what units should the field length be measured -
> > >> bytes or characters?
>
> > >> I think it should be measured in characters, because you never know
> > >> how many bytes a string with international characters will be. I
> > >> understand it may not be possible, so I'd like to know what's the
> > >> practical advice? Should I assign a string field double the number of
> > >> bytes the longest name (or other information stored in the field) can
> > >> have? For instance, if I want a string field to contain the maximum of
> > >> 20 characters, I should set it to 40 units (bytes). Is that correct?
>
> > >> I think this approach is error prone, because one can forget to do so
> > >> every time one adds a string field to a db definition.
>
> > >> On March 21, 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
>
> > >> > Hi Alexei -
> > >> > web2py uses UTF8 internally; this means Cyrillic will encode in
> > >> > 2 bytes per character (have a look at
> > >> > http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design, or
> > >> > http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
>
> > >> > I copy/pasted "Oleg Zumniy" from your note into development copy
> > >> > (sqlite) of the PyCon2009 conference server...
> > >> > >>> s=db(db.contacts.id>0).select()
> > >> > >>> s[0].name
> > >> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
> > >> > As you can see - 2-bytes per character ...
> > >> > SQLField defaults are shown on p.138 - 'string', length=32 is default.
> > >> > Try that, see if that works for you.
> > >> > Hope that helps.
> > >> > Regards,
> > >> > Yarko
> > >> > 2009/3/21 Alexei Vinidiktov <[email protected]>
>
> > >> >> Hello,
>
> > >> >> I'm just beginning to learn web2py. I've bought the web2py manual and
> > >> >> am reading Chapter 1.
>
> > >> >> I've defined a model through the admin interface:
>
> > >> >> db = SQLDB('sqlite://storage.db')
> > >> >> db.define_table('contacts',
> > >> >> SQLField('name', 'string', length=20),
> > >> >> SQLField('phone', 'string', length=12))
>
> > >> >> When I go to the admin interface to add some records, I can add names
> > >> >> that are written with Latin characters just fine, but when I try to
> > >> >> enter a name written with Cyrillic characters, I get an error that
> > >> >> says that the name is too long, although it is not.
>
> > >> >> For example, if I enter the name Олег Зимний, which is 11 characters
> > >> >> long, I get that error.
>
> > >> >> If I enter a short name such as Олег, the record is added fine.
>
> > >> >> The maximum length is set to 20 in the table definition and names with
> > >> >> Latin characters whose length is up to 20 characters can be added
> > >> >> fine.
>
> > >> >> Is it a web2py bug? If it is, can it be easily fixed?
>
> > >> >> --
> > >> >> Alexei Vinidiktov
>
> > >> --
> > >> Alexei Vinidiktov
>
> > --
> > Alexei Vinidiktov