2009/3/21 AchipA <[email protected]>

> Oops, wrong copypaste, here is the correct one:
>
> In [1]: validator = IS_LENGTH(1)
>
> In [2]: validator('a')
> Out[2]: ('a', None)
>
> In [3]: validator('aa')
> Out[3]: ('aa', 'too long!')
>
> In [4]: validator('á')
> Out[4]: ('\xc3\xa1', 'too long!')
>
> In [5]: validator('á'.decode('utf-8'))
> Out[5]: (u'\xe1', None)
>
> In [6]: validator('ж'.decode('utf-8'))
> Out[6]: (u'\u0436', None)
>
> I say Alexei found a bug :)

Yes - it would look like you are right. I'll let Massimo digest this
better after PyCon.

- Yarko

> On Mar 21, 3:37 pm, AchipA <[email protected]> wrote:
> > >I was hoping that web2py could transparently communicate with
> > >databases that are UTF8 encoded and that I would be able to do
> > >operations on strings retrieved from databases without thinking about
> > >their encodings.
> >
> > That is the goal. It will never be 100%, as it is somewhat database/
> > version dependent. As you yourself write, it uses utf-8 encoded
> > strings (which is the python 2.x norm and this won't change to unicode
> > objects at least until web2py support for Python 3.0 arrives) and uses
> > utf8 data in the database. That being said, a quick glance at the
> > IS_LENGTH validator shows that it might not be using len() entirely
> > correctly; I think Massimo should take a look at it.
> >
> > >>> len('a')
> > 1
> > >>> len('á')
> > 2
> > >>> len(u'á')
> > 1
> > >>> len('á'.decode('utf-8'))
> > 1
> >
> > On Mar 21, 1:59 pm, Alexei Vinidiktov <[email protected]>
> > wrote:
> >
> > > Unfortunately, due to the nature of the web application I'm planning
> > > on using web2py for, I can't use a single-byte encoding for the
> > > database or most tables.
> > >
> > > The tables are going to store strings in many different languages of
> > > the world.
> > >
> > > I was hoping that web2py could transparently communicate with
> > > databases that are UTF8 encoded and that I would be able to do
> > > operations on strings retrieved from databases without thinking about
> > > their encodings.
> > >
> > > Does web2py retrieve strings from databases as unicode Python objects
> > > or single-byte strings? I assume that it's the latter and the
> > > single-byte strings are UTF-8 encoded. Is that so?
> > >
> > > I'll have to look into that much more closely.
> > >
> > > On 21 March 2009 at 18:01, AchipA <[email protected]> wrote:
> > >
> > > > Counting characters rather than bytes is possible (see unicode
> > > > objects in python), but characters are problematic in databases
> > > > (think record sizes, index structures, collation, etc). That's why
> > > > most databases either 'cheat' by using byte counts in some places or
> > > > suffer from a feature/performance standpoint. Also, there might be
> > > > encodings that do not have a predefined maximum number of bytes per
> > > > character, so you cannot predict the number of required bytes (a
> > > > special case, I admit, but once you go down the multibyte char path
> > > > it's all or nothing).
> > > >
> > > > These are also the reasons why a lot of people with large databases
> > > > prefer single-byte encodings *inside* the database. So, for
> > > > example, if you deal with Russian, you could use code page 1251 on the
> > > > table level (note that you can still talk to the database in unicode,
> > > > it's just a question of storage!). The important thing is to have the
> > > > data in the correct format in the DB and avoid any conversions at all
> > > > if possible (leave it to the database or the browser).
> > > >
> > > > On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]>
> > > > wrote:
> > > >> Hi Yarko,
> > > >>
> > > >> Thanks for your help.
> > > >>
> > > >> I've tried setting the name field length to 32, and it worked fine
> > > >> with a name such as Олег Зимний.
> > > >>
> > > >> It was to be expected though.
> > > >>
> > > >> The question is, in what units should the field length be measured -
> > > >> bytes or characters?
> > > >>
> > > >> I think it should be measured in characters, because you never know
> > > >> how many bytes a string with international characters will be. I
> > > >> understand it may not be possible, so I'd like to know what's the
> > > >> practical advice?
> > > >> Should I assign a string field double the number of
> > > >> bytes the longest name (or other information stored in the field) can
> > > >> have? For instance, if I want a string field to contain a maximum of
> > > >> 20 characters, I should set it to 40 units (bytes). Is that correct?
> > > >>
> > > >> I think this approach is error prone, because one can forget to do so
> > > >> every time one adds a string field to a db definition.
> > > >>
> > > >> On 21 March 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
> > > >>
> > > >> > Hi Alexei -
> > > >> > web2py uses UTF8 internally; this means Cyrillic will encode in
> > > >> > 2 bytes per character
> > > >> > (have a look at
> > > >> > http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design, or
> > > >> > http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
> > > >> >
> > > >> > I copy/pasted "Oleg Zumniy" from your note into a development copy
> > > >> > (sqlite) of the PyCon2009 conference server...
> > > >> > >>> s=db(db.contacts.id>0).select()
> > > >> > >>> s[0].name
> > > >> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
> > > >> > As you can see - 2 bytes per character ...
> > > >> > SQLField defaults are shown on p.138 - 'string', length=32 is the
> > > >> > default. Try that, see if that works for you.
> > > >> > Hope that helps.
> > > >> > Regards,
> > > >> > Yarko
> > > >> >
> > > >> > 2009/3/21 Alexei Vinidiktov <[email protected]>
> > > >> >
> > > >> >> Hello,
> > > >> >>
> > > >> >> I'm just beginning to learn web2py. I've bought the web2py manual and
> > > >> >> am reading Chapter 1.
> > > >> >>
> > > >> >> I've defined a model through the admin interface:
> > > >> >>
> > > >> >> db = SQLDB('sqlite://storage.db')
> > > >> >> db.define_table('contacts',
> > > >> >>     SQLField('name', 'string', length=20),
> > > >> >>     SQLField('phone', 'string', length=12))
> > > >> >>
> > > >> >> When I go to the admin interface to add some records, I can add names
> > > >> >> that are written with Latin characters just fine, but when I try to
> > > >> >> enter a name written with Cyrillic characters, I get an error that
> > > >> >> says that the name is too long, although it is not.
> > > >> >>
> > > >> >> For example, if I enter the name Олег Зимний, which is 11 characters
> > > >> >> long, I get that error.
> > > >> >>
> > > >> >> If I enter a short name such as Олег, the record is added fine.
> > > >> >>
> > > >> >> The maximum length is set to 20 in the table definition and names with
> > > >> >> Latin characters whose length is up to 20 characters can be added
> > > >> >> fine.
> > > >> >>
> > > >> >> Is it a web2py bug? If it is, can it be easily fixed?
> > > >> >>
> > > >> >> --
> > > >> >> Alexei Vinidiktov
> > > >>
> > > >> --
> > > >> Alexei Vinidiktov
> > >
> > > --
> > > Alexei Vinidiktov


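A minimal sketch of the character-versus-byte distinction discussed in this thread, written for Python 3 (where str is already Unicode) rather than the Python 2 sessions quoted above. The names char_length and check_length are hypothetical, and this is not web2py's IS_LENGTH implementation; it only mirrors the (value, error) return shape seen in the quoted session. It also suggests why Олег Зимний could be rejected with length=20 if bytes are counted: the name is 11 characters but 21 bytes in UTF-8.

```python
def char_length(value, encoding="utf-8"):
    """Return the number of characters (code points) in value.

    Byte strings are decoded first, so a UTF-8 encoded 'á' counts as 1, not 2.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    return len(value)


def check_length(value, maxsize, error_message="too long!"):
    """Hypothetical stand-in for a length validator.

    Returns (value, None) when the character count fits within maxsize,
    otherwise (value, error_message).
    """
    if char_length(value) <= maxsize:
        return (value, None)
    return (value, error_message)


name = "Олег Зимний"
encoded = name.encode("utf-8")

print(len(name))     # 11 characters
print(len(encoded))  # 21 bytes: 10 Cyrillic letters at 2 bytes each, plus 1 space

# Counting bytes would reject the name at a limit of 20;
# counting characters accepts it.
print(len(encoded) <= 20)                # False
print(check_length(encoded, 20)[1])      # None
```

Whether a database column limit is itself expressed in bytes or characters depends on the database and its configuration, so a character-based check like this one does not by itself guarantee the value fits in storage.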