Measuring in characters rather than bytes is possible (see unicode objects in Python), but characters are problematic in databases (think record sizes, index structures, collation, etc.). That's why most databases either 'cheat' by using byte counts in some places or pay for it in features or performance. Also, there might be encodings that do not have a predefined maximum number of bytes per character, so you cannot predict the number of required bytes (a special case, I admit, but once you go down the multibyte char path it's all or nothing).
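To make the byte/character difference concrete, here is a minimal sketch in Python 3 (my assumption; the session quoted in this thread uses Python 2 byte strings). The helper names `utf8_len` and `fits` are hypothetical, and the idea that the length limit is compared against the UTF-8 byte count is inferred from the behaviour reported in this thread, not taken from the web2py source.

```python
def utf8_len(text):
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text, limit):
    """Hypothetical check: does `text` fit into `limit` bytes of UTF-8?"""
    return utf8_len(text) <= limit


name = "Олег Зимний"

print(len(name))        # length in characters (code points)
print(utf8_len(name))   # length in bytes: each Cyrillic letter takes 2, the space 1
print(fits(name, 20))   # byte-based limit of 20
print(fits(name, 32))   # byte-based limit of 32
```

The name is 11 characters but 21 bytes, which would explain why it is rejected at length=20 and accepted at length=32. If you have to size fields in bytes, doubling is enough for Cyrillic text, but UTF-8 can need up to 4 bytes per character in general.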
These are also the reasons why a lot of people with large databases prefer single-byte encodings *inside* the database. So, for example, if you deal with Russian, you could use code page 1251 at the table level (note that you can still talk to the database in Unicode; it's just a question of storage!). The important thing is to have the data in the correct format in the DB and to avoid conversions if at all possible (leave them to the database or the browser).

On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]> wrote:
> Hi Yarko,
>
> Thanks for your help.
>
> I've tried setting the name field length to 32, and it worked fine
> with a name such as Олег Зимний.
>
> It was to be expected though.
>
> The question is, in what units should the field length be measured -
> bytes or characters?
>
> I think it should be measured in characters, because you never know
> how many bytes a string with international characters will be. I
> understand it may not be possible, so I'd like to know what's the
> practical advice? Should I assign a string field double the number of
> bytes the longest name (or other information stored in the field) can
> have? For instance, if I want a string field to contain the maximum of
> 20 characters, I should set it to 40 units (bytes). Is that correct?
>
> I think this approach is error prone, because one can forget to do so
> every time one adds a string field to a db definition.
>
> On 21 March 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
>
> > Hi Alexei -
> > web2py uses UTF8 internally; this means Cyrillic will encode in 2-bytes per
> > character
> > (have a look
> > at http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design,
> > or http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
>
> > I copy/pasted "Oleg Zumniy" from your note into development copy (sqlite) of
> > the PyCon2009 conference server...
> > >>> s=db(db.contacts.id>0).select()
> > >>> s[0].name
> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
>
> > As you can see - 2-bytes per character ...
>
> > SQLField defaults are shown on p.138 - 'string', length=32 is default. Try
> > that, see if that works for you.
>
> > Hope that helps.
>
> > Regards,
> > Yarko
>
> > 2009/3/21 Alexei Vinidiktov <[email protected]>
>
> >> Hello,
>
> >> I'm just beginning to learn web2py. I've bought the web2py manual and
> >> am reading Chapter 1.
>
> >> I've defined a model through the admin interface:
>
> >> db = SQLDB('sqlite://storage.db')
> >> db.define_table('contacts',
> >>     SQLField('name', 'string', length=20),
> >>     SQLField('phone', 'string', length=12))
>
> >> When I go to the admin interface to add some records, I can add names
> >> that are written with Latin characters just fine, but when I try to
> >> enter a name written with Cyrillic characters, I get an error that
> >> says that the name is too long, although it is not.
>
> >> For example, if I enter the name Олег Зимний, which is 11 characters
> >> long, I get that error.
>
> >> If I enter a short name such as Олег, the record is added fine.
>
> >> The maximum length is set to 20 in the table definition and names with
> >> Latin characters whose length is up to 20 characters can be added
> >> fine.
>
> >> Is it a web2py bug? If it is, can it be easily fixed?
>
> >> --
> >> Alexei Vinidiktov
>
> --
> Alexei Vinidiktov

