Unfortunately, due to the nature of the web application I'm planning on using web2py for, I can't use a single-byte encoding for the database or most tables.
The tables are going to store strings in many of the world's languages. I was hoping that web2py could communicate transparently with UTF-8 encoded databases, and that I would be able to operate on strings retrieved from the database without thinking about their encodings.

Does web2py retrieve strings from databases as unicode Python objects or as byte strings? I assume it's the latter and that the byte strings are UTF-8 encoded. Is that so? I'll have to look into that much more closely.

On 21 March 2009 at 18:01, AchipA <[email protected]> wrote:
>
> Counting characters rather than bytes is possible (see unicode objects in
> Python), but characters are problematic in databases (think record sizes,
> index structures, collation, etc.). That's why most databases either
> 'cheat' by using byte counts in some places or suffer in features or
> performance. Also, there might be encodings that do not have a
> predefined maximum number of bytes per character, so you cannot predict
> the number of required bytes (a special case, I admit, but once you go
> down the multibyte char path it's all or nothing).
>
> These are also the reasons why a lot of people with large databases
> prefer single-byte encodings *inside* the database. So, for example, if
> you deal with Russian, you could use code page 1251 at the table level
> (note that you can still talk to the database in unicode; it's just a
> question of storage!). The important thing is to have the data in the
> correct format in the DB and avoid any conversions at all if possible
> (leave it to the database or the browser).
>
> On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]>
> wrote:
>> Hi Yarko,
>>
>> Thanks for your help.
>>
>> I've tried setting the name field length to 32, and it worked fine
>> with a name such as Олег Зимний.
>>
>> It was to be expected though.
>>
>> The question is, in what units should the field length be measured -
>> bytes or characters?
>>
>> I think it should be measured in characters, because you never know
>> how many bytes a string with international characters will take. I
>> understand it may not be possible, so I'd like to know what the
>> practical advice is. Should I give a string field twice as many
>> units as the number of characters in the longest name (or other
>> information stored in the field)? For instance, if I want a string
>> field to hold at most 20 characters, should I set its length to 40
>> units (bytes)?
>>
>> I think this approach is error-prone, because one can forget to do so
>> every time one adds a string field to a db definition.
>>
>> On 21 March 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
>>
>> > Hi Alexei -
>> > web2py uses UTF8 internally; this means Cyrillic will encode in 2 bytes
>> > per character
>> > (have a look
>> > at http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design,
>> > or http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
>>
>> > I copy/pasted "Oleg Zumniy" from your note into development copy (sqlite)
>> > of the PyCon2009 conference server...
>> > >>> s=db(db.contacts.id>0).select()
>> > >>> s[0].name
>> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
>> > As you can see - 2 bytes per character ...
>> > SQLField defaults are shown on p.138 - 'string', length=32 is default.
>> > Try that, see if that works for you.
>> > Hope that helps.
>> > Regards,
>> > Yarko
>> > 2009/3/21 Alexei Vinidiktov <[email protected]>
>>
>> >> Hello,
>>
>> >> I'm just beginning to learn web2py. I've bought the web2py manual and
>> >> am reading Chapter 1.
>>
>> >> I've defined a model through the admin interface:
>>
>> >> db = SQLDB('sqlite://storage.db')
>> >> db.define_table('contacts',
>> >>     SQLField('name', 'string', length=20),
>> >>     SQLField('phone', 'string', length=12))
>>
>> >> When I go to the admin interface to add some records, I can add names
>> >> that are written with Latin characters just fine, but when I try to
>> >> enter a name written with Cyrillic characters, I get an error that
>> >> says that the name is too long, although it is not.
>>
>> >> For example, if I enter the name Олег Зимний, which is 11 characters
>> >> long, I get that error.
>>
>> >> If I enter a short name such as Олег, the record is added fine.
>>
>> >> The maximum length is set to 20 in the table definition and names with
>> >> Latin characters whose length is up to 20 characters can be added
>> >> fine.
>>
>> >> Is it a web2py bug? If it is, can it be easily fixed?
>>
>> >> --
>> >> Alexei Vinidiktov
>>
>> --
>> Alexei Vinidiktov
>
>
>

--
Alexei Vinidiktov

You received this message because you are subscribed to the Google Groups "web2py Web Framework" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/web2py?hl=en
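
For reference, the byte-versus-character mismatch discussed in this thread can be reproduced in plain Python. The following is only a minimal sketch, not web2py code: it is written for Python 3 (in the Python 2 of this thread the corresponding types are unicode and str), it assumes the length check counts bytes of the UTF-8 encoding, as Yarko's output suggests, and the helper `fits` is a hypothetical name used for illustration.

```python
# Python 3 sketch; not part of web2py.
name = "Олег Зимний"              # the name from the thread
encoded = name.encode("utf-8")    # the bytes a UTF-8 database would store

char_count = len(name)            # length in characters (code points)
byte_count = len(encoded)         # length in bytes


def fits(text, max_length, count_bytes=True):
    """Hypothetical helper: does text fit into max_length units?

    count_bytes=True measures the UTF-8 encoded size (the assumed
    behaviour of the length check); False measures characters.
    """
    size = len(text.encode("utf-8")) if count_bytes else len(text)
    return size <= max_length


print(char_count, byte_count)                # 11 21
print(fits(name, 20))                        # False: 21 bytes > 20
print(fits(name, 20, count_bytes=False))     # True: 11 characters <= 20
print(fits(name, 32))                        # True: 21 bytes <= 32
print(encoded.decode("utf-8") == name)       # True: decoding restores the text
```

Each Cyrillic letter takes 2 bytes in UTF-8 and the space takes 1, which gives 21 bytes for 11 characters and would explain why length=20 was rejected while length=32 worked. Doubling the character count is enough for Cyrillic, but UTF-8 can use up to 4 bytes per character, so doubling would not be a safe rule for every language.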

