Unfortunately, due to the nature of the web application I'm planning
on using web2py for, I can't use a single-byte encoding for the
database or most tables.

The tables are going to store strings in many different languages of the world.

I was hoping that web2py could transparently communicate with
databases that are UTF8 encoded and that I would be able to do
operations on strings retrieved from databases without thinking about
their encodings.

Does web2py retrieve strings from databases as unicode Python objects
or as byte strings? I assume it's the latter, and that the byte strings
are UTF-8 encoded. Is that so?
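Whichever type web2py hands back, a small normalizing helper lets the rest of the code work with text and not worry about encodings. This is a minimal sketch in Python 3 terms (Python 3 `bytes` corresponds to the Python 2 `str` of that era, and `str` to `unicode`). It assumes any byte string coming from the database holds UTF-8. The helper name `to_text` is hypothetical and is not part of web2py.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding byte strings (assumed UTF-8) and passing str through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = "Олег".encode("utf-8")  # what a UTF-8 byte string from the database would look like
print(to_text(raw))            # decoded to text
print(to_text("Олег"))         # already text, returned unchanged
```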

I'll have to look into that much more closely.

On March 21, 2009 at 18:01, AchipA <[email protected]> wrote:
>
> Counting characters rather than bytes is possible (see unicode objects
> in Python), but characters are problematic in databases (think record
> sizes, index structures, collation, etc.). That's why most databases
> either 'cheat' by using byte counts in some places or pay a feature or
> performance penalty. Also, there might be encodings that do not have a
> predefined maximum number of bytes per character, so you cannot predict
> the number of required bytes (a special case, I admit, but once you go
> down the multibyte path it's all or nothing).
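Python's own codecs make the point about variable width concrete. The sketch below prints how many bytes UTF-8 needs for one character from each of several scripts. The sample characters are arbitrary picks, written as escapes so that look-alike Latin and Cyrillic letters can't be confused.

```python
# One character from each of several scripts; escapes avoid look-alike letters.
samples = {
    "Latin a": "a",               # U+0061
    "Cyrillic O": "\u041e",       # U+041E
    "euro sign": "\u20ac",        # U+20AC
    "CJK ideograph": "\u4e2d",    # U+4E2D
    "emoji": "\U0001f600",        # U+1F600
}

for label, ch in samples.items():
    # Same number of characters (one), different number of bytes.
    print(f"{label:14} U+{ord(ch):04X}: {len(ch.encode('utf-8'))} byte(s)")
```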
>
> These are also the reasons why a lot of people with large databases
> prefer single-byte encodings *inside* the database. So, for
> example, if you deal with Russian, you could use code page 1251 on the
> table level (note that you can still talk to the database in Unicode;
> it's just a question of storage!). The important thing is to have the
> data in the correct format in the DB and to avoid any conversions at
> all if possible (leave them to the database or the browser).
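The single-byte idea can be checked with Python's built-in codecs. This sketch assumes Windows-1251 (`cp1251`), the Windows code page for Cyrillic, as the table encoding. Every character takes exactly one byte and the round trip is lossless. Windows-1250 (`cp1250`), the Central European code page, has no Cyrillic letters and cannot store the name at all. The sketch only illustrates the encodings; it does not show how a particular database is configured per table.

```python
name = "Олег Зимний"  # 11 characters

stored = name.encode("cp1251")          # Windows-1251: one byte per Cyrillic letter
print(len(stored))                      # one byte per character
print(stored.decode("cp1251") == name)  # lossless round trip

# Windows-1250 (Central European) has no Cyrillic letters.
try:
    name.encode("cp1250")
except UnicodeEncodeError as exc:
    print("cp1250 cannot store this name:", exc.reason)
```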
>
> On Mar 21, 10:28 am, Alexei Vinidiktov <[email protected]>
> wrote:
>> Hi Yarko,
>>
>> Thanks for your help.
>>
>> I've tried setting the name field length to 32, and it worked fine
>> with a name such as Олег Зимний.
>>
>> It was to be expected though.
>>
>> The question is, in what units should the field length be measured -
>> bytes or characters?
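The two units give different answers for the names in this thread. Олег is 8 bytes in UTF-8 and was accepted. Олег Зимний is 21 bytes and was rejected against length=20, then accepted against length=32. Those numbers are consistent with a check that counts UTF-8 bytes, but that is an inference from the figures, not something confirmed here. A sketch comparing both counts:

```python
max_length = 20  # length declared for the 'name' field in the model

for name in ["Олег", "Олег Зимний"]:
    chars = len(name)                   # number of characters
    nbytes = len(name.encode("utf-8"))  # number of UTF-8 bytes
    print(f"{name}: {chars} chars, {nbytes} bytes;",
          "fits by chars," if chars <= max_length else "too long by chars,",
          "fits by bytes" if nbytes <= max_length else "too long by bytes")
```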
>>
>> I think it should be measured in characters, because you never know
>> how many bytes a string with international characters will take. I
>> understand that may not be possible, so what's the practical advice?
>> Should I assign a string field double the number of bytes the longest
>> name (or other information stored in the field) can have? For
>> instance, if I want a string field to hold a maximum of 20
>> characters, should I set it to 40 units (bytes)? Is that correct?
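Doubling is enough for Cyrillic, because every Cyrillic letter is 2 bytes in UTF-8. For text in arbitrary languages, the safe multiplier is 4, since UTF-8 uses at most 4 bytes per code point. A minimal sketch of that arithmetic; the helper name `utf8_byte_budget` is hypothetical.

```python
def utf8_byte_budget(max_chars, max_bytes_per_char=4):
    """Worst-case UTF-8 size in bytes of a string holding at most max_chars code points."""
    return max_chars * max_bytes_per_char


print(utf8_byte_budget(20, 2))  # Cyrillic-only text: 2 bytes per letter
print(utf8_byte_budget(20))     # any Unicode text: up to 4 bytes per code point
```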
>>
>> I think this approach is error prone, because one can forget to do so
>> every time one adds a string field to a db definition.
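One way to avoid the easy-to-forget doubling is to check length in characters in the application. The sketch below is modeled on the `(value, error)` pair that web2py validators return when called. The class name `MaxChars` is hypothetical, and whether and how to attach it to a field is an assumption to check against the manual. It decodes UTF-8 byte strings before counting.

```python
class MaxChars:
    """Hypothetical validator: limit a value by characters, not bytes."""

    def __init__(self, max_chars, error_message="too long"):
        self.max_chars = max_chars
        self.error_message = error_message

    def __call__(self, value):
        # Decode UTF-8 byte strings so that len() counts characters.
        text = value.decode("utf-8") if isinstance(value, bytes) else value
        if len(text) > self.max_chars:
            return (value, self.error_message)
        return (value, None)


check = MaxChars(20)
print(check("Олег Зимний".encode("utf-8")))  # 11 characters: no error
```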
>>
>> On March 21, 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
>>
>> > Hi Alexei -
>> > web2py uses UTF-8 internally; this means Cyrillic characters are
>> > encoded in 2 bytes each (have a look at
>> > http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design,
>> > or http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design).
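Why Cyrillic needs exactly 2 bytes follows from the code-point ranges that define UTF-8 (RFC 3629). The sketch below implements that rule and cross-checks it against Python's encoder for the basic Cyrillic block, U+0400 to U+04FF, which falls inside the 2-byte range U+0080 to U+07FF.

```python
def utf8_length(code_point):
    """Bytes UTF-8 needs for one code point, by range (RFC 3629)."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


cyrillic = range(0x0400, 0x0500)  # basic Cyrillic block
assert all(utf8_length(cp) == 2 for cp in cyrillic)
# The rule agrees with Python's own UTF-8 encoder for every code point in the block.
assert all(utf8_length(cp) == len(chr(cp).encode("utf-8")) for cp in cyrillic)
```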
>>
>> > I copy/pasted "Oleg Zimniy" (Олег Зимний) from your note into the
>> > development copy (sqlite) of the PyCon2009 conference server...
>> >>>> s=db(db.contacts.id>0).select()
>> >>>> s[0].name
>> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
>> > As you can see, that's 2 bytes per Cyrillic character (the space takes 1).
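The byte string printed in the session above is plain UTF-8, and decoding it recovers the name. The sketch is written in Python 3; in Python 2, calling `.decode('utf-8')` on the `str` value returns a `unicode` object the same way. The loop shows which byte pair belongs to each letter.

```python
raw = b"\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9"
text = raw.decode("utf-8")
print(text)

# Show the UTF-8 bytes behind each character.
for ch in text:
    print(repr(ch), ch.encode("utf-8"))
```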
>> > SQLField defaults are shown on p. 138: the default type is 'string'
>> > and the default length is 32. Try that and see if it works for you.
>> > Hope that helps.
>> > Regards,
>> > Yarko
>> > 2009/3/21 Alexei Vinidiktov <[email protected]>
>>
>> >> Hello,
>>
>> >> I'm just beginning to learn web2py. I've bought the web2py manual and
>> >> am reading Chapter 1.
>>
>> >> I've defined a model through the admin interface:
>>
>> >> db = SQLDB('sqlite://storage.db')
>> >> db.define_table('contacts',
>> >>    SQLField('name', 'string', length=20),
>> >>    SQLField('phone', 'string', length=12))
>>
>> >> When I go to the admin interface to add some records, I can add names
>> >> that are written with Latin characters just fine, but when I try to
>> >> enter a name written with Cyrillic characters, I get an error that
>> >> says that the name is too long, although it is not.
>>
>> >> For example, if I enter the name Олег Зимний, which is 11 characters
>> >> long, I get that error.
>>
>> >> If I enter a short name such as Олег, the record is added fine.
>>
>> >> The maximum length is set to 20 in the table definition and names with
>> >> Latin characters whose length is up to 20 characters can be added
>> >> fine.
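A sketch with Python's built-in `sqlite3` module suggests that the "too long" error does not come from SQLite itself. SQLite does not enforce the length in `VARCHAR(20)`, and its `length()` function counts characters for text values. The check therefore presumably happens in web2py's form validation. That attribution is an inference and is not confirmed in this thread; the table here mirrors the model only loosely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name VARCHAR(20))")

name = "Олег Зимний"  # 21 bytes in UTF-8, more than the declared 20
conn.execute("INSERT INTO contacts (name) VALUES (?)", (name,))

(stored,) = conn.execute("SELECT name FROM contacts").fetchone()
(char_count,) = conn.execute("SELECT length(name) FROM contacts").fetchone()
conn.close()

print(stored == name)  # SQLite kept the whole value despite VARCHAR(20)
print(char_count)      # SQLite's length() counts characters for text
```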
>>
>> >> Is it a web2py bug? If it is, can it be easily fixed?
>>
>> >> --
>> >> Alexei Vinidiktov
>>
>> --
>> Alexei Vinidiktov
> >
>



-- 
Alexei Vinidiktov

You received this message because you are subscribed to the Google Groups 
"web2py Web Framework" group.