2009/3/21 Alexei Vinidiktov <[email protected]>

>
> The thing is the project that I'm intending to use web2py for is a web
> application for language learners, and I need to be sure that as many
> languages as possible are correctly treated by the application.
>
> So, I don't think it would be safe to use a Russian character for
> calculating the length of a field as in charlen = lambda n:
> n*len('л').


From the link I sent
(http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design):

Three bytes cover "the basic multilingual plane", which includes virtually all
characters in common use. Four bytes are needed for characters "which are rarely
used in practice."
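
To make the byte counts concrete, here is a small sketch that measures the UTF-8 length of one character from each range. It assumes Python 3, where str is Unicode and has to be encoded to get bytes; the sample characters and the utf8_len helper are my own illustration, not anything from web2py.

```python
# Sample characters, one per UTF-8 length class (chosen for illustration).
samples = {
    "latin_a": "a",               # U+0061, ASCII
    "cyrillic_el": "\u043b",      # U+043B, Cyrillic small letter el
    "cjk_zhong": "\u4e2d",        # U+4E2D, inside the Basic Multilingual Plane
    "gothic_ahsa": "\U00010330",  # U+10330, outside the BMP
}


def utf8_len(text):
    """Return the number of bytes text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Map each sample name to its encoded size in bytes.
byte_counts = {name: utf8_len(ch) for name, ch in samples.items()}

for name, count in byte_counts.items():
    print(name, count)
```

Only the last sample, which lies outside the BMP, needs a fourth byte.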

I think you can start with a 3-bytes-per-character assumption. Most of the time
that will be more than you need, so the rarely used characters will either not
come into play at all or will fit anyway. You can collect data as you go (my
guess is that *3 will turn out to be more than enough).
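
A rough sketch of what that could look like, again assuming Python 3. BYTES_PER_CHAR and the fits helper are hypothetical names of mine; charlen follows the lambda from my earlier mail, with the multiplier made explicit instead of being taken from a Cyrillic letter.

```python
BYTES_PER_CHAR = 3  # assumption: worst case for any BMP character in UTF-8


def charlen(n_chars, bytes_per_char=BYTES_PER_CHAR):
    """Return a field length in bytes meant to hold n_chars characters."""
    return n_chars * bytes_per_char


def fits(text, n_chars):
    """True if the UTF-8 encoding of text fits a field sized by charlen(n_chars)."""
    return len(text.encode("utf-8")) <= charlen(n_chars)


name = "Олег Зимний"                  # 11 characters, 21 bytes in UTF-8
name_bytes = len(name.encode("utf-8"))

print(charlen(20))                     # byte budget for a 20-character field
print(name_bytes, fits(name, 20))
print(fits("\U00010330" * 20, 20))     # 20 non-BMP characters need 80 bytes
```

In a model this would be used as in my earlier note, e.g. length=charlen(20) on the SQLField. A field filled entirely with 4-byte characters would still overflow, which is the rare case the 3-byte assumption accepts.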

Looking forward to hearing more about this interesting project!

Regards,
Yarko

>
>
> I'm not an advanced Python programmer, so if I'm wrong, please correct me.
>
> On 21 March 2009 at 17:15, Yarko Tymciurak <[email protected]> wrote:
> > Hi Alexei -
> > Since UTF-8 is variable length, and storage is cheap, you can be generous.
> > When a field is too short, you invariably end up with an unhappy customer
> > by some measure.
> > You saw that your test name was 21 bytes, not 22, so 2x is mildly
> > conservative.  You should be OK with something like:
> > charlen = lambda n: n*len('л')
> > db.define_table( 'mytable',
> >      ...
> >      SQLField( 'something', length=charlen(32) ),
> >      ...
> >
> >
> > This may be a good pattern to use regardless...  what do you think?
> >
> > Regards,
> > Yarko
> >
> >
> >
> > 2009/3/21 Alexei Vinidiktov <[email protected]>
> >>
> >> Hi Yarko,
> >>
> >> Thanks for your help.
> >>
> >> I've tried setting the name field length to 32, and it worked fine
> >> with a name such as Олег Зимний.
> >>
> >> It was to be expected though.
> >>
> >> The question is, in what units should the field length be measured -
> >> bytes or characters?
> >>
> >> I think it should be measured in characters, because you never know
> >> how many bytes a string with international characters will be.  I
> >> understand it may not be possible, so I'd like to know what's the
> >> practical advice? Should I assign a string field double the number of
> >> bytes the longest name (or other information stored in the field) can
> >> have? For instance, if I want a string field to contain the maximum of
> >> 20 characters, I should set it to 40 units (bytes). Is that correct?
> >>
> >> I think this approach is error prone, because one can forget to do so
> >> every time one adds a string field to a db definition.
> >>
> >> On 21 March 2009 at 15:06, Yarko Tymciurak <[email protected]> wrote:
> >> > Hi Alexei -
> >> > web2py uses UTF-8 internally; this means Cyrillic will encode in 2 bytes
> >> > per character (have a look at
> >> > http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design
> >> > or
> >> > http://ru.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design)
> >> >
> >> > I copy/pasted "Oleg Zumniy" from your note into development copy
> >> > (sqlite) of
> >> > the PyCon2009 conference server...
> >> > >>> s=db(db.contacts.id>0).select()
> >> > >>> s[0].name
> >> > '\xd0\x9e\xd0\xbb\xd0\xb5\xd0\xb3 \xd0\x97\xd0\xb8\xd0\xbc\xd0\xbd\xd0\xb8\xd0\xb9'
> >> > As you can see, that is 2 bytes per character.
> >> > SQLField defaults are shown on p.138 - 'string', length=32 is default.
> >> > Try
> >> > that, see if that works for you.
> >> > Hope that helps.
> >> > Regards,
> >> > Yarko
> >> > 2009/3/21 Alexei Vinidiktov <[email protected]>
> >> >>
> >> >> Hello,
> >> >>
> >> >> I'm just beginning to learn web2py. I've bought the web2py manual and
> >> >> am reading Chapter 1.
> >> >>
> >> >> I've defined a model through the admin interface:
> >> >>
> >> >> db = SQLDB('sqlite://storage.db')
> >> >> db.define_table('contacts',
> >> >>    SQLField('name', 'string', length=20),
> >> >>    SQLField('phone', 'string', length=12))
> >> >>
> >> >> When I go to the admin interface to add some records, I can add names
> >> >> that are written with Latin characters just fine, but when I try to
> >> >> enter a name written with Cyrillic characters, I get an error that
> >> >> says that the name is too long, although it is not.
> >> >>
> >> >> For example, if I enter the name Олег Зимний, which is 11 characters
> >> >> long, I get that error.
> >> >>
> >> >> If I enter a short name such as Олег, the record is added fine.
> >> >>
> >> >> The maximum length is set to 20 in the table definition and names with
> >> >> Latin characters whose length is up to 20 characters can be added
> >> >> fine.
> >> >>
> >> >> Is it a web2py bug? If it is, can it be easily fixed?
> >> >>
> >> >> --
> >> >> Alexei Vinidiktov

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py Web Framework" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---
