On Mar 29, 10:54 am, Yarko Tymciurak <[email protected]>
wrote:
> anyway,  I am sure this is about encoding to unicode - someone who has
> done this will hopefully add comments.

For example, looking at
http://docs.python.org/library/codecs.html#standard-encodings

and searching for Chinese, I found two codecs that decode your
(pasted) example into a unicode result without raising an error:

In [37]: value=r"老李"
In [38]: value
Out[38]: '\xe8\x80\x81\xe6\x9d\x8e'
In [39]: value.decode('gbk')
Out[39]: u'\u9470\u4f79\u6f55'
In [40]: value.decode('gb18030')
Out[40]: u'\u9470\u4f79\u6f55'


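IMPORTANT:  both of these calls return a unicode object (i.e.
u'xxxx').  Note, though, that the result has three characters for a
two-character input, which suggests these six bytes are UTF-8 rather
than GBK; value.decode('utf-8') would be the decoding to try first.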
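
To check which codec the pasted bytes actually belong to, it can help to compare what each decoding produces. The sketch below is written for Python 3 (the session above is Python 2), and the variable names are only illustrative:

```python
# Sketch (Python 3): compare decodings of the six bytes from the session above.
raw = b"\xe8\x80\x81\xe6\x9d\x8e"

as_utf8 = raw.decode("utf-8")   # two characters: U+8001 U+674E
as_gbk = raw.decode("gbk")      # decodes without error, but to three unrelated characters

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_gbk), [hex(ord(c)) for c in as_gbk])

# Round trip: the UTF-8 decoding gives back exactly the bytes that were pasted.
assert as_utf8.encode("utf-8") == raw
```

A decode call that does not raise is not proof that the codec is the right one; the character count is a cheap sanity check.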

I am not sure if you need to set LOCALE for your environment / browser
for the regular expression to work as it is. With the string decoded to
unicode, the pattern does produce a match when compiled with the
re.UNICODE flag, and this is without any locale set (off the top of my
head, I am not sure of the proper way to setlocale within an
interpreter, to test this...).

In [44]: val=value.decode('gbk')
In [45]: re.compile(r"[\w\-:]+",re.U).findall(val)
Out[45]: [u'\u9470\u4f79\u6f55']
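
The same check can be sketched in Python 3, where str is already unicode and Unicode matching is the default for str patterns. Two details from the sessions in this thread are worth keeping in mind: flags go to re.compile(), and the second positional argument of a compiled pattern's findall() is a start position, not a flags value. The names below are illustrative only:

```python
import re

pattern = re.compile(r"[\w\-:]+")          # str pattern: \w is Unicode-aware in Python 3
byte_pattern = re.compile(rb"[\w\-:]+")    # bytes pattern: \w is ASCII-only

text = "\u8001\u674e"                      # the two characters from the example
raw = text.encode("utf-8")

print(pattern.findall(text))               # one match covering both characters
print(byte_pattern.findall(raw))           # no match: every byte is >= 0x80

# findall(string, pos): passing re.U here means "start searching at index 32",
# so a short string yields an empty list regardless of its content.
print(pattern.findall(text, re.U))
```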


I hope this helps point you in the right direction:  all the strings in
your app need to be converted to unicode (one way or another), and your
locale set (normally provided by the browser, in the request).
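
One way to read "convert everything to unicode" is to decode at the boundary, as soon as a value enters the app. A minimal Python 3 sketch follows; to_text and to_text_list are hypothetical helpers, not web2py functions, and UTF-8 is an assumed default that should match whatever encoding the browser actually submits:

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding bytes if needed (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def to_text_list(values, encoding="utf-8"):
    """Normalise one submitted value, or a list of them, to a list of unicode strings."""
    if not isinstance(values, (list, tuple)):
        values = [values]
    return [to_text(v, encoding) for v in values]


# A multiple-choice field may arrive as a list; a single choice as one value.
submitted = [b"Jim", b"\xe8\x80\x81\xe6\x9d\x8e"]
print(to_text_list(submitted))
```

Handling the single-value and list cases in one place keeps the rest of the code working with one type only.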

- Yarko
>
> On Mar 29, 10:04 am, Yarko Tymciurak <[email protected]>
> wrote:
>
> > On Mar 29, 8:33 am, hywang <[email protected]> wrote:
>
> > > -------model file is like this ---------------------
> > > db.define_table('options_contain_chinease',
> > >     Field('student_name', requires = IS_IN_SET(["Jim","小长","老李"],
> > > multiple=True)),
>
> > Using this last string from your IS_IN_SET example (I hope my copy/
> > paste did this correctly into iPython!):
>
> > In [31]: value=r"老李"
> > In [32]: value
> > Out[32]: '\xe8\x80\x81\xe6\x9d\x8e'
> > In [33]: str(value)
> > Out[33]: '\xe8\x80\x81\xe6\x9d\x8e'
> > In [34]: re.compile(r"[\w\-:]+").findall(value)
> > Out[34]: []
> > In [35]: re.compile(r"[\w\-:]+").findall(value, re.U)
> > Out[35]: []
> > In [36]: re.compile(r"[\w\-:]+",re.U).findall(value)
> > Out[36]: ['\xe8', '\xe6']
> > In [37]: re.compile(r"[\w\-:]+",re.U).findall(value,re.U)
> > Out[37]: []
>
> > --->
>
> > So it would seem you may need to setup something with LOCALE;  I have
> > played around with this for just a little bit, but am not sure what it
> > takes (zh-CN?  zh-cn?  zh_CN.gb2312?   etc.)
>
> > Maybe others can add to this...
>
> > Regards,
> > - Yarko
>
> > > )
> > > db.options_contain_chinease.student_name.widget =
> > > CheckboxesWidget.widget
>
> > > ------controller file is like this ---------------------
> > > def options_contain_chinease():
> > >     form = SQLFORM(db.options_contain_chinease)
> > >     if form.accepts(request.vars, session):
> > >         pass
> > >     return dict(form=form)
>
> > > if checked one item and submit, everything is ok, however, when
> > > checked more than one items and submit the form, an error will occur .
> > > Is it a bug ?
>
> > > thanks !

