Thank you, Massimo,

I have tried re.UNICODE:

"""
re.UNICODE
Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode
character properties database.
"""

el_list = html.elements(find=re.compile(r'pers\wnliche', re.I|re.U))

but I get no match.

When I do:
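
One possible explanation, which I have not checked against web2py's internals: if the text being searched is a UTF-8 byte string rather than a unicode string, 'ö' is stored as two bytes, so a single \w cannot cover it. A minimal sketch in plain Python 3 (no web2py involved; the variable names are only for illustration):

```python
import re

text = 'persönliche bitte'      # unicode text
raw = text.encode('utf-8')      # the same text as UTF-8 bytes

# On unicode text, \w matches the single character 'ö'.
unicode_match = re.search(r'pers\wnliche', text, re.I | re.U)

# On UTF-8 bytes, 'ö' is two bytes (0xC3 0xB6), so one \w is not enough.
bytes_match = re.search(rb'pers\wnliche', raw, re.I)

# Decoding the bytes first lets the original pattern match again.
decoded_match = re.search(r'pers\wnliche', raw.decode('utf-8'), re.I | re.U)
```

If that is what happens here, decoding the data to unicode before searching might be enough, but that is an assumption on my side.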

    html = TAG(data)
    # compile the pattern once, outside the loop
    pattern = re.compile('persönliche', re.I)
    res = []
    for el in html.components:
        # skip plain-string components; only tag objects are flattened
        if not isinstance(el, str):
            text = el.flatten()
            if pattern.search(text):
                res.append(text)

I get the match.
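
For comparison, the same kind of search can be sketched without web2py, assuming the stored markup uses HTML entities such as &ouml;. The standard library's html.unescape (Python 3.4+) turns entities into characters before matching. find_text is a hypothetical helper name, and removing tags with a regex is a simplification for this sketch, not what TAG does:

```python
import re
from html import unescape

def find_text(data, word):
    """Return True if `word` occurs in the text of `data`, ignoring case."""
    text = re.sub(r'<[^>]+>', '', data)   # crude tag removal, for the sketch only
    text = unescape(text)                 # '&ouml;' -> 'ö'
    return re.search(re.escape(word), text, re.I) is not None

found = find_text('<a>pers&ouml;nliche bitte</a>', 'persönliche')
```

Tags are stripped before unescaping so that escaped characters like &lt; are not mistaken for markup.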

Best regards

On 7 Jun., 20:00, Massimo Di Pierro <[email protected]>
wrote:
> You will need to encode utf8 and I am not sure how to encode utf8 in
> regex.
>
> On Jun 7, 11:46 am, pubu <[email protected]> wrote:
>
> > Hi,
> > I am trying to search inside HTML data that I get from a DB (SQLite).
>
> > Everything works well, but if I try to search for 'äöü', I get no
> > match.
>
> >     # before: ö -> &ouml;
> >     # data from DB, created with a WYSIWYG editor
> >     data = '<a>pers&ouml;nliche bitte</a>'
> >     html = TAG(data)
> >     # after: &ouml; -> ?
> >     el_list = (html.elements(find=re.compile('pers&ouml;nliche', re.IGNORECASE))
> >                or html.elements(find=re.compile('persönliche', re.IGNORECASE)))
>
> > Can anybody tell me what I am doing wrong?
>
> > Thanks in advance,
> > paul
