Thank you, Massimo.
I have tried:
"""
re.UNICODE
Make \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode
character properties database.
"""
el_list = html.elements(find=re.compile(r'pers\wnliche', re.I|re.U))
but got no match.
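One possible explanation, assuming the data comes back from the database as a UTF-8 encoded byte string rather than as decoded text: in the encoded form 'ö' takes two bytes, so a single \w cannot cover it, with or without re.UNICODE. The sketch below uses plain Python 3 and no web2py; the sample string is only a stand-in for the real data.

```python
import re

# Stand-in for what a database driver might hand back: UTF-8 encoded bytes
# (an assumption, not confirmed for this setup).
raw = 'persönliche bitte'.encode('utf-8')

# In the encoded form 'ö' is two bytes (0xC3 0xB6), so one \w cannot span it.
byte_pattern = re.compile(rb'pers\wnliche', re.I)
print(byte_pattern.search(raw))

# After decoding, 'ö' is a single character and \w matches it.
text = raw.decode('utf-8')
text_pattern = re.compile(r'pers\wnliche', re.I | re.U)
print(text_pattern.search(text))
```

If this is what is happening, decoding the data before searching would be the thing to try.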
When I do:
html = TAG(data)
res = []
res_add = res.append
for el in html.components:
    # skip plain strings; only tag objects have flatten()
    if not isinstance(el, str):
        text = el.flatten()
        match = re.compile('persönliche', re.I).search(text)
        if match:
            res_add(text)
I get the match.
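On the question of how to put UTF-8 into a regex: if both the pattern and the data are UTF-8 byte strings, one option is to encode the search word the same way as the data. This is a minimal sketch in plain Python 3, again without web2py; `word`, `raw` and `upper` are made-up sample values.

```python
import re

word = 'persönliche'
# Stand-in for the stored bytes (hypothetical sample data).
raw = '<a>Persönliche bitte</a>'.encode('utf-8')

# Encode the pattern the same way as the data; re.escape guards against
# regex metacharacters in the search word.
pattern = re.compile(re.escape(word.encode('utf-8')), re.I)
match = pattern.search(raw)
print(match.group(0).decode('utf-8'))

# Caveat: on byte strings re.I folds only ASCII letters, so an
# upper-case 'Ö' in the data is not matched by 'ö' in the pattern.
upper = 'PERSÖNLICHE'.encode('utf-8')
print(pattern.search(upper))
```

Because of that caveat, decoding to text and searching with a text pattern is probably the more robust route when case-insensitive matching of umlauts matters.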
Best regards
On 7 Jun., 20:00, Massimo Di Pierro <[email protected]>
wrote:
> You will need to encode as UTF-8, and I am not sure how to encode UTF-8 in
> a regex.
>
> On Jun 7, 11:46 am, pubu <[email protected]> wrote:
>
> > Hi,
> > I am trying to search inside HTML data, which I get from the DB (SQLite).
>
> > Everything works well, but if I try to search for 'äöü', I get no
> > match.
>
> > #before ö -> &ouml;
> > data = '<a>pers&ouml;nliche bitte</a>' # data from db created with
> > wysiwyg editor
> > html = TAG(data)
> > #after ö -> ?
> > el_list = html.elements(find=re.compile('pers&ouml;nliche',
> > re.IGNORECASE)) or html.elements(find=re.compile('persönliche',
> > re.IGNORECASE))
>
> > Can anybody tell me what I am doing wrong?
>
> > Thanks in advance,
> > paul