[issue26784] regular expression problem at umlaut handling

2016-04-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Because "[\s\w]*" matches only a part of "Bläh": "Bl\xc3".
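A minimal Python 3 sketch of why the match stops at "Bl\xc3". Under Python 2 a UTF-8 source file turns "ä" into the two bytes '\xc3\xa4'. The reporter's full expression is cut off in this thread, so it is an assumption that Unicode matching semantics (for example re.UNICODE) were applied to that byte string, making each byte behave like the code point with the same value. Decoding the bytes as Latin-1 reproduces that byte-to-code-point mapping.

```python
import re

# Python 2 with a UTF-8 source file: "ä" becomes two bytes.
raw = "Bläh".encode("utf-8")
print(raw)  # b'Bl\xc3\xa4h'

# Assumption: Unicode semantics were applied to the byte string, so each
# byte acts like the code point of the same value. Latin-1 decoding maps
# byte N to code point N, which reproduces that view in Python 3.
as_code_points = raw.decode("latin-1")  # 'BlÃ¤h'

# U+00C3 is a letter, so \w accepts it; U+00A4 (CURRENCY SIGN) is not,
# so [\s\w]* stops there.
m = re.match(r"[\s\w]*", as_code_points)
print(repr(m.group()))  # 'BlÃ'
```

The matched part, 'BlÃ', is exactly the three code points 'B', 'l', '\xc3', i.e. the "Bl\xc3" from the comment.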

2016-04-16 Thread Marcus
Marcus added the comment: When I replace the first "ä" with a random letter, the untouched expression has no problem matching the second word, which also contains an "ä":

s = "E-112233-555-11 | Bläh - Bläh" #untouched string
s = "E-112233-555-11 | Bloh - Bläh" #string where the first ä is

2016-04-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: Sorry, I don't understand you. If the regex failed to match the first "ä", it can't match the second "ä". Do you have an example?

2016-04-16 Thread Marcus
Marcus added the comment: Thanks for your explanation. You explained why [\s\w] didn't match "ä". In my case it didn't match the first "ä", but the second time I used [\s\w] in the same regex it matched the second "ä". What's the explanation for this?

2016-04-16 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: First, in the context of Python, a crash means a core dump or its equivalent on Windows. In this case the code simply doesn't work as you expected. The short answer: s should be a unicode string. In your code "ä" is encoded as the 8-bit string '\xc3\xa4'. When matched, every
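A short Python 3 sketch of the "s should be a unicode string" fix: decode the bytes to text before matching, so "ä" is one character instead of two bytes. In Python 3, str patterns use Unicode matching by default. In Python 2 the corresponding fix is a u"..." literal (or s.decode("utf-8")) together with the re.UNICODE flag, because without that flag \w is equivalent to [a-zA-Z0-9_] even for unicode strings.

```python
import re

# Simulate the Python 2 byte string: "ä" is stored as '\xc3\xa4'.
raw = "Bläh".encode("utf-8")
print(len(raw))   # 5 (bytes)

# The fix: decode to text first, so "ä" is a single character.
text = raw.decode("utf-8")
print(len(text))  # 4 (characters)

# On text, \w includes non-ASCII letters such as "ä".
m = re.fullmatch(r"[\s\w]*", text)
print(m.group())  # Bläh
```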

2016-04-16 Thread SilentGhost
Changes by SilentGhost:
components: +Regular Expressions
nosy: +ezio.melotti, mrabarnett, pitrou, serhiy.storchaka

2016-04-16 Thread Marcus
New submission from Marcus: Working with the example string "E-112233-555-11 | Bläh - Bläh", the following code leads to an exception under Python 2.7.10 (OSX), whereas the same code works under Python 3.5.1 (OSX):

s = "E-112233-555-11 | Bläh - Bläh"
expr =
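The reporter's expression is cut off in this thread, so the sketch below uses a hypothetical pattern (PATTERN and the helper parse are illustrative names, not from the report) to show how the same string can match in Python 3 yet fail under byte-level semantics. It is also an assumption that the Python 2 exception came from using the result of a failed match: re.match returns None on failure, and calling .group() or .groups() on None raises AttributeError. The helper checks for None instead.

```python
import re

s = "E-112233-555-11 | Bläh - Bläh"

# Hypothetical pattern: the reporter's actual expression is not shown.
PATTERN = re.compile(r"(E-\d+-\d+-\d+) \| ([\s\w]*) - ([\s\w]*)$")

def parse(line):
    """Return the three fields, or None instead of raising on a non-match."""
    m = PATTERN.match(line)
    if m is None:
        # Calling m.groups() here would raise AttributeError.
        return None
    return m.groups()

print(parse(s))  # ('E-112233-555-11', 'Bläh', 'Bläh')

# Simulate Python 2 byte semantics: each UTF-8 byte seen as one code point.
mangled = s.encode("utf-8").decode("latin-1")
# U+00A4 (from byte 0xA4) is not matched by \w, so the pattern fails.
print(parse(mangled))  # None
```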