[issue32198] \b reports false-positives in Indic strings involving combining marks

2017-12-02 Thread Serhiy Storchaka
Serhiy Storchaka added the comment: This is a known issue. See also issue1693050, issue12731, issue25743. I hope it will be solved in 3.7, and maybe the solution will be backported to 2.7 and 3.6 (but not to 3.5, which takes only security fixes). As a workaround I suggest you use the third-pa

[issue32198] \b reports false-positives in Indic strings involving combining marks

2017-12-02 Thread Shriramana Sharma
New submission from Shriramana Sharma:

Code:

    import re
    cons_taml = "[கஙசஞடணதநபமயரலவழளறன]"
    print(re.findall("\\b" + cons_taml + "ை|ஐ", "ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"))
    cons_deva = "[कखगघङचछजझञटठडढणतथदधनपफबभमयरलवशषसह]"
    print(re.findall("\\b" + cons_deva + "ै|ऐ", "ऐषमः तैलम् ईडै समीशै ईक्षै ईक
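
For readers following the report, here is a minimal standard-library sketch of one way to sidestep the false positives. It is not the workaround proposed in the thread and not a fix to `re` itself. It assumes that the false positives arise because `\b` relies on `\w`, which follows `str.isalnum()` and therefore does not count combining marks (Unicode categories Mn, Mc, Me) as word characters. The helper names `is_word_char`, `at_word_start` and `find_word_initial` are hypothetical, and the sketch simplifies the reported pattern by dropping the `|ஐ` alternative and applying the word-start check to every match.

```python
import re
import unicodedata


def is_word_char(ch):
    """Letters, digits, underscore, and combining marks count as word characters."""
    return ch == "_" or ch.isalnum() or unicodedata.category(ch).startswith("M")


def at_word_start(text, pos):
    """True if no mark-aware word character immediately precedes pos."""
    return pos == 0 or not is_word_char(text[pos - 1])


def find_word_initial(pattern, text):
    """Return matches of pattern that begin at a mark-aware word start."""
    return [
        m.group()
        for m in re.finditer(pattern, text)
        if at_word_start(text, m.start())
    ]


cons_taml = "[கஙசஞடணதநபமயரலவழளறன]"
text = "ஐவர் பையன் இசை சிவிகை இல்லை இவ்ஐ"

# Plain \b: on interpreters where \w excludes combining marks, this may also
# report consonant + vowel-sign pairs from the middle of a word.
print(re.findall("\\b" + cons_taml + "ை", text))

# Mark-aware check: only the pair that really begins a word is kept.
print(find_word_initial(cons_taml + "ை", text))
```

The filter runs after matching rather than inside the pattern, which keeps the regular expression itself unchanged at the cost of a second pass over the matches.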