[issue17668] re.split loses characters matching ungrouped parts of a pattern

2014-09-14 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- resolution: -> not a bug; stage: needs patch -> resolved; status: pending -> closed ___ Python tracker rep...@bugs.python.org http://bugs.python.org/issue17668 ___

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-10-27 Thread Serhiy Storchaka
Changes by Serhiy Storchaka storch...@gmail.com: -- status: open -> pending

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-14 Thread Mike Hoy
Changes by Mike Hoy mho...@gmail.com: -- nosy: +mikehoy

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-10 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: The example I gave was the simplest possible to illustrate my point but yes, you are correct, I often match the whole string as I do recursive matches. I do use non-capturing groups but they would not solve the problem I talked about. Anyway, I had
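For reference, a small sketch of how non-capturing groups behave in re.split. The sample string and separators are illustrative only and are not taken from the thread:

```python
import re

text = 'a, b; c'  # illustrative input, not from the report

# Capturing group: the matched separators are returned between the pieces.
with_seps = re.split(r'(, |; )', text)

# Non-capturing group: same split points, but the separators are dropped.
without_seps = re.split(r'(?:, |; )', text)

print(with_seps)
print(without_seps)
```

A non-capturing group only affects grouping inside the pattern; it never adds anything to the result list, which is presumably why it does not help with the case discussed in this message.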

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-09 Thread R. David Murray
R. David Murray added the comment: Only group the stuff you want to see in the result:
    >>> re.split(r'(^>.*$)', '>Homo sapiens catenin (cadherin-associated)')
    ['', '>Homo sapiens catenin (cadherin-associated)', '']
    >>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)')
    ['', 'Homo sapiens

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
New submission from Tomasz J. Kotarba: Tested in 2.7 but possibly affects the other versions as well. A real life example (note the first character '>' being lost): import re; re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)') produces: ['', 'Homo sapiens catenin

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Matthew Barnett
Matthew Barnett added the comment: It's not a bug. The documentation says: "Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list." You're splitting on r'^>(.*)$', but
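The documented behaviour can be checked with a short sketch. It assumes Python 3 and assumes the reported header and pattern begin with '>' (the archive snippets appear to have lost that character):

```python
import re

# Assumed input: a FASTA-style header starting with '>'.
header = '>Homo sapiens catenin (cadherin-associated)'

# The pattern matches the whole string, so the text before and after the
# single split point is empty. Only the capturing group's text is added to
# the result; the ungrouped '>' is part of the separator and is dropped.
pieces = re.split(r'^>(.*)$', header)

print(pieces)
```

The result has three elements: an empty string, the captured text without the leading '>', and another empty string.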

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: Thanks for the report, but as Matt said it doesn't look like there is any bug here. The behavior you report is what the docs say it is, and it seems to me that your most useful suggestion would discard the information about the group match, making

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi Matthew, Thanks for such a quick reply. I know I can get the '>' by putting it in grouping parentheses. That's not the issue here. The documentation you quoted says that it splits the string by the occurrences _OF_PATTERN_ and that texts of all groups

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi R. David Murray, Thanks for your reply. I just explained in my previous message to Matthew that documentation does actually support my view (i.e. it is an issue according to the documentation). Re. the issue you mentioned (discarding information

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Marking as open till I get your response. I hope you reconsider. -- resolution: invalid -> ; status: closed -> open

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment:
    >>> re.split('-', 'abc-def-jlk')
    ['abc', 'def', 'jlk']
    >>> re.split('(-)', 'abc-def-jlk')
    ['abc', '-', 'def', '-', 'jlk']
Does that make it a bit clearer? Maybe we need an actual example in the docs. -- assignee: -> docs@python; components: +Documentation

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: I agree that introducing an example like that plus making some slight changes in wording would be a welcome change to the docs to clearly explain the current behaviour. Still, I maintain it would be useful to give users the option I described to allow

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread R. David Murray
R. David Murray added the comment: As you pointed out, you can already get that behavior by enclosing the entire split expression in a group. I don't see that there is any functionality missing here.

[issue17668] re.split loses characters matching ungrouped parts of a pattern

2013-04-08 Thread Tomasz J. Kotarba
Tomasz J. Kotarba added the comment: Hi, I can still see one piece of functionality I have mentioned missing. Using my first example, even when one uses '^(>(.*))$' one cannot get ['', '>Homo sapiens catenin (cadherin-associated)', ''] as one will get a four-element list and need to deal with
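A rough sketch of the four-element result described here, plus one possible workaround. It assumes Python 3 and a header starting with '>'. The helper split_keep_match is hypothetical: it is not part of the re module and was not proposed in the thread in this form.

```python
import re

# Assumed input: a FASTA-style header starting with '>'.
header = '>Homo sapiens catenin (cadherin-associated)'

# With nested groups, re.split returns the text of every group,
# so the result has four elements rather than three.
nested = re.split(r'^(>(.*))$', header)


def split_keep_match(pattern, string):
    """Hypothetical helper: split like re.split, but put the whole match
    (group 0) between the pieces and ignore any capturing groups."""
    parts = []
    pos = 0
    for m in re.finditer(pattern, string):
        parts.append(string[pos:m.start()])  # text before this match
        parts.append(m.group(0))             # the entire matched text
        pos = m.end()
    parts.append(string[pos:])               # text after the last match
    return parts


whole = split_keep_match(r'^(>(.*))$', header)

print(nested)
print(whole)
```

Built on re.finditer, the helper leaves inner groups available for other purposes without changing the shape of the returned list.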