Re: [Python-Dev] New string method - splitquoted
On Thursday, 18 May 2006 06:06, Dave Cinege wrote:
> This is useful, but possibly better put into practice as a separate method??

I personally don't think it's particularly useful, at least not in the special case that your patch tries to address.

1) Generally, you won't only have one character that does quoting, but several. Think of the Python syntax, where you have ", ', """ and ''', which all behave slightly differently. The logic for " and ' is simple enough to implement (basically that's what your patch does, and I'm sure it's easy enough to extend it to accept a range of characters as splitters), but if you have more complicated quoting operators (such as """), are you sure it's sensible to implement the logic in split()?

2) What should the result of "this is a \"test string".split(None,-1,'"') be? An exception (ParseError)? Silently ignoring the missing delimiter, and returning ['this','is','a','test string']? Ignoring the delimiter altogether, returning ['this','is','a','test','string']? I don't think there's one case to satisfy all here...

3) What about escapes of the delimiter? Your current patch doesn't address them at all (AFAICT) at the moment, but what should the escaping character be? Should escape processing take place, i.e. what should the result of "this is a \\\"delimiter \\test".split(None,-1,'"') be?

Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).

But there's quite a lot of semantic stuff to get right before you can implement it properly; see the complexity of the csv module, where you have to define pretty much all of this in the dialect you use to parse the csv file...

Why not write up a PEP?

--- Heiko.
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
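The three possible answers to question 2 can be made concrete with a small sketch. The helper name `split_quoted` and its `unbalanced` parameter are hypothetical and appear neither in the patch nor in the standard library; the sketch only handles a single quote character and no escapes.

```python
def split_quoted(text, quote='"', unbalanced="error"):
    """Whitespace-split text, keeping quoted runs together (hypothetical helper)."""
    sections = text.split(quote)
    # An even number of sections means an odd number of quote characters.
    if len(sections) % 2 == 0:
        if unbalanced == "error":
            raise ValueError("unbalanced quote character in %r" % (text,))
        if unbalanced == "ignore":
            # Treat every quote character as plain whitespace.
            return text.replace(quote, " ").split()
        if unbalanced != "to_end":
            raise ValueError("unknown policy %r" % (unbalanced,))
        # "to_end": fall through, the last section counts as quoted.
    result = []
    for index, section in enumerate(sections):
        if index % 2:
            result.append(section)          # inside quotes: keep verbatim
        else:
            result.extend(section.split())  # outside quotes: split on whitespace
    return result
```

Each policy is defensible for some input format, which is the point of the question: a single built-in method would have to pick one or grow an option for it.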
Re: [Python-Dev] New string method - splitquoted
On Thursday 18 May 2006 11:11, Guido van Rossum wrote:
> This is not an appropriate function to add as a string method. There are too many conventions for quoting and too many details to get right. One method can't possibly handle them all without an enormous number of weird options. It's better to figure out how to do this with regexps or use some of the other approaches that have been suggested. (Did anyone mention the csv module yet? It deals with this too.)

Maybe my idea is better called splitexcept instead of splitquoted, as my goal is to (simply) provide a way to limit the split by delimiters, and not dive into an all-encompassing quoting algorithm.

To me this is in the spirit of the maxsplit option already present.

Dave
Re: [Python-Dev] New string method - splitquoted
On Thursday 18 May 2006 16:13, you wrote:
> Dave Cinege wrote:
>> For example:
>>
>> s = ' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On'
>
> My complaint with this example is that you are just using the wrong tool to do this job. If I was going to do this, I would've immediately jumped on the regex-press train.
>
> wifi_info = re.match('^\s+'
>                      'Chan:\s+(?P<channel>[0-9]+)\s+'
>                      'SNR:\s+(?P<snr>[0-9]+)\s+'
>                      'ESSID:\s+"(?P<essid>[^"]*)"\s+'
>                      'Enc:\s+(?P<encryption>[a-zA-Z]+)'
>                      , s)

For the 5 years I've been pythoning, I've used re probably twice. I find regex to be a tool of last resort, and quite a bit of effort to get right, as regex (for me) is quite prone to giving unintended results without a good deal of thought. I don't want to have to think. That's why I use python. : )

.split() and slicing have always been python's holy grail for me, and I find it a lot easier to .replace() 'stray' chars with spaces or a delimiter and then split() that. It's easier to read and (should be) a lot quicker to process than regex. (Which I care about, as I'm also often on embedded CPUs of a few hundred MHz.)

So .split works just super duper... but I keep running into situations where I'd like a substr to be excluded from the split'ing. The clearest one is excluding a 'quoted' string that has whitespace. Here's another, albeit very poor, example:

s = '\t\tFrequency:2.462 GHz (Channel 11)' # This is real output from iwlist:
s.replace(':',')').replace(' (','))').split(None,-1,')')
['Frequency', '2.462 GHz', 'Channel 11']

I wanted to preserve the '2.462 GHz' substr. Let's assume that could come out as '900 MHz' or '11.3409 GHz'. The above code gets what I want in 1 shot, either way. Show me an easier way that doesn't need multiple splits and string re-assembly, and I'll use it.

Dave
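For comparison, both sample lines in this message can be handled with the existing `re` module. The following is only a sketch: the pattern names `WIFI_LINE` and `FREQ_LINE` are made up here, and the patterns assume the lines always have exactly the shape of the two samples above.

```python
import re

# Named groups pull each field out of the scan line in one pass.
WIFI_LINE = re.compile(
    r'\s*Chan:\s+(?P<channel>[0-9]+)\s+'
    r'SNR:\s+(?P<snr>[0-9]+)\s+'
    r'ESSID:\s+"(?P<essid>[^"]*)"\s+'
    r'Enc:\s+(?P<encryption>[a-zA-Z]+)'
)

# "Name:value (extra)" where the value itself may contain a space.
FREQ_LINE = re.compile(r'\s*(\w+):(.*?)\s*\((.*)\)')

wifi = WIFI_LINE.match(' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On')
wifi_info = wifi.groupdict()

freq = FREQ_LINE.match('\t\tFrequency:2.462 GHz (Channel 11)')
freq_fields = list(freq.groups())
```

Here `wifi_info['essid']` holds the network name without its quotes, and `freq_fields` holds the three parts of the frequency line. Whether this is easier to read than chained `.replace()` calls is exactly the disagreement in the thread.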
Re: [Python-Dev] New string method - splitquoted
I'm sorry Dave, I'm afraid I can't do that.

We hear you, Dave, but this is not a suitable function to add to the standard library. Many respondents are trying to tell you that in many different ways. If you keep arguing for it, we'll just ignore you.

--Guido

PS. Give up TDMA. Try Spambayes instead. It works much better and is less annoying for your correspondents.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] New string method - splitquoted
On 5/17/06, Dave Cinege wrote:
> Very often... make that very very very very very very very very very often, I find myself processing text in python where, when .split()'ing a line, I'd like to exclude the split for a 'quoted' item...quoted because it contains whitespace or the sep char.
>
> For example:
>
> s = ' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On'
>
> If I want to yank the essid in the above example, it's a pain. But with my new dandy split quoted method, we have a 3rd argument to .split() that we can spec the quote delimiter where no splitting will occur, and the quote char will be dropped:
>
> s.split(None,-1,'"')[5]
> 'Spaced Out Wifi'
>
> Attached is a proof of concept patch against Python-2.4.1/Objects/stringobject.c that implements this. It is limited to whitespace splitting only. (sep == None)
>
> As implemented the quote delimiter also doubles as an additional separator for splitting out a substr.
>
> For example:
>
> 'There is"no whitespace before these"quotes'.split(None,-1,'"')
> ['There', 'is', 'no whitespace before these', 'quotes']
>
> This is useful, but possibly better put into practice as a separate method??
>
> Comments please.

What's wrong with:

re.findall(r'"[^"]*"|[^\s]+', s)

YMMV,
n
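A runnable sketch of this suggestion, assuming the quote character is a double quote. Unlike the proposed patch, `re.findall` keeps the quote characters in the matched token, so they are stripped in a second step here.

```python
import re

s = ' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On'

# Either a complete "quoted run" or a run of non-whitespace characters.
tokens = re.findall(r'"[^"]*"|[^\s]+', s)

# Drop the surrounding quotes from quoted tokens.
fields = [token.strip('"') for token in tokens]
essid = fields[5]
```

One behavioural difference from the patch is worth knowing: a quote in the middle of a word does not start a new token with this pattern, so the second example from the original post would not split the same way.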
Re: [Python-Dev] New string method - splitquoted
Heiko Wundram wrote:
> Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).

It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.

>>> import shlex
>>> shlex.split(r"""Hey I\'m a "bad guy" for you""")
['Hey', "I'm", 'a', 'bad guy', 'for', 'you']

Giovanni Bajo
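Applied to the sample lines from earlier in the thread, `shlex.split()` behaves as sketched below. The input strings are the examples quoted in the discussion; the variable names are only for illustration.

```python
import shlex

wifi_fields = shlex.split(' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On')
essid = wifi_fields[5]

filename = shlex.split('filename: "/root/is a bit slow.txt"')[1]

# An unterminated quote is reported rather than silently accepted.
try:
    shlex.split('this is a "test string')
    unbalanced_error = None
except ValueError as exc:
    unbalanced_error = exc
```

Note that shell rules also give the backslash and the single quote special meaning, which may or may not match the input being parsed.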
Re: [Python-Dev] New string method - splitquoted
On Thursday, 18 May 2006 10:21, Giovanni Bajo wrote:
> Heiko Wundram wrote:
>> Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).
>
> It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.

I knew about *nix shell escaping, but that isn't necessarily what I find in input I have to process (although generally it's what you see, yeah). That's why I said that it would be interesting to have a generalized method, sort of like the csv module but only for string interpretation, which takes a dialect, and parses a string for the specified dialect.

Remember, there's also escaping by doubling the end of string marker (for example, 'this is not a single argument'.split() should be parsed as ['this','is','not','a',]), and I know programs that use exactly this format for file storage.

Maybe one could simply export the function the csv module uses to parse the actual data fields as a more prominent method, which accepts keyword arguments instead of a Dialect-derived class.

--- Heiko.
Re: [Python-Dev] New string method - splitquoted
Heiko Wundram wrote:
>>> Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).
>>
>> It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.
>
> I knew about *nix shell escaping, but that isn't necessarily what I find in input I have to process (although generally it's what you see, yeah). That's why I said that it would be interesting to have a generalized method, sort of like the csv module but only for string interpretation, which takes a dialect, and parses a string for the specified dialect.
>
> Remember, there's also escaping by doubling the end of string marker (for example, 'this is not a single argument'.split() should be parsed as ['this','is','not','a',]), and I know programs that use exactly this format for file storage.

I never met this one. Anyway, I don't think it's harder than:

>>> def mysplit(s):
...     """Allow double quotes to escape a quotes"""
...     return shlex.split(s.replace(r'""', r'\"'))
...
>>> mysplit('This is not a single argument')
['This', 'is', 'not', 'a', 'single', 'argument']

> Maybe one could simply export the function the csv module uses to parse the actual data fields as a more prominent method, which accepts keyword arguments instead of a Dialect-derived class.

I think you're over-generalizing a very simple problem. I believe that str.split, shlex.split, and some simple variation like the one above (maybe using regular expressions to do the substitution if you have slightly more complex cases) can handle 99.99% of the splitting cases. They surely handle 100% of those I myself had to parse.

I believe the standard library already covers common usage. There will surely be cases where a custom lexer/splitter will have to be written, but that's life :)

Giovanni Bajo
Re: [Python-Dev] New string method - splitquoted
On Thursday, 18 May 2006 12:26, Giovanni Bajo wrote:
> I believe the standard library already covers common usage. There will surely be cases where a custom lexer/splitter will have to be written, but that's life

The csv data field parser handles all common usage I have encountered so far, yes. ;-) But, generally, you can't (easily) get at the method that parses a data field directly; that's why I proposed to publish that method with keyword arguments. (Actually, I've only tried getting at it when the csv module was still plain Python; I wouldn't even know whether the method is exported now that the module is written in C.)

I've had the need to write a custom lexer time and again, and generally, I'd love to have a somewhat more general string interpretation facility available to spare me from writing a state automaton... But as I said before, the simple patch that was proposed here won't do for my case. And I don't know if it's worth the trouble to actually write a more general version, because there are quite a few different pitfalls that have to be overcome... I still remain +0.5 for adding something like this to the stdlib, but only if it's overly general so that it can handle all cases the csv module can handle.

--- Heiko.
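One way to approximate the keyword-argument interface described here, without reaching into the csv module's internals, is to feed a one-element list to `csv.reader`, which accepts formatting parameters as keyword arguments. The wrapper name `split_fields` is hypothetical, and the sample strings are made up for illustration.

```python
import csv

def split_fields(line, **fmtparams):
    """Parse one line with csv.reader; fmtparams are csv formatting parameters."""
    return next(csv.reader([line], **fmtparams))

# Quote escaping by doubling the quote character inside a quoted field.
doubled = split_fields('this is "not ""a single"" argument"',
                       delimiter=' ', quotechar='"', doublequote=True)

# A different separator and quote character, chosen per call.
semicolons = split_fields("a;'b;c';d", delimiter=';', quotechar="'")

# Caveat: unlike str.split(None), repeated delimiters yield empty fields.
spaced = split_fields('a  b', delimiter=' ')
```

This covers separator choice, quote choice, and doubled-quote escaping, but it does not collapse runs of whitespace the way a bare `.split()` does.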
Re: [Python-Dev] New string method - splitquoted
Dave Cinege wrote:
> Very often... make that very very very very very very very very very often, I find myself processing text in python where, when .split()'ing a line, I'd like to exclude the split for a 'quoted' item...quoted because it contains whitespace or the sep char.
>
> For example:
>
> s = ' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On'

Even if you don't like Neal's more efficient regex-based version, the necessary utility function to do a two-pass split operation really isn't that tricky:

def split_quoted(text, sep=None, quote='"'):
    sections = text.split(quote)
    result = []
    for idx, unquoted_text in enumerate(sections[::2]):
        result.extend(unquoted_text.split(sep))
        quoted = 2*idx+1
        quoted_text = sections[quoted:quoted+1]
        result.extend(quoted_text)
    return result

>>> split_quoted(' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On')
['Chan:', '11', 'SNR:', '22', 'ESSID:', 'Spaced Out Wifi', 'Enc:', 'On']

Given that this function (or a regex based equivalent) is easy enough to add if you do need it, I don't find the idea of increasing the complexity of the basic split API particularly compelling.

Cheers,
Nick.

--
Nick Coghlan | Brisbane, Australia
---
http://www.boredomandlaziness.org
Re: [Python-Dev] New string method - splitquoted
This is not an appropriate function to add as a string method. There are too many conventions for quoting and too many details to get right. One method can't possibly handle them all without an enormous number of weird options. It's better to figure out how to do this with regexps or use some of the other approaches that have been suggested. (Did anyone mention the csv module yet? It deals with this too.)

--Guido

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] New string method - splitquoted
On Thursday, 18 May 2006 17:11, Guido van Rossum wrote:
> (Did anyone mention the csv module yet? It deals with this too.)

Yes, mentioned it thrice. ;-)

--- Heiko.
Re: [Python-Dev] New string method - splitquoted
On Thursday 18 May 2006 03:00, Heiko Wundram wrote:
> On Thursday, 18 May 2006 06:06, Dave Cinege wrote:
>> This is useful, but possibly better put into practice as a separate method??
>
> I personally don't think it's particularly useful, at least not in the special case that your patch tries to address.

Well I'm thinking along the lines of a method to extract only quoted substr's:

' this is "something "and"nothing else"but junk'.splitout('"')
['something ', 'nothing else']

Useful? I dunno.

> splitters), but if you have more complicated quoting operators (such as """), are you sure it's sensible to implement the logic in split()?

Probably not. See below...

> 2) What should the result of "this is a \"test string".split(None,-1,'"') be? An exception (ParseError)?

I'd probably vote for that. However my current patch will simply play dumb and stop split'ing the rest of the line, dropping the first quote.

'this is a "test string'.split(None,-1,'"')
['this', 'is', 'a', 'test string']

> Silently ignoring the missing delimiter, and returning ['this','is','a','test string']? Ignoring the delimiter altogether, returning ['this','is','a','test','string']? I don't think there's one case to satisfy all here...

Well the point of the patch is a KISS approach to extending the split() method just slightly to exclude a range of substr from split'ing by delimiter, not to engage in further text processing.

I'm dealing with this ALL the time, while processing output from other programs: (Windope) filenames, (poorly considered) wifi network names, etc. For me it's always some element with whitespace in it and double quotes surrounding it, where otherwise I could just use a slice to dump the quotes for the needed element:

'filename: "/root/tmp.txt"'.split()[1][1:-1]
'/root/tmp.txt'
OK

'filename: "/root/is a bit slow.txt"'.split()[1][1:-1]
'/root/i'
NOT OK

This exact bug just zapped me in a product I have, where I didn't foresee whitespace turning up in that element. Thus my patch:

'filename: "/root/is a bit slow.txt"'.split(None,-1,'"')[1]
'/root/is a bit slow.txt'
LIFE IS GOOD

> 3) What about escapes of the delimiter? Your current patch doesn't address them at all (AFAICT) at the moment,

And it wouldn't, just like the current split doesn't.

'this is a \ test string'.split()
['this', 'is', 'a', '\\', 'test', 'string']

> Don't get me wrong, I personally find this functionality very, very interesting (I'm +0.5 on adding it in some way or another), especially as a part of the standard library (not necessarily as an extension to .split()).

I'd be happy to have this in as .splitquoted(), but once you use it, it seems more to me like a natural 'ought to be there' extension to split itself.

> Why not write up a PEP?

Because I have no idea of the procedure. : ) URL?

Dave
Re: [Python-Dev] New string method - splitquoted
On Thursday 18 May 2006 04:21, Giovanni Bajo wrote:
> It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.

Not quite. As I said in my other post, simple is the idea for this, just like the split method itself. (No escaping, etc., just recognizing delimiters as an exception to the split separation.)

shlex.split() does not let one choose the separator or use a maxsplit, nor is it a pure method of strings.

Dave
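On the separator point: `shlex.split()` itself takes no separator argument, but the underlying `shlex.shlex` class exposes a `whitespace` attribute that can be changed. A minimal sketch follows; the helper name `split_on` and the sample strings are hypothetical. It still has no `maxsplit` equivalent, so that part of the objection stands.

```python
import shlex

def split_on(text, separators):
    """Split on the given separator characters, honouring shell-style quotes."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = separators   # characters treated as token separators
    lexer.whitespace_split = True   # split only on those characters
    lexer.commenters = ''           # do not treat '#' as a comment start
    return list(lexer)

by_comma = split_on('a,"b,c",d', ',')
by_colon = split_on('name:"Spaced Out Wifi":On', ':')
```

As with whitespace splitting, runs of separator characters are collapsed rather than producing empty fields.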
Re: [Python-Dev] New string method - splitquoted
Dave Cinege wrote:
>> It's already there. It's called shlex.split(), and follows the semantics of a standard UNIX shell, including escaping and other things.
>
> Not quite. As I said in my other post, simple is the idea for this, just like the split method itself. (No escaping, etc., just recognizing delimiters as an exception to the split separation.)

And what's the actual problem? You either have a syntax which does not support escaping or one that does. If it can't be escaped, there won't be any weird characters in the way, and shlex.split() will do it. If it does support escaping in a decent way, you can either use shlex.split() directly or modify the string before (like I've shown in the other message). In any case, you get your job done.

Do you have any real-world case where you are still not able to split a string? And if you do, are they really so many as to warrant a place in the standard library? As I said before, I think that split() and shlex.split() cover the majority of real-world use cases.

> shlex.split() does not let one choose the separator or use a maxsplit

Real-world use case? Show me what you need to parse, and I assume this weird format is generated by a program you have not written yourself (or you could just change it to generate a more standard and simple format!)

> , nor is it a pure method of strings.

This is a totally different problem. It doesn't make it less useful, nor does it provide a need for adding a new method to the string.

--
Giovanni Bajo
[Python-Dev] New string method - splitquoted
Very often... make that very very very very very very very very very often, I find myself processing text in python where, when .split()'ing a line, I'd like to exclude the split for a 'quoted' item...quoted because it contains whitespace or the sep char.

For example:

s = ' Chan: 11 SNR: 22 ESSID: "Spaced Out Wifi" Enc: On'

If I want to yank the essid in the above example, it's a pain. But with my new dandy split quoted method, we have a 3rd argument to .split() that we can spec the quote delimiter where no splitting will occur, and the quote char will be dropped:

s.split(None,-1,'"')[5]
'Spaced Out Wifi'

Attached is a proof of concept patch against Python-2.4.1/Objects/stringobject.c that implements this. It is limited to whitespace splitting only. (sep == None)

As implemented the quote delimiter also doubles as an additional separator for splitting out a substr.

For example:

'There is"no whitespace before these"quotes'.split(None,-1,'"')
['There', 'is', 'no whitespace before these', 'quotes']

This is useful, but possibly better put into practice as a separate method??

Comments please.
Dave

--- stringobject.c.orig 2006-05-17 16:12:13.0 -0400
+++ stringobject.c 2006-05-17 23:49:52.0 -0400
@@ -1336,6 +1336,85 @@
     return NULL;
 }
 
+// dc: split quoted example
+// 'This string has "not only this" "and this" but"this mixed in string"as well as this "" empty one and two more at the end""""'.split(None,-1,'"')
+// CORRECT: ['This', 'string', 'has', 'not only this', 'and this', 'but', 'this mixed in string', 'as', 'well', 'as', 'this', '', 'empty', 'one', 'and', 'two', 'more', 'at', 'the', 'end', '', '']
+static PyObject *
+split_whitespace_quoted(const char *s, int len, int maxsplit, const char *qsub)
+{
+    int i, j, quoted = 0;
+    PyObject *str;
+    PyObject *list = PyList_New(0);
+
+    if (list == NULL)
+        return NULL;
+
+    for (i = j = 0; i < len; ) {
+
+        if (!quoted) {
+            while (i < len && isspace(Py_CHARMASK(s[i])) )
+                i++;
+        }
+
+        if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) {
+            quoted = 1;
+            i++;
+        }
+
+        j = i;
+
+        while (i < len) {
+            if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) {
+                if (quoted)
+                    quoted = 2; // End of quotes found
+                else {
+                    quoted = 1; // Else start of new quotes in the middle of a string
+                }
+                break;
+            } else if (!quoted && isspace(Py_CHARMASK(s[i])))
+                break;
+            i++;
+        }
+
+        if (quoted == 2 && j == i) { // Empty string in quotes
+            SPLIT_APPEND("", 0, 0);
+            quoted = 0;
+            i++;
+            j = i;
+
+        } else if (j < i) {
+            if (maxsplit-- <= 0)
+                break;
+            SPLIT_APPEND(s, j, i);
+
+            if (quoted == 2) {
+                quoted = 0;
+                i++;
+            } else if (quoted == 1) {
+                i++;
+                if (Py_CHARMASK(s[i]) == Py_CHARMASK(qsub[0])) { // Embedded empty string in quotes (at end of string?)
+                    SPLIT_APPEND("", 0, 0);
+                    quoted = 0;
+                    i++;
+                }
+            } else {
+                while (i < len && isspace(Py_CHARMASK(s[i])))
+                    i++;
+            }
+
+            j = i;
+        }
+    }
+    if (j < len) {
+        SPLIT_APPEND(s, j, len);
+    }
+    return list;
+  onError:
+    Py_DECREF(list);
+    return NULL;
+}
+
+
 static PyObject *
 split_char(const char *s, int len, char ch, int maxcount)
 {
@@ -1376,15 +1455,27 @@
 static PyObject *
 string_split(PyStringObject *self, PyObject *args)
 {
-    int len = PyString_GET_SIZE(self), n, i, j, err;
+    int len = PyString_GET_SIZE(self), n, qn, i, j, err;
     int maxsplit = -1;
-    const char *s = PyString_AS_STRING(self), *sub;
-    PyObject *list, *item, *subobj = Py_None;
+    const char *s = PyString_AS_STRING(self), *sub, *qsub;
+    PyObject *list, *item, *subobj = Py_None, *qsubobj = Py_None;
 
-    if (!PyArg_ParseTuple(args, "|Oi:split", &subobj, &maxsplit))
+    if (!PyArg_ParseTuple(args, "|OiO:split", &subobj, &maxsplit, &qsubobj))
         return NULL;
     if (maxsplit < 0)
         maxsplit = INT_MAX;
+    if (qsubobj != Py_None) {
+        if (PyString_Check(qsubobj)) {
+            qsub = PyString_AS_STRING(qsubobj);
+            qn = PyString_GET_SIZE(qsubobj);
+        }
+        if (qn == 0) {
+            PyErr_SetString(PyExc_ValueError, "empty delimiter");
+            return NULL;
+        }
+        if (subobj == Py_None)
+            return split_whitespace_quoted(s, len, maxsplit, qsub);
+    }
     if (subobj == Py_None)
         return split_whitespace(s, len, maxsplit);
     if (PyString_Check(subobj)) {