On 3/12/2011 7:21 PM, Terry Reedy wrote:
(Ok, I assumed that the 'word' field does not include any of !"#$%&'()*+. If that is not true, replace comma with space or even a control char such as '\a' which even precedes \t and \n.)

OK, I agree the above was your worst assumption, although you need to add "," to your list as well, because a comma inside that field is what allows the data puns.

You also rewrote Guido's text from "shortstring" to "word" and assumed it had certain content semantics. But since only an integer follows the ",", rsplit would still separate the fields correctly even if shortstring contains ",".
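A minimal sketch of the rsplit point, assuming a record laid out as "shortstring,integer" (the helper name `split_record` and the sample line are hypothetical, made up for illustration):

```python
def split_record(line):
    """Split a hypothetical 'shortstring,integer' record into its two fields.

    rsplit with maxsplit=1 splits only at the LAST comma, so any commas
    inside the leading string field are left alone.
    """
    shortstring, number = line.rsplit(",", 1)
    return shortstring, int(number)


print(split_record("hello, world, again,42"))  # ('hello, world, again', 42)
```

Because the integer field can never contain a comma, the last comma is always the field boundary, whatever the string field holds.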

And the choice of delimiter really determines whether data puns can exist. Only if you know there is a character that sorts lower than every character that can appear in the sort strings can you "cheat" and put a variable-length string into a sort key field, by terminating it with that character. The safest such character is \0, unless you are coding in C (where \0 already terminates strings), in which case \a, as you now suggest, but only if you can be 100% sure it never appears in the data. If you cannot guarantee that the data doesn't contain the terminator, data puns among variable-length strings remain possible, and the algorithm will sort incorrectly in pathological cases.
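A small sketch of such a pun, under stated assumptions: the (name, count) records and the helper `concat_key` are hypothetical, and the count is zero-padded so that only the variable-length name can cause trouble. Python's tuple comparison supplies the intended order to check against.

```python
# Hypothetical records: (name, count). Intended order: by name, then count.
records = [("a", 2), ("a b", 1)]
correct = sorted(records)  # tuple comparison, field by field


def concat_key(record, sep):
    """Build one sort string: variable-length name, separator, padded count."""
    name, count = record
    return name + sep + f"{count:05d}"


# With "," as the separator, the space in "a b" (0x20) sorts below "," (0x2C),
# so "a b,00001" compares less than "a,00002": a data pun, wrong order.
with_comma = sorted(records, key=lambda r: concat_key(r, ","))

# With "\0" as the terminator, no character in this data sorts below it,
# so the shorter name compares first, as it should.
with_nul = sorted(records, key=lambda r: concat_key(r, "\0"))

print(correct)     # [('a', 2), ('a b', 1)]
print(with_comma)  # [('a b', 1), ('a', 2)]
print(with_nul)    # [('a', 2), ('a b', 1)]
```

The "\0" version is only correct because no name here contains "\0"; a name that did would reintroduce the same kind of pun.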

I wouldn't have called you on this, except that it really is important not to give people the idea that they can blithely use a variable-length string anywhere except at the tail of a multi-field sort string. In general, you can't. I've long since lost track of the number of times I've helped people understand the fix for programs that tried it.
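Two layouts that avoid the problem, sketched in Python with hypothetical records and assuming counts are non-negative and fit in five digits. The first keeps the only variable-length field at the tail of the key string, behind a fixed-width field. The second, a common Python idiom rather than anything this post prescribes, sidesteps delimiters entirely with a tuple key.

```python
# Hypothetical (name, count) records.
records = [("a b", 1), ("a", 2), ("a", 10)]

# Layout 1: fixed-width, zero-padded count first, variable-length name LAST.
# Every key has its "," at the same position, so no delimiter comparison
# can pun. This orders by count, then name (assumes 0 <= count < 100000).
tail_key = sorted(records, key=lambda r: f"{r[1]:05d},{r[0]}")

# Layout 2: a tuple key; Python compares the fields one by one, so no
# delimiter is needed. This orders by name, then count.
tuple_key = sorted(records, key=lambda r: (r[0], r[1]))

print(tail_key)   # [('a b', 1), ('a', 2), ('a', 10)]
print(tuple_key)  # [('a', 2), ('a', 10), ('a b', 1)]
```

Zero-padding also matters here: without it, "10" would sort before "2" as text.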
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev