On Wed, 21 Nov 2012 23:01:47 -0800, Giacomo Alzetta wrote:

> On Thursday, 22 November 2012 05:00:39 UTC+1, MRAB wrote:
>> On 2012-11-22 03:41, Terry Reedy wrote:
>> It can't return 5 because 5 isn't an index in 'spam'.
>>
>> It can't return 4 because 4 is below the start index.
>
> Uhm. Maybe you are right, because returning a greater value would cause
> an IndexError, but then, *why* is 4 returned???
>
>>>> 'spam'.find('', 4)
> 4
>>>> 'spam'[4]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> IndexError: string index out of range
>
> 4 is not a valid index either. I do not think the behaviour was
> completely intentional.
The behaviour is certainly an edge case, but I think it is correct. (Correct or not, it has been the same all the way back to Python 1.5, before strings even had methods, so it almost certainly will not be changed. Changing it now would very likely break hundreds, maybe thousands, of Python programs that rely on the current behaviour.)

Consider your string as a sequence of boxes, with the index positions labelled above it:

    0-1-2-3-4
    |s|p|a|m|

The indexing model is that positions represent where you would cut *between* characters, not the characters themselves. Slices are the substring between two cuts:

    "spam"[1:3] => "pa"

while single indexes return the character to the right of the cut:

    "spam"[1] => "p"

If there is no character to the right of the cut, indexing raises an error.

Now, consider "spam".find(substring, start). This should return the number of the first cut immediately to the left of the substring, beginning the search at cut #start:

    "spam".find("pa", 1) => 1

because cut #1 is immediately to the left of "pa" at index 1. By this logic, "spam".find("", 4) should return 4, because cut #4 is immediately to the left of the empty string. So Python's current behaviour is justified.

What about "spam".find("", 5)? Well, if you look at the string with the cuts marked as before:

    0-1-2-3-4
    |s|p|a|m|

you will see that there is no cut #5. Since there is no cut #5, we can't sensibly say we found *anything* there, not even the empty string. If you have four boxes, you can't say that you found anything in the fifth box.

I realise that this behaviour clashes somewhat with the slicing rule that says that if the slice indexes go past the end of the string, you get an empty string ("spam"[5:] => ""). But that rule is more a convenience than a fundamental rule about strings.

I think there is legitimate room for disagreement about the "right" behaviour, but backwards compatibility trumps logical correctness here, and it is very unlikely to be changed.
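The "cuts between characters" model can be checked against the interpreter. Below is a small sketch, assuming Python 3 and a non-negative start; find_by_cuts is a hypothetical helper written only to mirror the model described above, not how str.find is actually implemented in CPython.

```python
def find_by_cuts(s: str, sub: str, start: int = 0) -> int:
    # Hypothetical model of str.find for start >= 0 (not CPython's code).
    # A string of length n has n + 1 cuts, numbered 0..n; if start > n,
    # the range below is empty because there is no such cut.
    for cut in range(start, len(s) + 1):
        # The len(sub) characters immediately to the right of this cut.
        if s[cut:cut + len(sub)] == sub:
            return cut
    # No cut at or after `start` has `sub` immediately to its right.
    return -1


s = "spam"
print(s[1:3])                                    # slice between cuts 1 and 3
print(s[1])                                      # character right of cut 1
print(s.find("pa", 1), find_by_cuts(s, "pa", 1))
print(s.find("", 4), find_by_cuts(s, "", 4))     # cut #4 exists
print(s.find("", 5), find_by_cuts(s, "", 5))     # there is no cut #5
print(repr(s[5:]))                               # slicing past the end is forgiving
try:
    s[4]                                         # no character right of cut #4
except IndexError as exc:
    print("IndexError:", exc)
```

The last few lines show the clash with slicing conceded above: s[5:] quietly gives an empty string, yet s.find("", 5) returns -1 because the search would have to start at a cut that does not exist.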
> The docstring does not describe this edge case, so I think it could be
> improved. If the first sentence (being an index in S) is kept, then it
> shouldn't say that start and end are treated as in slice notation,
> because that's actually not true.

+1

I think that you are right that the documentation needs to be improved.

-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list