Ton Hospel wrote:
> >> Currently unpack("A") strips \0 and all "classic" whitespace
> >> from the end of the string. Now that pack/unpack are encoding
> >> neutral, the question arises what whitespace is in case of
> >> unicode. Unicode has a lot more whitespace than the classic set
> >> recognized by C isspace(3).
> >>
> >> I could:
> >>
> >> 1) If the original string was unicode, strip all unicode
> >>    whitespace from the end.
> >> 2) Only strip real space in all cases
> >
> > 3) Leave well enough alone.
>
> In which case the documentation should be changed, because that only
> mentions stripping of spaces, not (a restricted set of) whitespace.

Yes. Actually, (putting my exegetist hat here), I have the impression
that the original intent of the documentation was to document the
current behavior -- strip all classic whitespace.

> So I proposed to either do:
>
> 1) generalize it to do s/[\s\0]+\z// again, which means detecting unicode
>    spaces in case the result string is upgraded (and update the docs which
>    were always inconsistent with the code). This would be backward compatible
>    for the cases where the output string (without stripping) is the same as
>    before (a downgraded string)
>
> or
>
> 2) make it do what was documented: s/[ \0]+\z//
>    (which I also think makes more sense anyways since pack "A" only
>    ever pads with real spaces)
>
> So now you propose
>
> 3) leave it at s/[ \t\r\n\f\0]+\z// and document that (I assume at least
>    you'd want it documented)
>
> If people don't like 2) I think we should then do 1) . 3) is just a weird
> inbetween.

I'd prefer 3, because it's the more conservative. However, having
to choose between 1 and 2, I think 1 is more right, since it continues
what the current implementation does more consistently.
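For comparison, here is a minimal sketch that applies the three substitutions quoted above to one upgraded string. The helper names and the sample string are made up for illustration; this is not the unpack("A") implementation, only the regexes from the thread run side by side.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per option discussed in the thread.
# Each returns a copy of its argument with the trailing run removed.
sub strip_unicode { my ($s) = @_; $s =~ s/[\s\0]+\z//;        return $s }  # option 1
sub strip_space   { my ($s) = @_; $s =~ s/[ \0]+\z//;         return $s }  # option 2
sub strip_classic { my ($s) = @_; $s =~ s/[ \t\r\n\f\0]+\z//; return $s }  # option 3

# Assumed sample: "abc" followed by U+2003 EM SPACE, TAB, SPACE, NUL.
# The wide character forces the string to be upgraded.
my $sample = "abc\x{2003}\t \0";

# Print only lengths, to avoid emitting wide characters.
print "option 1 keeps ", length(strip_unicode($sample)), " characters\n";
print "option 2 keeps ", length(strip_space($sample)),   " characters\n";
print "option 3 keeps ", length(strip_classic($sample)), " characters\n";
```

On this input the three options stop at different points: option 1 removes the whole trailing run, option 2 stops at the TAB, and option 3 stops at the EM SPACE.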