Ton Hospel wrote:
> >> Currently unpack("A") strips \0 and all "classic" whitespace
> >> from the end of the string. Now that pack/unpack are encoding
> >> neutral, the question arises what whitespace is in case of
> >> unicode. Unicode has a lot more whitespace than the classic set
> >> recognized by C isspace(3).
> >>
> >> I could:
> >>
> >> 1) If the original string was unicode, strip all unicode
> >>    whitespace from the end.
> >> 2) Only strip real space in all cases
> >
> >   3) Leave well enough alone.
> 
> In which case the documentation should be changed, because that only
> mentions stripping of spaces, not (a restricted set of) whitespace.

Yes.

Actually (putting on my exegete hat here), I have the impression that
the original intent of the documentation was to document the current
behavior -- strip all classic whitespace.

> So I proposed to either do:
> 
> 1) generalize it to do s/[\s\0]+\z// again, which means detecting unicode
>    spaces in case the result string is upgraded (and update the docs which 
>    were always inconsistent with the code). This would be backward compatible
>    for the cases where the output string (without stripping) is the same as 
>    before (a downgraded string)
> 
> or
> 
> 2) make it do what was documented: s/[ \0]+\z//
>    (which I also think makes more sense anyways since pack "A" only
>     ever pads with real spaces)
> 
> So now you propose
> 
> 3) leave it at s/[ \t\r\n\f\0]+\z// and document that (I assume at least
>     you'd want it documented)
> 
> If people don't like 2) I think we should then do 1) . 3) is just a weird
> inbetween.

I'd prefer 3, because it's the most conservative option. However, if I
had to choose between 1 and 2, I think 1 is more right, since it is the
more consistent continuation of what the current implementation does.
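
To make the difference concrete, here is a minimal sketch that applies
the three substitutions quoted above to the same string. The helper
names are made up for illustration, and they only model the stripping
step, not unpack("A") itself. The test string ends in U+2003 (EM SPACE),
a tab, a real space and a \0; because it contains a code point above
255 it is an upgraded string, so \s follows the Unicode rules.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per option in this thread. They model only
# the trailing-strip step, not the rest of unpack("A").
sub strip_unicode { my $s = shift; $s =~ s/[\s\0]+\z//;        return $s }  # option 1
sub strip_space   { my $s = shift; $s =~ s/[ \0]+\z//;         return $s }  # option 2
sub strip_classic { my $s = shift; $s =~ s/[ \t\r\n\f\0]+\z//; return $s }  # option 3

# "ab" followed by EM SPACE, TAB, SPACE, NUL
my $in = "ab\x{2003}\t \0";

printf "option 1 keeps %d chars\n", length strip_unicode($in);  # 2: "ab"
printf "option 2 keeps %d chars\n", length strip_space($in);    # 4: "ab", EM SPACE, TAB
printf "option 3 keeps %d chars\n", length strip_classic($in);  # 3: "ab", EM SPACE
```

Only option 1 removes the trailing EM SPACE; option 3 stops at it, which
is the "in-between" behaviour under discussion.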
