[Python-ideas] prefix/suffix for bytes (was: New explicit methods to trim strings)

Cameron Simpson Sat, 07 Mar 2020 16:34:07 -0800

On 07Mar2020 15:01, Christopher Barker <python...@gmail.com> wrote:

On Fri, Mar 6, 2020 at 5:54 PM Guido van Rossum <gu...@python.org> wrote:
(Since bytes may be used for file names I think they should get this new capability too.)
I don’t really care one way or another, but is it really still the case that bytes need to be used for filenames? For uses other than just passing
them around?
Yes, Linux in particular does not guarantee that file names are using any
particular encoding (let alone a consistent encoding for different files).
The only two bytes that are special are '\0' and '/'.
I *think* I understand the issues. And I can see that some software would
need to work with filenames as arbitrary bytes. But that doesn't mean that
you can do much with them that way.

Given that the entire UNIX filename API is bytes, I think this isn't very true.

I can see filename.split(b'/') for instance, but how could you strip a
prefix or suffix without knowing the encoding?


Well, directly:

   filename.cutsuffix(b'.abc')

But more seriously, you're either treating them as bytes with no particular encoding and the above just means "remove these 4 bytes", or you do know the encoding and are working with strings, so you'd either have a string and cut a string, or have bytes and cut the value '.abc'.encode(encoding=known_encoding).
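A minimal sketch of those two cases. The helper name cut_suffix is made up for illustration (the cutsuffix method above is only what is being proposed in this thread), and UTF-8 is just an assumed value for known_encoding:

```python
def cut_suffix(value, suffix):
    """Return value without a trailing suffix, or value unchanged if absent.

    Hypothetical helper: works for str or bytes, provided both arguments
    are of the same type.
    """
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value


# Case 1: bytes with no particular encoding; just remove these 4 bytes.
raw_name = b"caf\xe9.abc"  # not valid UTF-8, and that does not matter here
raw_stem = cut_suffix(raw_name, b".abc")

# Case 2: the encoding is known (assumed to be UTF-8 for this sketch).
known_encoding = "utf-8"
text_name = "café.abc"
text_stem = cut_suffix(text_name, ".abc")  # string, cut a string
encoded_stem = cut_suffix(
    text_name.encode(known_encoding),        # bytes, cut the encoded suffix
    ".abc".encode(encoding=known_encoding),
)
```

Either way the operation itself never has to guess an encoding; that decision is made by whoever supplies the suffix.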

Things like listdir are dual mode: call it with a bytes directory name and you get bytes results, call it with a string directory name and you get string results. There's some funky encoding accommodation in there (read the docs, it's a little subtlety to do with returning strings which didn't decode cleanly from the underlying bytes).
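A rough illustration of both points, using a temporary directory and a made-up file name. The second half spells out the 'surrogateescape' error handler by hand, with UTF-8 assumed as the filesystem encoding, so that the result does not depend on the platform's settings:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as dirname:
    # Create one file so there is something to list.
    with open(os.path.join(dirname, "notes.txt"), "w"):
        pass

    str_names = os.listdir(dirname)                 # str in, str out
    bytes_names = os.listdir(os.fsencode(dirname))  # bytes in, bytes out

# The subtle part: bytes which do not decode cleanly are carried through
# as lone surrogates, so encoding again restores the original bytes.
undecodable = b"caf\xe9.txt"  # Latin-1 bytes, invalid as UTF-8
as_text = undecodable.decode("utf-8", errors="surrogateescape")
round_tripped = as_text.encode("utf-8", errors="surrogateescape")
```

The resulting string is not printable everywhere, but it can be passed back to the OS and will name the same file.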

filename.strip_suffix(b'.txt') would only work for ASCII-compatible
encodings.

Or b'.txt' is your known bytes encoding of some known string suffix in your working encoding.
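For instance, here is a small sketch with a working encoding which is not ASCII-compatible. UTF-16-LE and the file name are picked purely for illustration:

```python
working_encoding = "utf-16-le"  # assumed for this sketch; not ASCII-compatible

name_bytes = "report.txt".encode(working_encoding)
suffix_bytes = ".txt".encode(working_encoding)

# The ASCII literal does not match: each character here occupies two bytes.
ascii_matches = name_bytes.endswith(b".txt")

# The suffix encoded in the working encoding does match and can be cut.
encoded_matches = name_bytes.endswith(suffix_bytes)
stem = name_bytes[:-len(suffix_bytes)].decode(working_encoding)
```

The bytes operation is the same in both regimes; only the affix you hand it changes.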

But like the other string-like bytes methods, I think there's a good case for supporting bytes prefixes and suffixes; it is just a matter of using the correct bytes affix in the regime you're working in. Might not be filenames, either.

There's no way around the fact that you have to make SOME
assumptions about the encoding if you are going to do anything other than
pass it around or work with the b'/' byte.


They needn't be assumptions; all code has some outer context.

And if that's the case, then you
might as well decode and use 'surrogateescape' so the program won't crash.

Ah, I see you've encountered the listdir-return-string stuff already then.

Getting OT, but I do wonder if we should continue to support (and therefore
encourage) the use of bytes in inappropriate ways.

I think there's plenty of reasonable bytes actions which look a lot like string actions, and are not confusing. Consider this contrived example:


   payload_bytes = packet_bytes.cutprefix(header_bytes)
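
Spelled out as something runnable today, with a hypothetical cut_prefix helper standing in for the proposed method and an invented header value:

```python
def cut_prefix(value: bytes, prefix: bytes) -> bytes:
    """Return value without a leading prefix, or value unchanged if absent."""
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


header_bytes = b"\x89PKT\x01"  # invented magic number, not a real protocol
packet_bytes = header_bytes + b"\x00\x10payload"

payload_bytes = cut_prefix(packet_bytes, header_bytes)
```

No text encoding is involved anywhere here, which is rather the point.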

There was an interesting writeup by a guy involved in the Mercurial Python 3 port where he discusses the pain which came with the bytes type lacking a lot of the string support methods when Python 3 first came out. He suggests a lot of things would have gone far smoother with these, as Mercurial had a lot of filenames-as-bytes-strings inside. Here we are:


   
https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journey-to-and-reflections-on-python-3/

Personally I lean the other way, and welcomed the initial lack of stringish methods as a good way to uncover bytes mistakenly used for strings. But I see his point.


Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/A7TYUKFN74XOOD5MJGBDG5GMUGNTEFXR/
Code of Conduct: http://python.org/psf/codeofconduct/
