Re: [Python-Dev] PEP 467: last round (?)

2016-09-10 Thread Serhiy Storchaka

On 01.09.16 22:36, Ethan Furman wrote:

* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators


Could you please add a mention of alternative: seqtools.chunks()? 
seqtools.chunks(bytes, 1) and seqtools.chunks(bytearray, 1) should be 
equivalent to bytes.iterbytes() and bytearray.iterbytes() (but this 
function is applicable to arbitrary sequences, including memoryview and 
array).


Is there a need of a PEP for new seqtools module (currently two classes 
are planned), or just providing sample implementation on the bugtracker 
would be enough?


___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-06 Thread Koos Zevenhoven
On Mon, Sep 5, 2016 at 6:06 AM, Nick Coghlan  wrote:
> On 5 September 2016 at 06:42, Koos Zevenhoven  wrote:
>> On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan  wrote:
>>>
>>> There are two self-consistent sets of names:
>>>
>>
>> Let me add a few. I wonder if this is really used so much that
>> bytes.chr is too long to type (and you can do bchr = bytes.chr if you
>> want to)
>>
>> bytes.chr (or bchr in builtins)
>
> The main problem with class method based spellings is that we need to
> either duplicate it on bytearray or else break the bytearray/bytes
> symmetry and propose "bytearray(bytes.chr(x))" as the replacement for
> current cryptic "bytearray([x])"

Warning: some API-design philosophy below:

1. It's not as bad to break symmetry regarding what functionality is
offered for related object types (here: str, bytes, bytearray) than it
is to break symmetry in how the symmetric functionality is provided.
IOW, a missing unnecessary functionality is less bad than exposing the
equivalent functionality under a different name. (This might be kind
of how Random832 was reasoning previously)

2. Symmetry is more important in object access functionality than it
is in instance creation. IOW, symmetry regarding 'constructors' (here:
bchr, bytes.chr, bytes.byte, ...) across different types is not as
crucial as symmetry in slicing. The reason is that the caller of a
constructor is likely to know which class it is instantiating. A
consumer of bytes/bytearray/str-like objects often does not know which
type is being dealt with.


I might be crying over spilled milk here, but that seems to be the
point of the whole PEP. That chars view thing might collect some of
the milk back back into a bottle:

mystr[whatever]  <-> mybytes.chars[whatever] <-> mybytearray.chars[whatever]
iter(mystr) <-> iter(mybytes.chars) <-> iter(mybytearray.chars)

Then introduce 'chars' on str and this becomes

mystring.chars[whatever]  <-> mybytes.chars[whatever] <->
mybytearray.chars[whatever]
iter(mystr.chars) <-> iter(mybytes.chars) <-> iter(mybytearray.chars)

If iter(mystr.chars) is recommended and iter(mystr) discouraged, then
after a decade or two, the world may look quite different regarding
how important it is for a str to be iterable.

This would solve multiple problems at once. Well I admit that "at
once" is not really an accurate description of the process :).

[...]
> You also run into a searchability problem as "chr" will get hits for
> both the chr builtin and bytes.chr, similar to the afalg problem that
> recently came up in another thread. While namespaces are a honking
> great idea, the fact that search is non-hierarchical means they still
> don't give API designers complete freedom to reuse names at will.

Oh, I can kind of see a point here, especially if the search hits
aren't related in any way. Why not just forget all symmetry if this is
an issue? But is it really a bad thing if by searching you find that
there's a chr for both str and bytes?

If I think, "I want to turn my int into a bytes 'character' kind of in
the way that chr turns my int into a str". What am I going to search
or google for? I can't speak for others, but I would probably search
for something that contains 'chr' and 'bytes'.

Based on this, I'm unable to see the search disadvantage of bytes.chr.

[...]
>> bytes.char(or bytes.chr or bchr in builtins)
>> bytes.chars, bytearray.chars (sequence views)
>
> The views are already available via memoryview.cast if folks really
> want them, but encouraging their use in general isn't a great idea, as
> it means more function developers now need to ask themselves "What if
> someone passes me a char view rather than a normal bytes object?".

Thanks, I think this is the first real argument I hear against the
char view. In fact, I don't think people should ask themselves that
question, and just not accept bytes views as input. Would it be enough
to discourage storing and passing bytes views?

Anyway, the only error that would pass silently would be that the
passed-in object gets indexed (e.g. obj[0]) and a bytes-char comes out
instead of an int. But it would be a strange thing to do by the caller
to pass a char view into the bytes-consumer. I could imagine someone
wanting to pass a bytes view into a str-consumer. But there are no
significant silently-passing errors there. If str also gets .chars,
then it becomes even easier to support this.

-- Koos

>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-06 Thread Koos Zevenhoven
On Mon, Sep 5, 2016 at 3:30 AM, Random832  wrote:
> On Sun, Sep 4, 2016, at 16:42, Koos Zevenhoven wrote:
>> On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan  wrote:
>> >
>> > There are two self-consistent sets of names:
>> >
>>
>> Let me add a few. I wonder if this is really used so much that
>> bytes.chr is too long to type (and you can do bchr = bytes.chr if you
>> want to):
>>
>> bytes.chr (or bchr in builtins)
>> bytes.chr_at, bytearray.chr_at
>
> Ugh, that "at" is too reminiscent of java. And it just feels wrong to
> spell it "chr" rather than "char" when there's a vowel elsewhere in the
> name.
>

Oh, I didn't realize that connection. It's funny that I get a Java
connotation from get* methods ;).


> Hmm... how offensive to the zen of python would it be to have "magic" to
> allow both bytes.chr(65) and b'ABCDE'.chr[0]? (and possibly also
> iter(b'ABCDE'.chr)? That is, a descriptor which is callable on the
> class, but returns a view on instances?

Indeed quite magical, while I really like how easy it is to remember
this *once you realize what is going on*.

I think bytes.char (on class) and data.chars (on instance) would be
quite similar.

-- Koos


> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-05 Thread Ethan Furman

On 09/03/2016 09:48 AM, Nick Coghlan wrote:

On 3 September 2016 at 21:35, Martin Panter wrote:

On 3 September 2016 at 08:47, Victor Stinner wrote:

Le samedi 3 septembre 2016, Random832 a écrit :

On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:



The problem with only having `bchr` is that it doesn't help with
`bytearray`;


What is the use case for bytearray.fromord? Even in the rare case
someone needs it, why not bytearray(bchr(...))?


Yes, this was my point: I don't think that we need a bytearray method to
create a mutable string from a single byte.


I agree with the above. Having an easy way to turn an int into a bytes
object is good. But I think the built-in bchr() function on its own is
enough. Just like we have bytes object literals, but the closest we
have for a bytearray literal is bytearray(b". . .").


This is a good point - earlier versions of the PEP didn't include
bchr(), they just had the class methods, so "bytearray(bchr(...))"
wasn't an available spelling (if I remember the original API design
correctly, it would have been something like
"bytearray(bytes.byte(...))"), which meant there was a strong
consistency argument in having the alternate constructor on both
types. Now that the PEP proposes the "bchr" builtin, the "fromord"
constructors look less necessary.


tl;dr -- Sounds good to me.  I'll update the PEP.

---

When this started the idea behind the methods that eventually came to be
called "fromord" and "fromsize" was that they would be the two possible
interpretations of "bytes(x)":

  the legacy Python2 behavior:

>>> var = bytes('abc')
>>> bytes(var[1])
'b'

  the current Python 3 behavior:

>>> var = b'abc'
>>> bytes(var[1])
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
  \x00\x00'

Digging deeper the problem turns out to be that indexing a bytes object
changed:

  Python 2:

>>> b'abc'[1]
'b'

  Python 3:

>>> b'abc'[1]
98

If we pass an actual byte into the Python 3 bytes constructor it behaves
as one would expect:

>>> bytes(b'b')
b'b'

Given all this it can be argued that the real problem is that indexing a
bytes object behaves differently depending on whether you retrieve a single
byte with an index versus a single byte with a slice:

>>> b'abc'[2]
99

>>> b'abc'[2:]
b'c'

Since we cannot fix that behavior, the question is how do we make it more
livable?

- we can add a built-in to transform the int back into a byte:

  >>> bchr(b'abc'[2])
  b'c'

- we can add a method to return a byte from the bytes object, not an int:

  >>> b'abc'.getbyte(2)
  b'c'

- we can add a method to return a byte from an int:

  >>> bytes.fromint(b'abc'[2])
  b'c'

Which is all to say we have two problems to deal with:

- getting bytes from a bytes object
- getting bytes from an int

Since "bytes.fromint()" and "bchr()" are the same, and given that
"bchr(ordinal)" mirrors "chr(ordinal)", I think "bchr" is the better
choice for getting bytes from an int.

For getting bytes from bytes, "getbyte()" and "iterbytes" are good choices.


Given that, and the uncertain deprecation time frame for accepting
integers in the main bytes and bytearray constructors, perhaps both
the "fromsize" and "fromord" parts of the proposal can be deferred
indefinitely in favour of just adding the bchr() builtin?


Agreed.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Nick Coghlan
On 5 September 2016 at 06:42, Koos Zevenhoven  wrote:
> On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan  wrote:
>>
>> There are two self-consistent sets of names:
>>
>
> Let me add a few. I wonder if this is really used so much that
> bytes.chr is too long to type (and you can do bchr = bytes.chr if you
> want to)
>
> bytes.chr (or bchr in builtins)

The main problem with class method based spellings is that we need to
either duplicate it on bytearray or else break the bytearray/bytes
symmetry and propose "bytearray(bytes.chr(x))" as the replacement for
current cryptic "bytearray([x])"

Consider:

bytearray([x])
bytearray(bchr(x))
bytearray(byte(x))
bytearray(bytes.chr(x))

Folks that care about maintainability are generally willing to trade a
few extra characters at development time for ease of reading later,
but there are limits to how large a trade-off they can be asked to
make if we expect the alternative to actually be used (since overly
verbose code can be a readability problem in its own right).

> bytes.chr_at, bytearray.chr_at
> bytes.iterchr, bytearray.iterchr

These don't work for me because I'd expect iterchr to take encoding
and errors arguments and produce length 1 strings.

You also run into a searchability problem as "chr" will get hits for
both the chr builtin and bytes.chr, similar to the afalg problem that
recently came up in another thread. While namespaces are a honking
great idea, the fact that search is non-hierarchical means they still
don't give API designers complete freedom to reuse names at will.

> bytes.chr (or bchr in builtins)
> bytes.chrview, bytearray.chrview (sequence views)
>
> bytes.char(or bytes.chr or bchr in builtins)
> bytes.chars, bytearray.chars (sequence views)

The views are already available via memoryview.cast if folks really
want them, but encouraging their use in general isn't a great idea, as
it means more function developers now need to ask themselves "What if
someone passes me a char view rather than a normal bytes object?".

>> Strings are not going to become atomic objects, no matter how many
>> times people suggest it.
>
> You consider all non-iterable objects atomic? If str.__iter__ raises
> an exception, it does not turn str somehow atomic.

"atomic" is an overloaded word in software design, but it's still the
right one for pointing out that something people want strings to be
atomic, and sometimes they don't - it depends on what they're doing.

In particular, you can look up the many, many, many discussions of
providing a generic flatten operation for iterables, and how it always
founders on the question of types like str and bytes, which can both
be usefully viewed as an atomic unit of information, *and* as
containers of smaller units of information (NumPy arrays are another
excellent example of this problem).

> I wouldn't be
> surprised by breaking changes of this nature to python at some point.

I would, and you should be to:
http://www.curiousefficiency.org/posts/2014/08/python-4000.html

> The breakage will be quite significant, but easy to fix.

Please keep in mind that we're already 10 years into a breaking change
to Python's text handling model, with another decade or so still to go
before the legacy Python 2 text model is spoken of largely in terms
similar to the way COBOL is spoken of today. There is no such thing as
a "significant, but easy to fix" change when it comes to adjusting how
a programming language handles text data, as text handling is a
fundamental part of defining how a language is used to communicate
with people.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Random832
On Sun, Sep 4, 2016, at 16:42, Koos Zevenhoven wrote:
> On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan  wrote:
> >
> > There are two self-consistent sets of names:
> >
> 
> Let me add a few. I wonder if this is really used so much that
> bytes.chr is too long to type (and you can do bchr = bytes.chr if you
> want to):
> 
> bytes.chr (or bchr in builtins)
> bytes.chr_at, bytearray.chr_at

Ugh, that "at" is too reminiscent of java. And it just feels wrong to
spell it "chr" rather than "char" when there's a vowel elsewhere in the
name.

Hmm... how offensive to the zen of python would it be to have "magic" to
allow both bytes.chr(65) and b'ABCDE'.chr[0]? (and possibly also
iter(b'ABCDE'.chr)? That is, a descriptor which is callable on the
class, but returns a view on instances?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Koos Zevenhoven
On Sun, Sep 4, 2016 at 6:38 PM, Nick Coghlan  wrote:
>
> There are two self-consistent sets of names:
>

Let me add a few. I wonder if this is really used so much that
bytes.chr is too long to type (and you can do bchr = bytes.chr if you
want to):

bytes.chr (or bchr in builtins)
bytes.chr_at, bytearray.chr_at
bytes.iterchr, bytearray.iterchr

bytes.chr (or bchr in builtins)
bytes.chrview, bytearray.chrview (sequence views)

bytes.char(or bytes.chr or bchr in builtins)
bytes.chars, bytearray.chars (sequence views)


> bchr
> bytes.getbchr, bytearray.getbchr
> bytes.iterbchr, bytearray.iterbchr
>
> byte
> bytes.getbyte, bytearray.getbyte
> bytes.iterbytes, bytearray.iterbytes
>
> The former set emphasises the "stringiness" of this behaviour, by
> aligning with the chr() builtin
>
> The latter set emphasises that these APIs are still about working with
> arbitrary binary data rather than text, with a Python "byte"
> subsequently being a length 1 bytes object containing a single integer
> between 0 and 255, rather than "What you get when you index or iterate
> over a bytes instance".
>
> Having noticed the discrepancy, my personal preference is to go with
> the latter option (since it better fits the "executable pseudocode"
> ideal and despite my reservations about "bytes objects contain int
> objects rather than byte objects", that shouldn't be any more
> confusing in the long run than explaining that str instances are
> containers of length-1 str instances). The fact "byte" is much easier
> to pronounce than bchr (bee-cher? bee-char?) also doesn't hurt.
>
> However, I suspect we'll need to put both sets of names in front of
> Guido and ask him to just pick whichever he prefers to get it resolved
> one way or the other.
>
>> And didn't someone recently propose deprecating iterability of str
>> (not indexing, or slicing, just iterability)? Then str would also need
>> a way to provide an iterable or sequence view of the characters. For
>> consistency, the str functionality would probably need to mimic the
>> approach in bytes. IOW, this PEP may in fact ultimately dictate how to
>> get a iterable/sequence from a str object.
>
> Strings are not going to become atomic objects, no matter how many
> times people suggest it.
>

You consider all non-iterable objects atomic? If str.__iter__ raises
an exception, it does not turn str somehow atomic. I wouldn't be
surprised by breaking changes of this nature to python at some point.
The breakage will be quite significant, but easy to fix.

-- Koos
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Nick Coghlan
On 4 September 2016 at 20:43, Koos Zevenhoven  wrote:
> On Sun, Sep 4, 2016 at 12:51 PM, Nick Coghlan  wrote:
>> That said, the PEP does propose "getbyte()" and "iterbytes()" for
>> bytes-oriented indexing and iteration, so there's a reasonable
>> consistency argument in favour of also proposing "byte" as the builtin
>> factory function:
>>
>> * data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
>> * data.iterbytes() would be a more efficient alternative to map(byte, data)
>>
>
> .. I don't understand the argument for having 'byte' in these names.
> They should have 'char' or 'chr' in them for exacly the same reason
> that the proposed builtin should have 'chr' in it instead of 'byte'.
> If 'bytes' is an iterable of ints, then get_byte should probably
> return an int
>
> I'm sorry, but this argument comes across as "were're proposing the
> wrong thing here, so for consistency, we might want to do the wrong
> thing in this other part too".

There are two self-consistent sets of names:

bchr
bytes.getbchr, bytearray.getbchr
bytes.iterbchr, bytearray.iterbchr

byte
bytes.getbyte, bytearray.getbyte
bytes.iterbytes, bytearray.iterbytes

The former set emphasises the "stringiness" of this behaviour, by
aligning with the chr() builtin

The latter set emphasises that these APIs are still about working with
arbitrary binary data rather than text, with a Python "byte"
subsequently being a length 1 bytes object containing a single integer
between 0 and 255, rather than "What you get when you index or iterate
over a bytes instance".

Having noticed the discrepancy, my personal preference is to go with
the latter option (since it better fits the "executable pseudocode"
ideal and despite my reservations about "bytes objects contain int
objects rather than byte objects", that shouldn't be any more
confusing in the long run than explaining that str instances are
containers of length-1 str instances). The fact "byte" is much easier
to pronounce than bchr (bee-cher? bee-char?) also doesn't hurt.

However, I suspect we'll need to put both sets of names in front of
Guido and ask him to just pick whichever he prefers to get it resolved
one way or the other.

> And didn't someone recently propose deprecating iterability of str
> (not indexing, or slicing, just iterability)? Then str would also need
> a way to provide an iterable or sequence view of the characters. For
> consistency, the str functionality would probably need to mimic the
> approach in bytes. IOW, this PEP may in fact ultimately dictate how to
> get a iterable/sequence from a str object.

Strings are not going to become atomic objects, no matter how many
times people suggest it.

>> With bchr, those mappings aren't as clear (plus there's a potentially
>> unwanted "text" connotation arising from the use of the "chr"
>> abbreviation).
>
> Which mappings?

The mapping between the builtin name and the method names.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Koos Zevenhoven
On Sun, Sep 4, 2016 at 12:51 PM, Nick Coghlan  wrote:
> On 4 September 2016 at 08:11, Random832  wrote:
>> On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
>>> I guess one reason I don't like bchr (nor chrb, really) is that they
>>> look just like a random sequence of letters in builtins, but not
>>> recognizable the way asdf would be.
>>>
>>> I guess I have one last pair of suggestions for the name of this
>>> function: bytes.chr or bytes.char.
>
> The PEP started out with a classmethod, and that proved problematic
> due to length and the expectation of API symmetry with bytearray. A
> new builtin paralleling chr avoids both of those problems.
>
>> What about byte? Like, not bytes.byte, just builtins.byte.
>
> The main problem with "byte" as a name is that "bytes" is *not* an
> iterable of these - it's an iterable of ints. That concern doesn't
> arise with chr/str as they're both abbreviated singular nouns rather
> than one being the plural form of the other (it also doesn't hurt that
> str actually is an iterable of chr results).
>

Since you agree with me about this...

[...]
>
> That said, the PEP does propose "getbyte()" and "iterbytes()" for
> bytes-oriented indexing and iteration, so there's a reasonable
> consistency argument in favour of also proposing "byte" as the builtin
> factory function:
>
> * data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
> * data.iterbytes() would be a more efficient alternative to map(byte, data)
>

.. I don't understand the argument for having 'byte' in these names.
They should have 'char' or 'chr' in them for exacly the same reason
that the proposed builtin should have 'chr' in it instead of 'byte'.
If 'bytes' is an iterable of ints, then get_byte should probably
return an int

I'm sorry, but this argument comes across as "were're proposing the
wrong thing here, so for consistency, we might want to do the wrong
thing in this other part too".

And didn't someone recently propose deprecating iterability of str
(not indexing, or slicing, just iterability)? Then str would also need
a way to provide an iterable or sequence view of the characters. For
consistency, the str functionality would probably need to mimic the
approach in bytes. IOW, this PEP may in fact ultimately dictate how to
get a iterable/sequence from a str object.

-- Koos


> With bchr, those mappings aren't as clear (plus there's a potentially
> unwanted "text" connotation arising from the use of the "chr"
> abbreviation).
>

Which mappings?

> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-04 Thread Nick Coghlan
On 4 September 2016 at 08:11, Random832  wrote:
> On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
>> I guess one reason I don't like bchr (nor chrb, really) is that they
>> look just like a random sequence of letters in builtins, but not
>> recognizable the way asdf would be.
>>
>> I guess I have one last pair of suggestions for the name of this
>> function: bytes.chr or bytes.char.

The PEP started out with a classmethod, and that proved problematic
due to length and the expectation of API symmetry with bytearray. A
new builtin paralleling chr avoids both of those problems.

> What about byte? Like, not bytes.byte, just builtins.byte.

The main problem with "byte" as a name is that "bytes" is *not* an
iterable of these - it's an iterable of ints. That concern doesn't
arise with chr/str as they're both abbreviated singular nouns rather
than one being the plural form of the other (it also doesn't hurt that
str actually is an iterable of chr results).

If we wanted a meaningful noun (other than byte) for the bchr concept,
then the alternative term that most readily comes to mind for me is
"octet", but I don't know how intuitive or memorable that would be for
folks without an embedded systems or serial communications background
(especially given that we already have 'oct', which does something
completely different).

That said, the PEP does propose "getbyte()" and "iterbytes()" for
bytes-oriented indexing and iteration, so there's a reasonable
consistency argument in favour of also proposing "byte" as the builtin
factory function:

* data.getbyte(idx) would be a more efficient alternative to byte(data[idx])
* data.iterbytes() would be a more efficient alternative to map(byte, data)

With bchr, those mappings aren't as clear (plus there's a potentially
unwanted "text" connotation arising from the use of the "chr"
abbreviation).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Koos Zevenhoven
On Sun, Sep 4, 2016 at 1:23 AM, Ivan Levkivskyi  wrote:
> On 4 September 2016 at 00:11, Random832  wrote:
>>
>> On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
>> > I guess one reason I don't like bchr (nor chrb, really) is that they
>> > look just like a random sequence of letters in builtins, but not
>> > recognizable the way asdf would be.
>> >
>> > I guess I have one last pair of suggestions for the name of this
>> > function: bytes.chr or bytes.char.
>>
>> What about byte? Like, not bytes.byte, just builtins.byte.
>
>
> I like this option, it would be very "symmetric" to have, compare:
>
chr(42)
> '*'
str()
> ''
>
> with this:
>
byte(42)
> b'*'
bytes()
> b''
>
> It is easy to explain and remember this.

In one way, I like it, but on the other hand, indexing a bytes gives
an integer, so maybe a 'byte' is just an integer in range(256). Also,
having both byte and bytes would be a slight annoyance with
autocomplete.

-- Koos
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Ivan Levkivskyi
On 4 September 2016 at 00:11, Random832  wrote:

> On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
> > I guess one reason I don't like bchr (nor chrb, really) is that they
> > look just like a random sequence of letters in builtins, but not
> > recognizable the way asdf would be.
> >
> > I guess I have one last pair of suggestions for the name of this
> > function: bytes.chr or bytes.char.
>
> What about byte? Like, not bytes.byte, just builtins.byte.
>

I like this option, it would be very "symmetric" to have, compare:

>>>chr(42)
'*'
>>>str()
''

with this:

>>>byte(42)
b'*'
>>>bytes()
b''

It is easy to explain and remember this.

--
Ivan
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Koos Zevenhoven
On Sat, Sep 3, 2016 at 6:41 PM, Ethan Furman  wrote:
>>>
>>> Open Questions
>>> ==
>>>
>>> Do we add ``iterbytes`` to ``memoryview``, or modify
>>> ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation?
>>> Or
>>> do we ignore memory for now and add it later?
>>
>>
>> Apparently memoryview.cast('s') comes from Nick Coghlan:
>>
>> .
>> However, since 3.5 (https://bugs.python.org/issue15944) you can call
>> cast("c") on most memoryviews, which I think already does what you
>> want:
>>
> tuple(memoryview(b"ABC").cast("c"))
>>
>> (b'A', b'B', b'C')
>
>
> Nice!
>

Indeed! Exposing this as bytes_instance.chars would make porting from
Python 2 really simple. Of course even better would be if slicing the
view would return bytes, so the porting rule would be the same for all
bytes subscripting:

py2str[SOMETHING]

becomes

py3bytes.chars[SOMETHING]

With the "c" memoryview there will be a distinction between slicing
and indexing.

And Random832 seems to be making some good points.

--- Koos


> --
> ~Ethan~
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/k7hoven%40gmail.com



-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Random832
On Sat, Sep 3, 2016, at 18:06, Koos Zevenhoven wrote:
> I guess one reason I don't like bchr (nor chrb, really) is that they
> look just like a random sequence of letters in builtins, but not
> recognizable the way asdf would be.
> 
> I guess I have one last pair of suggestions for the name of this
> function: bytes.chr or bytes.char.

What about byte? Like, not bytes.byte, just builtins.byte.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Random832


On Sat, Sep 3, 2016, at 08:08, Martin Panter wrote:
> On 1 September 2016 at 19:36, Ethan Furman  wrote:
> > Deprecation of current "zero-initialised sequence" behaviour without removal
> > 
> >
> > Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> > argument and interpret it as meaning to create a zero-initialised sequence
> > of the given size::
> >
> > >>> bytes(3)
> > b'\x00\x00\x00'
> > >>> bytearray(3)
> > bytearray(b'\x00\x00\x00')
> >
> > This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> > it in place for at least as long as Python 2.7 is supported, possibly
> > indefinitely.
> 
> Can you clarify what “deprecate” means? Just add a note in the
> documentation, or make calls trigger a DeprecationWarning as well?
> Having bytearray(n) trigger a DeprecationWarning would be a minor
> annoyance for code being compatible with Python 2 and 3, since
> bytearray(n) is supported in Python 2.

I don't think bytearray(n) should be deprecated. I don't think that
deprecating bytes(n) should entail also deprecating bytes(n).

If I were designing these classes from scratch, I would not feel any
impulse to make their constructors take the same arguments or have the
same semantics, and I'm a bit unclear on what the reason for this
decision was.

I also don't think bytes.fromcount(n) is necessary. What's wrong with
b'\0'*n? I could swear this has been answered before, but I don't recall
what the answer was. I don't think the rationale mentioned in the PEP is
an adequate explanation, it references an earlier decision, about a
conceptually different class (it's an operation that's much more common
with mutable classes than immutable ones - when's the last time you did
(None,)*n relative to [None]*n), without actually explaining the real
reason for either underlying decision (having bytearray(n) and having
both classes take the same constructor arguments).

I think that the functions we should add/keep are:
bytes(values: Union[bytes, bytearray, Iterable[int]) 
bytearray(count : int)
bytearray(values: Union[bytes, bytearray, Iterable[int]) 
bchr(integer)

If, incidentally, we're going to add a .fromsize method, it'd be nice to
add a way to provide a fill value other than 0. Also, maybe we should
also add it for list and tuple (with the default value None)?

For the (string, encoding) signatures, there's no good reason to keep
them [TOOWTDI is str.encode] but no good reason to get rid of them
either.
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Koos Zevenhoven
On Sat, Sep 3, 2016 at 7:59 PM, Nick Coghlan  wrote:
> On 3 September 2016 at 03:54, Koos Zevenhoven  wrote:
>> chrb seems to be more in line with some bytes versions in for instance os
>> than bchr.
>
> The mnemonic for the current name in the PEP is that bchr is to chr as
> b"" is to "". The PEP should probably say that in addition to pointing
> out the 'unichr' Python 2 inspiration, though.

Thanks for explaining. Indeed I hope that unichr does not affect any
naming decisions that will remain in the language for a long time.

> The other big difference between this and the os module case, is that
> the resulting builtin constructor pairs here are str/chr (arbitrary
> text, single code point) and bytes/bchr (arbitrary binary data, single
> binary octet). By contrast, os.getcwd() and os.getcwdb() (and similar
> APIs) are both referring to the same operating system level operation,
> they're just requesting a different return type for the data.

But chr and "bchr" are also requesting a different return type. The
difference is that the data is not coming from an os-level operation
but from an int.

I guess one reason I don't like bchr (nor chrb, really) is that they
look just like a random sequence of letters in builtins, but not
recognizable the way asdf would be.

I guess I have one last pair of suggestions for the name of this
function: bytes.chr or bytes.char.

-- Koos


> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


-- 
+ Koos Zevenhoven + http://twitter.com/k7hoven +
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Nick Coghlan
On 3 September 2016 at 03:54, Koos Zevenhoven  wrote:
> chrb seems to be more in line with some bytes versions in for instance os
> than bchr.

The mnemonic for the current name in the PEP is that bchr is to chr as
b"" is to "". The PEP should probably say that in addition to pointing
out the 'unichr' Python 2 inspiration, though.

The other big difference between this and the os module case, is that
the resulting builtin constructor pairs here are str/chr (arbitrary
text, single code point) and bytes/bchr (arbitrary binary data, single
binary octet). By contrast, os.getcwd() and os.getcwdb() (and similar
APIs) are both referring to the same operating system level operation,
they're just requesting a different return type for the data.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Nick Coghlan
On 3 September 2016 at 21:35, Martin Panter  wrote:
>> Le samedi 3 septembre 2016, Random832  a écrit :
>>> On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
>>> > The problem with only having `bchr` is that it doesn't help with
>>> > `bytearray`;
>>>
>>> What is the use case for bytearray.fromord? Even in the rare case
>>> someone needs it, why not bytearray(bchr(...))?
>
> On 3 September 2016 at 08:47, Victor Stinner  wrote:
>> Yes, this was my point: I don't think that we need a bytearray method to
>> create a mutable string from a single byte.
>
> I agree with the above. Having an easy way to turn an int into a bytes
> object is good. But I think the built-in bchr() function on its own is
> enough. Just like we have bytes object literals, but the closest we
> have for a bytearray literal is bytearray(b". . .").

This is a good point - earlier versions of the PEP didn't include
bchr(), they just had the class methods, so "bytearray(bchr(...))"
wasn't an available spelling (if I remember the original API design
correctly, it would have been something like
"bytearray(bytes.byte(...))"), which meant there was a strong
consistency argument in having the alternate constructor on both
types. Now that the PEP proposes the "bchr" builtin, the "fromord"
constructors look less necessary.

Given that, and the uncertain deprecation time frame for accepting
integers in the main bytes and bytearray constructors, perhaps both
the "fromsize" and "fromord" parts of the proposal can be deferred
indefinitely in favour of just adding the bchr() builtin?

We wouldn't gain the "initialise a region of memory to an arbitrary
value" feature, but it can be argued that wanting that is a sign
someone may be better off with a more specialised memory manipulation
library, rather than relying solely on the builtins.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Ethan Furman

On 09/02/2016 06:17 PM, Greg Ewing wrote:

Ethan Furman wrote:



The problem with only having `bchr` is that it doesn't
 help with `bytearray`; the problem with not having
 `bchr` is who wants to write `bytes.fromord`?


If we called it 'bytes.fnord' (From Numeric Ordinal)
people would want to write it just for the fun factor.


Very good point!  :)

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Ethan Furman

On 09/03/2016 05:08 AM, Martin Panter wrote:

On 1 September 2016 at 19:36, Ethan Furman wrote:



Deprecation of current "zero-initialised sequence" behaviour without removal


Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

 >>> bytes(3)
 b'\x00\x00\x00'
 >>> bytearray(3)
 bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
it in place for at least as long as Python 2.7 is supported, possibly
indefinitely.


Can you clarify what “deprecate” means? Just add a note in the
documentation, [...]


This one.


Addition of "getbyte" method to retrieve a single byte
--

This PEP proposes that ``bytes`` and ``bytearray`` gain the method
``getbyte``
which will always return ``bytes``::


Should getbyte() handle negative indexes? E.g. getbyte(-1) returning
the last byte.


Yes.


Open Questions
==

Do we add ``iterbytes`` to ``memoryview``, or modify
``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation?  Or
do we ignore memory for now and add it later?


Apparently memoryview.cast('s') comes from Nick Coghlan:
.
However, since 3.5 (https://bugs.python.org/issue15944) you can call
cast("c") on most memoryviews, which I think already does what you
want:


tuple(memoryview(b"ABC").cast("c"))

(b'A', b'B', b'C')


Nice!

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Martin Panter
On 1 September 2016 at 19:36, Ethan Furman  wrote:
> Deprecation of current "zero-initialised sequence" behaviour without removal
> 
>
> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
>
> >>> bytes(3)
> b'\x00\x00\x00'
> >>> bytearray(3)
> bytearray(b'\x00\x00\x00')
>
> This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> it in place for at least as long as Python 2.7 is supported, possibly
> indefinitely.

Can you clarify what “deprecate” means? Just add a note in the
documentation, or make calls trigger a DeprecationWarning as well?
Having bytearray(n) trigger a DeprecationWarning would be a minor
annoyance for code being compatible with Python 2 and 3, since
bytearray(n) is supported in Python 2.

> Addition of "getbyte" method to retrieve a single byte
> --
>
> This PEP proposes that ``bytes`` and ``bytearray`` gain the method
> ``getbyte``
> which will always return ``bytes``::

Should getbyte() handle negative indexes? E.g. getbyte(-1) returning
the last byte.

> Open Questions
> ==
>
> Do we add ``iterbytes`` to ``memoryview``, or modify
> ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation?  Or
> do we ignore memory for now and add it later?

Apparently memoryview.cast('s') comes from Nick Coghlan:
.
However, since 3.5 (https://bugs.python.org/issue15944) you can call
cast("c") on most memoryviews, which I think already does what you
want:

>>> tuple(memoryview(b"ABC").cast("c"))
(b'A', b'B', b'C')
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Martin Panter
> Le samedi 3 septembre 2016, Random832  a écrit :
>> On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
>> > The problem with only having `bchr` is that it doesn't help with
>> > `bytearray`;
>>
>> What is the use case for bytearray.fromord? Even in the rare case
>> someone needs it, why not bytearray(bchr(...))?

On 3 September 2016 at 08:47, Victor Stinner  wrote:
> Yes, this was my point: I don't think that we need a bytearray method to
> create a mutable string from a single byte.

I agree with the above. Having an easy way to turn an int into a bytes
object is good. But I think the built-in bchr() function on its own is
enough. Just like we have bytes object literals, but the closest we
have for a bytearray literal is bytearray(b". . .").
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Martin Panter
On 2 September 2016 at 17:54, Koos Zevenhoven  wrote:
> On Thu, Sep 1, 2016 at 10:36 PM, Ethan Furman  wrote:
>> * Deprecate passing single integer values to ``bytes`` and ``bytearray``
>> * Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative
>> constructors
>> * Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
>> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
>> * Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative
>> iterators
>
> I wonder if from_something with an underscore is more consistent (according
> to a quick search perhaps yes).

That would not be too inconsistent with the sister constructor bytes.fromhex().
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-03 Thread Victor Stinner
Yes, this was my point: I don't think that we need a bytearray method to
create a mutable string from a single byte.

Victor

Le samedi 3 septembre 2016, Random832  a écrit :

> On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
> > The problem with only having `bchr` is that it doesn't help with
> > `bytearray`;
>
> What is the use case for bytearray.fromord? Even in the rare case
> someone needs it, why not bytearray(bchr(...))?
> ___
> Python-Dev mailing list
> Python-Dev@python.org 
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> victor.stinner%40gmail.com
>
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-02 Thread Greg Ewing

Ethan Furman wrote:
The problem with only having `bchr` is that it doesn't help with 
`bytearray`; the problem with not having `bchr` is who wants to write 
`bytes.fromord`?


If we called it 'bytes.fnord' (From Numeric Ordinal)
people would want to write it just for the fun factor.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-02 Thread Random832
On Fri, Sep 2, 2016, at 19:44, Ethan Furman wrote:
> The problem with only having `bchr` is that it doesn't help with
> `bytearray`;

What is the use case for bytearray.fromord? Even in the rare case
someone needs it, why not bytearray(bchr(...))?
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-02 Thread Ethan Furman

On 09/01/2016 04:07 PM, Victor Stinner wrote:

2016-09-02 0:04 GMT+02:00 Ethan Furman:



- `fromord` to replace the mistaken purpose of the default constructor


To replace a bogus bytes(obj)? If someone writes bytes(obj) but expect
to create a byte string from an integer, why not using bchr() to fix
the code?


The problem with only having `bchr` is that it doesn't help with `bytearray`; 
the problem with not having `bchr` is who wants to write `bytes.fromord`?

So we need `bchr`, and we need `bytearray.fromord`; and since the major 
difference between `bytes` and `bytearray` is that one is mutable and one is 
not, `bytes` should also have `fromord`.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-02 Thread Koos Zevenhoven
Some quick comments below, a few more later:

On Thu, Sep 1, 2016 at 10:36 PM, Ethan Furman  wrote:
> One more iteration. PEPs repo not updated yet. Changes are renaming of
> methods to be ``fromsize()`` and ``fromord()``, and moving ``memoryview``
to
> an Open Questions section.
>
>
> PEP: 467
> Title: Minor API improvements for binary sequences
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan , Ethan Furman <
et...@stoneleaf.us>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-03-30
> Python-Version: 3.6
> Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01
>
>
> Abstract
> 
>
> During the initial development of the Python 3 language specification, the
> core ``bytes`` type for arbitrary binary data started as the mutable type
> that is now referred to as ``bytearray``. Other aspects of operating in
> the binary domain in Python have also evolved over the course of the
Python
> 3 series.
>
> This PEP proposes five small adjustments to the APIs of the ``bytes`` and
> ``bytearray`` types to make it easier to operate entirely in the binary
> domain:
>
> * Deprecate passing single integer values to ``bytes`` and ``bytearray``
> * Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative
constructors
> * Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
> * Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative
iterators

I wonder if from_something with an underscore is more consistent (according
to a quick search perhaps yes).

What about bytes.getchar and iterchars? A 'byte' in python 3 seems to be an
integer. (I would still like a .chars property that gives a sequence view
with __getitem__ and __len__ so that the getchar and iterchars methods are
not needed)

chrb seems to be more in line with some bytes versions in for instance os
than bchr.

Do we really need chrb? Better to introduce from_int or from_ord also in
str and recommend that over chr?

-- Koos (mobile)

>
> Proposals
> =
>
> Deprecation of current "zero-initialised sequence" behaviour without
removal
>

>
> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
>
> >>> bytes(3)
> b'\x00\x00\x00'
> >>> bytearray(3)
> bytearray(b'\x00\x00\x00')
>
> This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
> it in place for at least as long as Python 2.7 is supported, possibly
> indefinitely.
>
> No other changes are proposed to the existing constructors.
>
>
> Addition of explicit "count and byte initialised sequence" constructors
> ---
>
> To replace the deprecated behaviour, this PEP proposes the addition of an
> explicit ``fromsize`` alternative constructor as a class method on both
> ``bytes`` and ``bytearray`` whose first argument is the count, and whose
> second argument is the fill byte to use (defaults to ``\x00``)::
>
> >>> bytes.fromsize(3)
> b'\x00\x00\x00'
> >>> bytearray.fromsize(3)
> bytearray(b'\x00\x00\x00')
> >>> bytes.fromsize(5, b'\x0a')
> b'\x0a\x0a\x0a\x0a\x0a'
> >>> bytearray.fromsize(5, b'\x0a')
> bytearray(b'\x0a\x0a\x0a\x0a\x0a')
>
> ``fromsize`` will behave just as the current constructors behave when
passed
> a single
> integer, while allowing for non-zero fill values when needed.
>
>
> Addition of "bchr" function and explicit "single byte" constructors
> ---
>
> As binary counterparts to the text ``chr`` function, this PEP proposes
> the addition of a ``bchr`` function and an explicit ``fromord``
alternative
> constructor as a class method on both ``bytes`` and ``bytearray``::
>
> >>> bchr(ord("A"))
> b'A'
> >>> bchr(ord(b"A"))
> b'A'
> >>> bytes.fromord(65)
> b'A'
> >>> bytearray.fromord(65)
> bytearray(b'A')
>
> These methods will only accept integers in the range 0 to 255
(inclusive)::
>
> >>> bytes.fromord(512)
> Traceback (most recent call last):
> File "", line 1, in 
> ValueError: integer must be in range(0, 256)
>
> >>> bytes.fromord(1.0)
> Traceback (most recent call last):
> File "", line 1, in 
> TypeError: 'float' object cannot be interpreted as an integer
>
> While this does create some duplication, there are valid reasons for it::
>
> * the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python
> 2 under a different naming scheme
> * the class method is mainly for the ``bytearray.fromord`` case, with
> ``bytes.fromord`` added for consistency
>
> The documentation of the ``ord`` builtin will be updated to explicitly
note
> that ``bchr`` is the primary inverse operation for binary data, while
> ``chr``
> is the inverse operation for text data, and that ``bytes.fro

Re: [Python-Dev] PEP 467: last round (?)

2016-09-01 Thread Victor Stinner
2016-09-02 0:04 GMT+02:00 Ethan Furman :
> - `fromord` to replace the mistaken purpose of the default constructor

To replace a bogus bytes(obj)? If someone writes bytes(obj) but expect
to create a byte string from an integer, why not using bchr() to fix
the code?

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-01 Thread Ethan Furman

On 09/01/2016 02:06 PM, Victor Stinner wrote:

2016-09-01 21:36 GMT+02:00 Ethan Furman:



Abstract


This PEP proposes five small adjustments to the APIs of the ``bytes`` and
``bytearray`` types to make it easier to operate entirely in the binary
domain:


You should add bchr() in the Abstract.


Done.


* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors


I understand that main reason for this change is to catch bugs when
bytes(obj) is used and obj is not supposed to be an integer.

So I expect that bytes(int) will be quickly deprecated, but the PEP
doesn't schedule a removal of the feature. So it looks more than only
adding an alias to bytes(int).

I would prefer to either schedule a removal of bytes(int), or remove
bytes.fromsize() from the PEP.


The PEP states that ``bytes(x)`` will not be removed while 2.7 is supported.  
Once 2.7 is no longer a concern we can visit the question of removing that 
behavior.


* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors


Hum, you already propose to add a builtin function. Why would we need
two ways to create a single byte?


- `bchr` to mirror `chr`
- `fromord` to replace the mistaken purpose of the default constructor


* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators


I like these ones :-)


Cool.


In particular, there's a reasonable case to be made
that ``bytes(x)`` (where ``x`` is an integer) should behave like the
``bytes.fromint(x)`` proposal in this PEP.


"fromint"? Is it bytes.fromord()/bchr()?


Oops, fixed.

--
~Ethan
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 467: last round (?)

2016-09-01 Thread Victor Stinner
2016-09-01 21:36 GMT+02:00 Ethan Furman :
> Abstract
> 
>
> This PEP proposes five small adjustments to the APIs of the ``bytes`` and
> ``bytearray`` types to make it easier to operate entirely in the binary
> domain:

You should add bchr() in the Abstract.


> * Deprecate passing single integer values to ``bytes`` and ``bytearray``
> * Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors

I understand that main reason for this change is to catch bugs when
bytes(obj) is used and obj is not supposed to be an integer.

So I expect that bytes(int) will be quickly deprecated, but the PEP
doesn't schedule a removal of the feature. So it looks more than only
adding an alias to bytes(int).

I would prefer to either schedule a removal of bytes(int), or remove
bytes.fromsize() from the PEP.


> * Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors

Hum, you already propose to add a builtin function. Why would we need
two ways to create a single byte?

I'm talking about bchr(int)==bytes.fromord(int). I'm not sure that
there is an use case for bytearray.fromord(int).


> * Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
> * Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators

I like these ones :-)



> In particular, there's a reasonable case to be made
> that ``bytes(x)`` (where ``x`` is an integer) should behave like the
> ``bytes.fromint(x)`` proposal in this PEP.

"fromint"? Is it bytes.fromord()/bchr()?


> Open Questions
> ==
>
> Do we add ``iterbytes`` to ``memoryview``, or modify
> ``memoryview.cast()`` to accept ``'s'`` as a single-byte interpretation?  Or
> do we ignore memory for now and add it later?

It's nice to have bytes.iterbytes() to help porting Python 2 code, but
I'm not sure that this function would be super popular in new Python 3
code.

I don't think that a memoryview.iterbytes() (or cast("s")) would be useful.

Victor
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] PEP 467: last round (?)

2016-09-01 Thread Ethan Furman

One more iteration.  PEPs repo not updated yet.  Changes are renaming of 
methods to be ``fromsize()`` and ``fromord()``, and moving ``memoryview`` to an 
Open Questions section.


PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan , Ethan Furman 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.6
Post-History: 2014-03-30 2014-08-15 2014-08-16 2016-06-07 2016-09-01


Abstract


During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes five small adjustments to the APIs of the ``bytes`` and
``bytearray`` types to make it easier to operate entirely in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.fromsize`` and ``bytearray.fromsize`` alternative constructors
* Add ``bytes.fromord`` and ``bytearray.fromord`` alternative constructors
* Add ``bytes.getbyte`` and ``bytearray.getbyte`` byte retrieval methods
* Add ``bytes.iterbytes`` and ``bytearray.iterbytes`` alternative iterators


Proposals
=

Deprecation of current "zero-initialised sequence" behaviour without removal


Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

>>> bytes(3)
b'\x00\x00\x00'
>>> bytearray(3)
bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, but to leave
it in place for at least as long as Python 2.7 is supported, possibly
indefinitely.

No other changes are proposed to the existing constructors.


Addition of explicit "count and byte initialised sequence" constructors
---

To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``fromsize`` alternative constructor as a class method on both
``bytes`` and ``bytearray`` whose first argument is the count, and whose
second argument is the fill byte to use (defaults to ``\x00``)::

>>> bytes.fromsize(3)
b'\x00\x00\x00'
>>> bytearray.fromsize(3)
bytearray(b'\x00\x00\x00')
>>> bytes.fromsize(5, b'\x0a')
b'\x0a\x0a\x0a\x0a\x0a'
>>> bytearray.fromsize(5, b'\x0a')
bytearray(b'\x0a\x0a\x0a\x0a\x0a')

``fromsize`` will behave just as the current constructors behave when passed a 
single
integer, while allowing for non-zero fill values when needed.


Addition of "bchr" function and explicit "single byte" constructors
---

As binary counterparts to the text ``chr`` function, this PEP proposes
the addition of a ``bchr`` function and an explicit ``fromord`` alternative
constructor as a class method on both ``bytes`` and ``bytearray``::

>>> bchr(ord("A"))
b'A'
>>> bchr(ord(b"A"))
b'A'
>>> bytes.fromord(65)
b'A'
>>> bytearray.fromord(65)
bytearray(b'A')

These methods will only accept integers in the range 0 to 255 (inclusive)::

>>> bytes.fromord(512)
Traceback (most recent call last):
  File "", line 1, in 
ValueError: integer must be in range(0, 256)

>>> bytes.fromord(1.0)
Traceback (most recent call last):
  File "", line 1, in 
TypeError: 'float' object cannot be interpreted as an integer

While this does create some duplication, there are valid reasons for it::

* the ``bchr`` builtin is to recreate the ord/chr/unichr trio from Python
  2 under a different naming scheme
* the class method is mainly for the ``bytearray.fromord`` case, with
  ``bytes.fromord`` added for consistency

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bchr`` is the primary inverse operation for binary data, while ``chr``
is the inverse operation for text data, and that ``bytes.fromord`` and
``bytearray.fromord`` also exist.

Behaviourally, ``bytes.fromord(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.


Addition of "getbyte" method to retrieve a single byte
--

This PEP proposes that ``bytes`` and ``bytearray`` gain the method ``getbyte``
which will always return ``bytes``::

>>> b'abc'.getbyte(0)
b'a'

If an index is asked for that doesn't exist, ``IndexError`` is rais