Re: [Python-ideas] String and bytes bitwise operations

2018-06-22 Thread INADA Naoki
Hi Terry,

Thanks, but I didn't care because my password is not so long.
I just want to illustrate real world bytes xor usage.
BTW, New MySQL auth methods (sha256 and caching_sha2)
use bytes xor too.

For performance point of view, websocket masking is performance
critical.  Tornado uses extension module only for it.  If bytearray ^= bytes
is supported, websocket frame masking may look like:

  frame ^= mask * ((len(frame)+3)//4)  # mask is 4 bytes long

On Sat, Jun 23, 2018 at 1:26 AM Terry Reedy  wrote:

> On 6/22/2018 7:08 AM, INADA Naoki wrote:
> > Bitwise xor is used for "masking" code like these:
> >
> >
> https://github.com/PyMySQL/PyMySQL/blob/37eba60439039eff17b32ef1a63b45c25ea28cec/pymysql/connections.py#L139-L146
>
> This points to a function _my_crypt that is O(n*n) because of using
> bytes.append.  Using bytearray.append makes it O(n).
> ---
> import random
> import struct
> import timeit
>
> range_type = range
>
> def _my_crypt(message1, message2):
>  length = len(message1)
>  result = b''
>  for i in range_type(length):
>  x = (struct.unpack('B', message1[i:i+1])[0] ^
>   struct.unpack('B', message2[i:i+1])[0])
>  result += struct.pack('B', x)
>  return result
>
> def _my_crypt2(message1, message2):
>  length = len(message1)
>  result = bytearray()
>  for i in range_type(length):
>  x = (struct.unpack('B', message1[i:i+1])[0] ^
>   struct.unpack('B', message2[i:i+1])[0])
>  result += struct.pack('B', x)
>  return bytes(result)
>
> def make(n):
>  result = bytearray()
>  for i in range(n):
>  result.append(random.randint(0, 255))
>  return result
>
> for m in (10, 100, 1000, 10_000, 100_000, 1000_000):
>  m1 = make(m)
>  m2 = make(m)
>
>  n = 1000_000 // m
>  print(f'bytes len {m}, timeit reps {n}')
>  print('old ', timeit.timeit('_my_crypt(m1, m2)', number = n,
> globals=globals()))
>  print('new ', timeit.timeit('_my_crypt2(m1, m2)', number = n,
> globals=globals()))
> ---
> prints
>
> bytes len 10, timeit reps 10
> old  1.227759412998
> new  1.217421230999
> bytes len 100, timeit reps 1
> old  1.145566423
> new  1.092400212003
> bytes len 1000, timeit reps 1000
> old  1.286030619002
> new  1.116868583999
> bytes len 1, timeit reps 100
> old  1.654334465003
> new  1.118191714
> bytes len 10, timeit reps 10
> old  4.256849211005
> new  1.126613756011
> bytes len 100, timeit reps 1
> old  60.65123814404
> new  1.131502019992
>
> I tried to submit this to https://github.com/PyMySQL/PyMySQL/issues/new
> but [Submit] does not work for me.
>
> --
> Terry Jan Reedy
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
INADA Naoki  
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-06-22 Thread Terry Reedy

On 6/22/2018 7:08 AM, INADA Naoki wrote:

Bitwise xor is used for "masking" code like these:

https://github.com/PyMySQL/PyMySQL/blob/37eba60439039eff17b32ef1a63b45c25ea28cec/pymysql/connections.py#L139-L146


This points to a function _my_crypt that is O(n*n) because of using 
bytes.append.  Using bytearray.append makes it O(n).

---
import random
import struct
import timeit

range_type = range

def _my_crypt(message1, message2):
length = len(message1)
result = b''
for i in range_type(length):
x = (struct.unpack('B', message1[i:i+1])[0] ^
 struct.unpack('B', message2[i:i+1])[0])
result += struct.pack('B', x)
return result

def _my_crypt2(message1, message2):
length = len(message1)
result = bytearray()
for i in range_type(length):
x = (struct.unpack('B', message1[i:i+1])[0] ^
 struct.unpack('B', message2[i:i+1])[0])
result += struct.pack('B', x)
return bytes(result)

def make(n):
result = bytearray()
for i in range(n):
result.append(random.randint(0, 255))
return result

for m in (10, 100, 1000, 10_000, 100_000, 1000_000):
m1 = make(m)
m2 = make(m)

n = 1000_000 // m
print(f'bytes len {m}, timeit reps {n}')
print('old ', timeit.timeit('_my_crypt(m1, m2)', number = n, 
globals=globals()))
print('new ', timeit.timeit('_my_crypt2(m1, m2)', number = n, 
globals=globals()))

---
prints

bytes len 10, timeit reps 10
old  1.227759412998
new  1.217421230999
bytes len 100, timeit reps 1
old  1.145566423
new  1.092400212003
bytes len 1000, timeit reps 1000
old  1.286030619002
new  1.116868583999
bytes len 1, timeit reps 100
old  1.654334465003
new  1.118191714
bytes len 10, timeit reps 10
old  4.256849211005
new  1.126613756011
bytes len 100, timeit reps 1
old  60.65123814404
new  1.131502019992

I tried to submit this to https://github.com/PyMySQL/PyMySQL/issues/new
but [Submit] does not work for me.

--
Terry Jan Reedy

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-06-22 Thread INADA Naoki
Bitwise xor is used for "masking" code like these:

https://github.com/PyMySQL/PyMySQL/blob/37eba60439039eff17b32ef1a63b45c25ea28cec/pymysql/connections.py#L139-L146
https://github.com/tornadoweb/tornado/blob/0b2b055061eb4754c80a8d6bc28614b86954e336/tornado/util.py#L470-L471
https://github.com/tornadoweb/tornado/blob/master/tornado/speedups.c#L5

I think implementing it in C is really helpful for protocol library authors.

On Thu, May 17, 2018 at 7:54 PM Ken Hilton  wrote:

> Hi all,
>
> We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ (not).
> We know how they work with numbers:
>
> 420 ^ 502
>
> 110100100
> 10110
> == XOR ==
> 001010010
> = 82
>
> But it might be useful in some cases to (let's say) xor a string (or
> bytestring):
>
> HELLO ^ world
>
> 01001000 01000101 01001100 01001100 0100
> 01110111 0110 01110010 01101100 01100100
> === XOR 
> 0011 00101010 0010 0010 00101011
> = ?*> +
>
> Currently, that's done with this expression for strings:
>
> >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
> '?*> +'
>
> and this expression for bytestrings:
>
> >>> bytes(a ^ b for a, b in zip(b'HELLO', b'world'))
> b'?*> +'
>
> It would be much more convenient, however, to allow a simple xor of a
> string:
>
> >>> 'HELLO' ^ 'world'
> '?*> +'
>
> or bytestring:
>
> >>> b'HELLO' ^ b'world'
> b'?*> +'
>
> (All of this applies to other bitwise operators, of course.)
> Compatibility issues are a no-brainer - currently, bitwise operators for
> strings raise TypeErrors.
>
> Thanks.
>
> Suggesting,
> Ken
> ​ Hilton​
> ;
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
INADA Naoki  
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-21 Thread Steven D'Aprano
On Mon, May 21, 2018 at 11:22:17AM -0700, Chris Barker wrote:
> On Sat, May 19, 2018 at 6:52 AM, Steven D'Aprano 
> wrote:
> 
> > Philosophical arguments about the nature of computer memory aside, byte
> > objects in Python are collections of ints.
> >
> 
> not when you start talking about bit-wise operations :-)
> 
> If a "byte" in python was an integer, then we'd use b**2 rather than b << 1

Why? Ints support bit-shift operations:

py> 45 << 1
90

as well as other bitwise operations. (And b<<1 isn't the same as b**2.) 
I trust that you aren't going to argue that 45 isn't an int.


> (yes, I know bit shifting can be more efficient, but I don't think that's
> why it's there in python)
> 
> The entire point of bitwise operators is so that the bits themselves can be
> accessed and manipulated.

Then where are the primitives for accessing and manipulating individual bits?

> > If you want those ints to represent something else, you're responsible
> > for handling that (say, using the struct module).
> 
> 
> yup -- with struct, and, hmm, maybe bitwise operators?

Right. Which is why we want to add support for bitwise operators to 
bytes.


> Anyway, as you say, this is a Philosophical (or semantic) point -- I don't
> think it effects the discussion at hand.
> 
> However, when you talk about bit-shifting a bytes object, you do need to
> decide if each byte is handled individually, or if they are one big
> collection.

True, we have to decide that. Is there a good reason to handle each byte 
individually?


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-21 Thread Chris Barker via Python-ideas
On Sat, May 19, 2018 at 6:52 AM, Steven D'Aprano 
wrote:

> Philosophical arguments about the nature of computer memory aside, byte
> objects in Python are collections of ints.
>

not when you start talking about bit-wise operations :-)

If a "byte" in python was an integer, then we'd use b**2 rather than b << 1

(yes, I know bit shifting can be more efficient, but I don't think that's
why it's there in python)

The entire point of bitwise operators is so that the bits themselves can be
accessed and manipulated.


> If you want those ints to represent something else, you're responsible
> for handling that (say, using the struct module).


yup -- with struct, and, hmm, maybe bitwise operators?

Anyway, as you say, this is a Philosophical (or semantic) point -- I don't
think it effects the discussion at hand.

However, when you talk about bit-shifting a bytes object, you do need to
decide if each byte is handled individually, or if they are one big
collection.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-19 Thread Steven D'Aprano
On Fri, May 18, 2018 at 06:25:19PM -0400, Chris Barker - NOAA Federal via 
Python-ideas wrote:

> > I suppose you could argue that a "byte" is a patch of
> > storage capable of holding a number from 0 to 255, as opposed to being
> > the number itself, but that's getting rather existential :)
> 
> No, I’m making the distinction that an eight bit byte is, well,  eight
> bits, that CAN represent a number from 0 to 255, or it can represent
> any other data type — like one eighth of the bits in a float, for
> instance. Or a bit field, or 1/2 a 16 bit int.

That might be the case if we were talking about chunks of memory in RAM, 
but we're talking about *bytes objects in Python* where the individual 
items in the object most certainly represent ints between 0 and 255:

py> b"abcde"[3]
100


Philosophical arguments about the nature of computer memory aside, byte 
objects in Python are collections of ints.

If you want those ints to represent something else, you're responsible 
for handling that (say, using the struct module).


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Ken Hilton
On Fri, 18 May 2018 13:49:58 -0300, Facundo Batista wrote:
> Once you think as the whole sequence of bytes as a sequence of bits (eight
times longer, of course), all questions are easly answered, see below...

I had never considered this before, but it seems very interesting.
Essentially that would make bytes1 ^ bytes2 be equivalent to

(int.from_bytes(bytes1, sys.byteorder) ^ int.from_bytes(bytes2,
sys.byteorder)).to_bytes(max(len(bytes1), len(bytes2)), sys.byteorder)


rather than

bytes(a ^ b for a, b in zip(bytes1, bytes2))

I like that idea more than my original elementwise idea.

​Thinking,
Ken Hilton;​
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Chris Angelico
On Sat, May 19, 2018 at 8:25 AM, Chris Barker - NOAA Federal
 wrote:
>> I suppose you could argue that a "byte" is a patch of
>> storage capable of holding a number from 0 to 255, as opposed to being
>> the number itself, but that's getting rather existential :)
>
> No, I’m making the distinction that an eight bit byte is, well,  eight
> bits, that CAN represent a number from 0 to 255, or it can represent
> any other data type — like one eighth of the bits in a float, for
> instance. Or a bit field, or 1/2 a 16 bit int.

Since "bit" simply means "binary digit", that's like saying that a
four-digit number isn't a number; it MIGHT represent a number, but it
might represent one quarter of your credit card. Is "4564" less of a
number for that reason?

>> In Python, a "bytes" object represents a sequence of eight-bit units.
>> When you subscript a bytes [1], you get back an integer with the value
>> at that position.
>
> And when you print it, you get the ascii characters corresponding to
> each byte

That's because those numbers can often be used to represent
characters. But they are really and truly numbers.

(If you want to get down to brass tacks, a Unicode string could be
treated as a sequence of 21-bit numbers. And in some languages, the
"string" type is actually a highly-optimized version of
21-bit-number-array - or 32-bit, perhaps - with fully supported
use-cases involving numerical (non-textual) data.)

> So one element in a bytes object is no more an integer than a character

Except that the bytestring b"\x00\x80\xff\x99" very clearly represents
four numbers, but doesn't clearly represent any characters.

>> Maybe I'm completely misunderstanding your statement here.
>
> Again, it doesn’t much matter, until you get to deciding how to
> bitshift an entire bytes object.

Bitshifting a sequence of bytes has nothing whatsoever to do with
characters. It has to do with the individual numbers, and then you
have to decide how you represent those as a collective: little-endian
or big-endian. That's still a matter of numbers, not characters.

ChrisA
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Chris Barker - NOAA Federal via Python-ideas
>> actually, bytes are, well, bytes ;-) -- that is, 8 bits.
>
> Grammatically, you appear to be disagreeing with the assertion that
> bytes are numbers. Is that the case?

Um, yes. Though I think for the rest of the conversation, it’s a
distinction that doesn’t matter.

> If you want to be extremely technical, an "octet" is a group of eight
> bits (or eight musicians, but I haven't yet figured out how to send
> musicians down an ethernet cable), and a "byte" isn't as rigidly
> defined. But on modern PCs, you can fairly safely assume that they're
> synonymous.

Sure — a byte, I think, is the smallest unit of memory that can be
addressed. It could be more or less that 8 bytes, but that wasn’t the
point either.

> I suppose you could argue that a "byte" is a patch of
> storage capable of holding a number from 0 to 255, as opposed to being
> the number itself, but that's getting rather existential :)

No, I’m making the distinction that an eight bit byte is, well,  eight
bits, that CAN represent a number from 0 to 255, or it can represent
any other data type — like one eighth of the bits in a float, for
instance. Or a bit field, or 1/2 a 16 bit int.

> In Python, a "bytes" object represents a sequence of eight-bit units.
> When you subscript a bytes [1], you get back an integer with the value
> at that position.

And when you print it, you get the ascii characters corresponding to
each byte

So one element in a bytes object is no more an integer than a character

> So if a collection of them is called a "bytes" and
> one of them is an integer in range(0, 256), doesn't it stand to reason
> that a byte is a number?

We use a decimal number (and ascii) to represent the bytes, as it’s
more human readable and consistent with other python types.

> Maybe I'm completely misunderstanding your statement here.

Again, it doesn’t much matter, until you get to deciding how to
bitshift an entire bytes object.

-CHB
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Chris Angelico
On Sat, May 19, 2018 at 2:32 AM, Chris Barker via Python-ideas
 wrote:
> On Thu, May 17, 2018 at 11:07 PM, Greg Ewing 
> wrote:
>>
>> Steven D'Aprano wrote:
>>>
>>> But XORing bytes seems perfectly reasonable. Bytes are numbers, even if
>>> we display them as ASCII characters.
>
>
> actually, bytes are, well, bytes ;-) -- that is, 8 bits.

Grammatically, you appear to be disagreeing with the assertion that
bytes are numbers. Is that the case?

If you want to be extremely technical, an "octet" is a group of eight
bits (or eight musicians, but I haven't yet figured out how to send
musicians down an ethernet cable), and a "byte" isn't as rigidly
defined. But on modern PCs, you can fairly safely assume that they're
synonymous. I suppose you could argue that a "byte" is a patch of
storage capable of holding a number from 0 to 255, as opposed to being
the number itself, but that's getting rather existential :)

In Python, a "bytes" object represents a sequence of eight-bit units.
When you subscript a bytes [1], you get back an integer with the value
at that position. So if a collection of them is called a "bytes" and
one of them is an integer in range(0, 256), doesn't it stand to reason
that a byte is a number?

Maybe I'm completely misunderstanding your statement here.

ChrisA

[1] Yes, I'm aware that older versions of Python behaved differently. [2]
[2] I'm also aware that putting square brackets after the word "bytes"
could be interpreted as subscripting the 'bytes' type itself, rather
than creating a reference to a footnote. Ain't contextual grammar fun?
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Richard Damon
On 5/18/18 12:32 PM, Chris Barker via Python-ideas wrote:
> actually, bytes are, well, bytes ;-) -- that is, 8 bits. But the point
> is that "bitwise" operations make all the sense in the world for
> bytes, but not for unicode text -- did anyone have a use case for
> bitwise operation on unicode strings? I can't imagine one, even if you
> could agree on a definition...
>
The one case I could think of is a very crude form of encryption.
Probably most makes sense if you assume the string is really just ASCII.
Not sure if that is a good enough case to be worth adding it, as often
you special case some characters (like control characters) to avoid
messing things up too much for display.

-- 
Richard Damon

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Facundo Batista
2018-05-18 13:32 GMT-03:00 Chris Barker via Python-ideas
:

> or would it operate on the whole sequence as a single collection of bits?

Once you think as the whole sequence of bytes as a sequence of bits
(eight times longer, of course), all questions are easly answered, see
below...


> Would it be any different? Without thinking hard, it seems some operations,
> like AND and OR and XOR would be the same, but bit shifting would be
> different.

AND, XOR and OR are simple. Bit shifting will shift bits for the whole
bit sequence.


> And then what do you do if the two bytes objects are not the same length?

Pad with 0s to the left. Exactly the same that happen in

>>> 1 ^ 1231312312
1231312313

Regards,

-- 
.Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/
Twitter: @facundobatista
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Chris Barker via Python-ideas
On Thu, May 17, 2018 at 11:07 PM, Greg Ewing 
wrote:

> Steven D'Aprano wrote:
>
>> But XORing bytes seems perfectly reasonable. Bytes are numbers, even if
>> we display them as ASCII characters.
>
>
actually, bytes are, well, bytes ;-) -- that is, 8 bits. But the point is
that "bitwise" operations make all the sense in the world for bytes, but
not for unicode text -- did anyone have a use case for bitwise operation on
unicode strings? I can't imagine one, even if you could agree on a
definition...

Yep. Implement it for bytes,


What exactly would be implemented?

bytes is a sequence type, so would a bitwise operator perform the operation
"elementwise"? (i.e. like numpy)

or would it operate on the whole sequence as a single collection of bits?

Would it be any different? Without thinking hard, it seems some operations,
like AND and OR and XOR would be the same, but bit shifting would be
different.

And then what do you do if the two bytes objects are not the same length?

If "elementwise", then we should think carefully about that -- no where
else does Python do things elementwise in the standard library -- and we
already have numpy if you want to do it now.

and a bytes object can be representing any type of data -- so one byte at a
time might make sense, but maybe two bytes at a time makes more sense --
and if the data is representing, say an integer, then endian-ness matters...

All this is handled by numpy by having multiple data types that can be
"mapped" to a buffer.

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-18 Thread Greg Ewing

Steven D'Aprano wrote:
But XORing bytes seems perfectly reasonable. Bytes are numbers, even if 
we display them as ASCII characters.


Yep. Implement it for bytes, then the user can decode/encode
as appropriate for the application.

--
Greg

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Terry Reedy

On 5/17/2018 6:53 AM, Ken Hilton wrote:

Hi all,

We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ 
(not). We know how they work with numbers:


420 ^ 502

110100100
10110
== XOR ==
001010010
= 82

But it might be useful in some cases to (let's say) xor a string (or 
bytestring):


HELLO ^ world

01001000 01000101 01001100 01001100 0100
01110111 0110 01110010 01101100 01100100
=== XOR 
0011 00101010 0010 0010 00101011
= ?*> +

Currently, that's done with this expression for strings:

     >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
     '?*> +'

and this expression for bytestrings:

     >>> bytes(a ^ b for a, b in zip(b'HELLO', b'world'))
     b'?*> +'

It would be much more convenient, however, to allow a simple xor of a 
string:


     >>> 'HELLO' ^ 'world'
     '?*> +'

or bytestring:

     >>> b'HELLO' ^ b'world'
     b'?*> +'

(All of this applies to other bitwise operators, of course.)
Compatibility issues are a no-brainer - currently, bitwise operators for 
strings raise TypeErrors.


https://bugs.python.org/issue19251 bitwise ops for bytes of equal length


--
Terry Jan Reedy


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Antoine Pitrou

I agree with Steven.  XORing unicode strings doesn't make sense, and is
pointless anyway.  The only interesting question is whether we want to
add bytewise operations to the stdlib.

Regards

Antoine.


On Thu, 17 May 2018 23:13:22 +1000
Steven D'Aprano  wrote:
> On Thu, May 17, 2018 at 03:49:02PM +0300, Serhiy Storchaka wrote:
> > 17.05.18 15:20, Steven D'Aprano пише:  
> > >On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:  
> > >>17.05.18 13:53, Ken Hilton пише:  
> > >>>But it might be useful in some cases to (let's say) xor a string (or
> > >>>bytestring):  
> > >>
> > >>The question is how common a need of these operations? If it is not
> > >>common enough, they are better be implemented as functions in a
> > >>third-party library.  
> > >
> > >The real question is, what does it mean to XOR a text string?  
> > 
> > The OP explained this meaning with a sample implementation.  
> 
> No, he didn't explain the meaning. He gave an example, but not a reason 
> why it should do what he showed.
> 
> Why should the *abstract character* 'H' XORed with the abstract 
> character 'w' return the abstract character '?'? Why shouldn't the 
> result be '>' instead?
> 
> (For the record that's using 'EBCDIC-CP-BE'.)
> 
> The point is, XORing abstract characters seems meaningless to me. If the 
> OP has an explanation for why 'H'^'?' must mean '?', he should explain 
> it.
> 
> XORing code points could easily generate invalid Unicode sequences 
> containing lone surrogates, say, or undefined characters. Or as you 
> point out, out of range values.
> 
> 
> But XORing bytes seems perfectly reasonable. Bytes are numbers, even if 
> we display them as ASCII characters.
> 
> 
> 
> > >>Are you aware that this can raise a ValueError for some input strings?
> > >>For example for '\U0010' and '\U0001'.  
> > >
> > >That works for me.
> > >  
> > >py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U0010',   
> > >'\U0010'))
> > >'\x00'  
> > 
> > Try with '\U0010' and '\U0001', not with '\U0010' and 
> > '\U0010'.  
> 
> Oops, sorry. I misread your post.
> 
> 



___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Ken Hilton
On  Thu, 17 May 2018 23:13:22 +1000, Steven D'Aprano wrote:
> No, he didn't explain the meaning. He gave an example, but not a reason
why it should do what he showed.
>
> Why should the *abstract character* 'H' XORed with the abstract
character 'w' return the abstract character '?'? Why shouldn't the
result be '>' instead?

My initial thought was that 'H' ^ 'w' -> '?' because when I was
experimenting with the idea, ord('H') ^ ord('w') -> ord('?'). However, I do
see your point that different encodings give different results, so I'll
drop the idea of bitwise operations on strings.

> XORing code points could easily generate invalid Unicode sequences
containing lone surrogates, say, or undefined characters. Or as you
point out, out of range values.

Invalid Unicode sequences, lone surrogates, and undefined characters, IMO,
are simply consequences of misusing the operators. I hadn't anticipated the
ValueError for '\U0010' and '\U0001', though, which is another
reason for me to drop bitwise operations on strings.

> But XORing bytes seems perfectly reasonable. Bytes are numbers, even if
we display them as ASCII characters.

My thought exactly.

On Thu, 17 May 2018 22:20:43 +1000, Steven D'Aprano wrote:
> What if the strings are unequal lengths?

(out-of-order quote lol)
Then the operators would raise a ValueError. (Assuming bytestrings, since
again, I'm dropping text strings.)

​Sharing ideas​
,
Ken
​ Hilton​
;
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Antoine Pitrou

Indeed, at this point Numpy sounds like the ideal solution for such use
cases.

Regards

Antoine.


On Thu, 17 May 2018 13:06:59 +0200
Stephan Houben 
wrote:
> Seems you want numpy:
> 
> >>> import numpy
> >>> numpy.frombuffer(b"Hello", dtype=numpy.uint8) ^  
> numpy.frombuffer(b"World", dtype=numpy.uint8)
> array([31, 10, 30,  0, 11], dtype=uint8)
> 
> Stephan
> 
> 2018-05-17 12:53 GMT+02:00 Ken Hilton 
> :
> 
> > Hi all,
> >
> > We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ (not).
> > We know how they work with numbers:
> >
> > 420 ^ 502
> >
> > 110100100
> > 10110
> > == XOR ==
> > 001010010
> > = 82
> >
> > But it might be useful in some cases to (let's say) xor a string (or
> > bytestring):
> >
> > HELLO ^ world
> >
> > 01001000 01000101 01001100 01001100 0100
> > 01110111 0110 01110010 01101100 01100100
> > === XOR 
> > 0011 00101010 0010 0010 00101011
> > = ?*> +
> >
> > Currently, that's done with this expression for strings:
> >  
> > >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))  
> > '?*> +'  
> >
> > and this expression for bytestrings:
> >  
> > >>> bytes(a ^ b for a, b in zip(b'HELLO', b'world'))  
> > b'?*> +'  
> >
> > It would be much more convenient, however, to allow a simple xor of a
> > string:
> >  
> > >>> 'HELLO' ^ 'world'  
> > '?*> +'  
> >
> > or bytestring:
> >  
> > >>> b'HELLO' ^ b'world'  
> > b'?*> +'  
> >
> > (All of this applies to other bitwise operators, of course.)
> > Compatibility issues are a no-brainer - currently, bitwise operators for
> > strings raise TypeErrors.
> >
> > Thanks.
> >
> > Suggesting,
> > Ken
> > ​ Hilton​
> > ;
> >
> > ___
> > Python-ideas mailing list
> > Python-ideas@python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
> >
> >  
> 



___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Steven D'Aprano
On Thu, May 17, 2018 at 03:49:02PM +0300, Serhiy Storchaka wrote:
> 17.05.18 15:20, Steven D'Aprano пише:
> >On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:
> >>17.05.18 13:53, Ken Hilton пише:
> >>>But it might be useful in some cases to (let's say) xor a string (or
> >>>bytestring):
> >>
> >>The question is how common a need of these operations? If it is not
> >>common enough, they are better be implemented as functions in a
> >>third-party library.
> >
> >The real question is, what does it mean to XOR a text string?
> 
> The OP explained this meaning with a sample implementation.

No, he didn't explain the meaning. He gave an example, but not a reason 
why it should do what he showed.

Why should the *abstract character* 'H' XORed with the abstract 
character 'w' return the abstract character '?'? Why shouldn't the 
result be '>' instead?

(For the record that's using 'EBCDIC-CP-BE'.)

The point is, XORing abstract characters seems meaningless to me. If the 
OP has an explanation for why 'H'^'?' must mean '?', he should explain 
it.

XORing code points could easily generate invalid Unicode sequences 
containing lone surrogates, say, or undefined characters. Or as you 
point out, out of range values.


But XORing bytes seems perfectly reasonable. Bytes are numbers, even if 
we display them as ASCII characters.



> >>Are you aware that this can raise a ValueError for some input strings?
> >>For example for '\U0010' and '\U0001'.
> >
> >That works for me.
> >
> >py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U0010', 
> >'\U0010'))
> >'\x00'
> 
> Try with '\U0010' and '\U0001', not with '\U0010' and 
> '\U0010'.

Oops, sorry. I misread your post.


-- 
Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Serhiy Storchaka

17.05.18 15:20, Steven D'Aprano пише:

On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:

17.05.18 13:53, Ken Hilton пише:

But it might be useful in some cases to (let's say) xor a string (or
bytestring):


The question is how common a need of these operations? If it is not
common enough, they are better be implemented as functions in a
third-party library.


The real question is, what does it mean to XOR a text string?


The OP explained this meaning with a sample implementation.


Are you aware that this can raise a ValueError for some input strings?
For example for '\U0010' and '\U0001'.


That works for me.

py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U0010', '\U0010'))
'\x00'


Try with '\U0010' and '\U0001', not with '\U0010' and 
'\U0010'.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Steven D'Aprano
On Thu, May 17, 2018 at 02:14:10PM +0300, Serhiy Storchaka wrote:
> 17.05.18 13:53, Ken Hilton пише:
> >We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ 
> >(not). We know how they work with numbers:
> >
> >420 ^ 502
> >
> >110100100
> >10110
> >== XOR ==
> >001010010
> >= 82
> >
> >But it might be useful in some cases to (let's say) xor a string (or 
> >bytestring):
> 
> The question is how common a need of these operations? If it is not 
> common enough, they are better be implemented as functions in a 
> third-party library.

The real question is, what does it mean to XOR a text string?

What is 'π' XOR '≠'? I can think of at least three possibilities:

'-\t'  # treat the strings as UTF-8
'↠'  # treat them as UTF-16 or UTF-32
'\x14' # treat them as MacRoman

What if the strings are unequal lengths?


But XORing equal length byte strings makes sense to me. There's no 
ambiguity: it is equivalent to just XORing each byte with the 
corresponding byte.


 
> >Currently, that's done with this expression for strings:
> >
> >     >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
> >     '?*> +'
> 
> Are you aware that this can raise a ValueError for some input strings? 
> For example for '\U0010' and '\U0001'.

That works for me.

py> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('\U0010', '\U0010'))
'\x00'


-- 
Steve

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Serhiy Storchaka

17.05.18 13:53, Ken Hilton пише:
We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ 
(not). We know how they work with numbers:


420 ^ 502

110100100
10110
== XOR ==
001010010
= 82

But it might be useful in some cases to (let's say) xor a string (or 
bytestring):


The question is how common a need of these operations? If it is not 
common enough, they are better be implemented as functions in a 
third-party library.



Currently, that's done with this expression for strings:

     >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
     '?*> +'


Are you aware that this can raise a ValueError for some input strings? 
For example for '\U0010' and '\U0001'.


___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Stephan Houben
Seems you want numpy:

>>> import numpy
>>> numpy.frombuffer(b"Hello", dtype=numpy.uint8) ^
numpy.frombuffer(b"World", dtype=numpy.uint8)
array([31, 10, 30,  0, 11], dtype=uint8)

Stephan

2018-05-17 12:53 GMT+02:00 Ken Hilton :

> Hi all,
>
> We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ (not).
> We know how they work with numbers:
>
> 420 ^ 502
>
> 110100100
> 10110
> == XOR ==
> 001010010
> = 82
>
> But it might be useful in some cases to (let's say) xor a string (or
> bytestring):
>
> HELLO ^ world
>
> 01001000 01000101 01001100 01001100 0100
> 01110111 0110 01110010 01101100 01100100
> === XOR 
> 0011 00101010 0010 0010 00101011
> = ?*> +
>
> Currently, that's done with this expression for strings:
>
> >>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
> '?*> +'
>
> and this expression for bytestrings:
>
> >>> bytes(a ^ b for a, b in zip(b'HELLO', b'world'))
> b'?*> +'
>
> It would be much more convenient, however, to allow a simple xor of a
> string:
>
> >>> 'HELLO' ^ 'world'
> '?*> +'
>
> or bytestring:
>
> >>> b'HELLO' ^ b'world'
> b'?*> +'
>
> (All of this applies to other bitwise operators, of course.)
> Compatibility issues are a no-brainer - currently, bitwise operators for
> strings raise TypeErrors.
>
> Thanks.
>
> Suggesting,
> Ken
> ​ Hilton​
> ;
>
> ___
> Python-ideas mailing list
> Python-ideas@python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>
>
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] String and bytes bitwise operations

2018-05-17 Thread Ken Hilton
Hi all,

We all know the bitwise operators: & (and), | (or), ^ (xor), and ~ (not).
We know how they work with numbers:

420 ^ 502

110100100
10110
== XOR ==
001010010
= 82

But it might be useful in some cases to (let's say) xor a string (or
bytestring):

HELLO ^ world

01001000 01000101 01001100 01001100 0100
01110111 0110 01110010 01101100 01100100
=== XOR 
0011 00101010 0010 0010 00101011
= ?*> +

Currently, that's done with this expression for strings:

>>> ''.join(chr(ord(a) ^ ord(b)) for a, b in zip('HELLO', 'world'))
'?*> +'

and this expression for bytestrings:

>>> bytes(a ^ b for a, b in zip(b'HELLO', b'world'))
b'?*> +'

It would be much more convenient, however, to allow a simple xor of a
string:

>>> 'HELLO' ^ 'world'
'?*> +'

or bytestring:

>>> b'HELLO' ^ b'world'
b'?*> +'

(All of this applies to other bitwise operators, of course.)
Compatibility issues are a no-brainer - currently, bitwise operators for
strings raise TypeErrors.

Thanks.

Suggesting,
Ken
​ Hilton​
;
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/