[issue28927] bytes.fromhex should ignore all whitespace

2017-03-31 Thread Donald Stufft

Changes by Donald Stufft :


--
pull_requests: +994

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I think non-ASCII whitespace and digits are YAGNI until we are convinced 
otherwise by evidence from the field that people are routinely mixing other 
decimal digits with 'abcdef' as hex numerals.  Anyone who does try such a thing 
can write a wrapper that first translates to ascii digits.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you for your contribution Robert.

--
resolution:  -> fixed
stage: commit review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Roundup Robot

Roundup Robot added the comment:

New changeset fcc09d9ee7d4 by Serhiy Storchaka in branch 'default':
Issue #28927: bytes.fromhex() and bytearray.fromhex() now ignore all ASCII
https://hg.python.org/cpython/rev/fcc09d9ee7d4

--
nosy: +python-dev

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

LGTM.

--
assignee:  -> serhiy.storchaka
components: +Interpreter Core
stage: patch review -> commit review

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Robert Xiao

Robert Xiao added the comment:

New patch with proper line lengths in documentation.

--
Added file: http://bugs.python.org/file45967/fromhex.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-19 Thread Robert Xiao

Robert Xiao added the comment:

OK, I've attached a new version of the patch with the requested documentation 
changes (versionchanged and whatsnew).

--
Added file: http://bugs.python.org/file45966/fromhex.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-18 Thread Nick Coghlan

Nick Coghlan added the comment:

+1 to the above Unicode whitespace discussion - the "ASCII space -> ASCII 
whitespace" change is relatively straightforward to implement, and clearly 
beneficial given Robert's point regarding the popularity of multi-line 
terminal-oriented hexdump formats.

Neither of those points apply to Unicode whitespace (it would be much harder to 
implement, and we don't have a demonstrated use case for it), so it makes sense 
to avoid going that far.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

No, it is not easy to implement. Currently bytes.fromhex() works only with 
ASCII strings and can just iterate over char*.

If add support of non-ASCII whitespaces, we should add a support of non-ASCII 
digits, as in float(). This looks excessive.

In case if there is a case for this, we rather would expose 
_PyUnicode_TransformDecimalAndSpaceToASCII() in Python level.

In general fromhex.patch LGTM, but please add versionchanged directives and 
document the change in What's New.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-18 Thread Martin Panter

Martin Panter added the comment:

As far as I know, non-ASCII newlines and whitespace are not supported in Python 
source code, so there is not a big need to support it in bytes.fromhex() 
either. But since bytes.fromhex() accepts Unicode strings, I think non-ASCII 
whitespace would be okay if it was easy to implement.

Serhiy: Whitespace is not treated specially by the base-64 decoders, it is just 
treated like any non-alphabet character. See 
 and 
b64decode(validate=True).

Regarding hex-codec, I doubt many people use it in Python 3. To decode a whole 
string, binascii.unhexlify() or bytes.fromhex() is probably more obvious, and I 
think the incremental decoder never worked properly (Issue 20132).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-18 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Is it worth to ignore also non-ASCII whitespaces for compatibility with 
string-to-number convertions and base64 decoder?

>>> float('\xa01.0')
1.0
>>> base64.decodebytes(b'\xa0YWJj')
b'abc'

Note that not all spaces are ignored. They shouldn't break hexadecimal pairs.

>>> bytes.fromhex('a b')
Traceback (most recent call last):
  File "", line 1, in 
ValueError: non-hexadecimal number found in fromhex() arg at position 1

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-17 Thread Nick Coghlan

Nick Coghlan added the comment:

Ah, the parallel with base64 decoding and embedding encoded data in multi-line 
string literals is indeed a compelling one - I'd missed that.

Given that rationale, +1 from me.

Perhaps it would make sense to call that out directly in the documentation? 
Something like a second example saying:


Ignoring all ASCII whitespace provides compatibility with common hexdump 
formats (like the output of ``xxd``), allowing such data to easily be read from 
a file or included directly in source code as a multiline string literal::

>>> bytes.fromhex("""
... F0F1F2F3F4
... FAFBFCFDFE
""")
b'\xf0\xf1\xf2\xf4\xfa\xfb\xfc\xfd'



And then a versionchanged note for 3.7 to say that this was switched from 
filtering out specifically space to filtering out any ASCII whitespace.

My other question would be whether or not a separate issue should be filed to 
update the bytes-to-bytes "hex" codec to be consistent with this change - at 
the moment, it doesn't allow whitespace *at all* (not even ASCII spaces), while 
the base64 decoder is consistent with base64.b64decode and allows it.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-17 Thread Robert Xiao

Robert Xiao added the comment:

I see your point, Nick. Can I offer a counterpoint?

Most of the string parsers operate only on relatively short inputs, like 
numbers. Numbers in particular are rarely written with inner spaces, so it 
makes sense not to ignore internal whitespaces.

On the other hand, hexadecimal data can be very long, and is often formatted 
with spaces and newlines. For example, the default output of `xxd -p`, a format 
quite suitable for copy-paste, looks like this:

cffaedfe0701038002001500d80885802100
190048005f5f504147455a45524f
0100
190018035f5f54455854
01909d01

It would be desirable to write something like

blob = bytes.fromhex('''
cffaedfe0701038002001500d80885802100
190048005f5f504147455a45524f
0100
190018035f5f54455854
01909d01
''')

and not have to worry about sticking in some whitespace remover, like this:

blob = bytes.fromhex(''.join('''
cffaedfe0701038002001500d80885802100
190048005f5f504147455a45524f
0100
190018035f5f54455854
01909d01
'''.split()))

or removing the newlines in the source code, which impacts readability. 

Similar kinds of whitespaced output (sometimes with spaces between octets, 
words or dwords, sometimes with tabs between 8-16 byte groups, sometimes with 
newlines between groups, etc.) can be found in the wild and from the "hex" 
clipboard output from various applications.

We can already have newlines and other whitespace with base64, which is in 
principle quite similar:

blob = base64.b64decode('''
z/rt/gcAAAEDAACAAgAAABUAAADYCAAAhYAhAAAZSF9fUEFHRVpFUk8A
AAABAAAZGAMAAF9fVEVYVAAA
AQCQnQEAkJ0BAAcFCQBfX3Rl
eHQAX19URVhUAAALAAABRCF5AQAACwAACAAA
''')

so I think it makes sense to support other whitespaces in fromhex. I'm happy to 
reconsider if there's a strong argument against adding this convenience.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-17 Thread Nick Coghlan

Nick Coghlan added the comment:

My recollection is that fromhex() ignores spaces to account for particularly 
common ways of formatting hex numbers as space separated groups:

"CAFE F00D"
"CAFEF00D CAFEF00D CAFEF00D"
"CA FE F0 0D"
etc

Those show up even in structured hexadecimal data (like hex editor output and 
memory dumps in log files).

That recollection is supported by the example and specification of specifically 
"[0-9a-fA-F ]" in PEP 358 (https://www.python.org/dev/peps/pep-0358/) that 
guided Georg Brandl's initial implementation in 
http://bugs.python.org/issue1669379

Generally speaking, the type level string parsers *aren't* permissive when it 
comes to their whitespace handling - if they allow whitespace at all, it's 
usually only at the beginning or end, where it gets ignored (hence the PEP for 
3.6 to allow underscores in both numeric literals and in the numeric 
constructors).

=
>>> float(" 1.0")
1.0
>>> float("1.0 ")
1.0
>>> float("1 0")
Traceback (most recent call last):
  File "", line 1, in 
ValueError: could not convert string to float: '1 0'
>>> float("1. 0")
Traceback (most recent call last):
  File "", line 1, in 
ValueError: could not convert string to float: '1. 0'
>>> float("1 .0")
Traceback (most recent call last):
  File "", line 1, in 
ValueError: could not convert string to float: '1 .0'
=

The general technique to strip whitespace from *any* input data before handing 
it to a constructor relies on str.translate:

=
>>> import string
>>> _del_whitespace = str.maketrans('', '', string.whitespace)
>>> def clean_whitespace(text):
... return text.translate(_del_whitespace)
... 
>>> clean_whitespace('CA FE\nF0\t0D')
'CAFEF00D'
=

(http://stackoverflow.com/questions/3739909/how-to-strip-all-whitespace-from-string
 also points out `"".join(text.split())` as a clever one liner for a similar 
outcome)

So I'm inclined to advise *against* making any changes here - the apparent 
benefit is more a symptom of the fact that there isn't an immediately obvious 
spelling of "strip all whitespace, including that between other characters, 
from this string" that can be readily applied as an initial filter on the 
incoming data.

(However, I do sometimes wonder if it would make sense to offer a 
"str.stripall()" that defaulted to removing all whitespace, rather than 
treating this purely as a particular use case for str.translate)

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-16 Thread Robert Xiao

Robert Xiao added the comment:

Sorry, I should have clarified that these methods consider *ASCII whitespace* 
equivalent - just like my proposed patch.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-16 Thread Robert Xiao

Robert Xiao added the comment:

Terry, can you elaborate what you mean by a tradeoff? I feel like such a patch 
makes .fromhex more consistent with other string methods like .split() and 
.strip() which implicitly consider all whitespace equivalent.

Martin, I've updated the patch to include documentation updates and more 
testcases.

--
Added file: http://bugs.python.org/file45933/fromhex.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-16 Thread Terry J. Reedy

Terry J. Reedy added the comment:

I am a bit dubious about this.  There is a tradeoff here between convenience 
and bug detection.  The patch is not strictly necessary.

>>> bytes.fromhex('ab\ncd'.replace('\n', ''))
b'\xab\xcd'

Bytes (and bytearray) .fromhex already ignores spaces.  Not ignoring newlines, 
by default, may have been intentional.  Nick, you wrote the doc entry for 
'fromhex'.  Do you know anything about the design intention?

--
nosy: +ncoghlan, terry.reedy

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-09 Thread Martin Panter

Martin Panter added the comment:

Seems a reasonable feature. The documentation would also need updating.

Which specific (whitespace) characters do you propose to ignore? Just ASCII 
ones, as in bytes.isspace(), or others like b"\xA0" (non-breaking space) and 
U+2028 (line separator), as in str.isspace()? Perhaps you should add test cases 
to clarify.

--
nosy: +martin.panter
stage:  -> patch review
versions: +Python 3.7 -Python 3.6

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-09 Thread Robert Xiao

Robert Xiao added the comment:

I used Py_ISSPACE, which uses the .strip() default charset - I think this is a 
reasonable choice. We don't have to go crazy and support all the Unicode spaces.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue28927] bytes.fromhex should ignore all whitespace

2016-12-09 Thread Robert Xiao

New submission from Robert Xiao:

bytes.fromhex ignores space characters now (yay!) but still barfs if fed 
newlines or tabs:

>>> bytes.fromhex('ab\ncd')
Traceback (most recent call last):
  File "", line 1, in 
ValueError: non-hexadecimal number found in fromhex() arg at position 2
>>> bytes.fromhex('ab\tcd')
Traceback (most recent call last):
  File "", line 1, in 
ValueError: non-hexadecimal number found in fromhex() arg at position 2

It's often quite useful to paste blobs of hex into source code (or the REPL) 
and call ".fromhex" on them. These might include spaces, tabs and/or newlines, 
and barfing on these other whitespace characters is inconvenient.

I propose that bytes.fromhex should ignore all whitespace. A patch + test is 
attached.

--
files: fromhex.patch
keywords: patch
messages: 282811
nosy: nneonneo
priority: normal
severity: normal
status: open
title: bytes.fromhex should ignore all whitespace
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file45823/fromhex.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com