Re: [Python-Dev] Call for prudence about PEP-572

2018-07-09 Thread Anthony Flury via Python-Dev

On 09/07/18 08:26, Matěj Cepl wrote:

On 2018-07-07, 15:48 GMT, Guido van Rossum wrote:

 if validate(name := re.search(pattern, line).group(1)):
 return name

Except there is no error handling for situations when
re.search() returns None, so one shouldn't use it anyway (most
of the time). Which seems to me like another nice example why
one should stay away from this style as much as possible. I am
too lazy to be tempted into this
nice-example-terrible-production-code world.

So wrap it in a try/except to catch re.search() returning None (in which
case the .group(1) call raises AttributeError):

    try:
        if validate(name := re.search(pattern, line).group(1)):
            return name
    except AttributeError:
        pass  # no match

And this solution is even better if the validate function raises an
exception rather than returning True/False.


which I think is much better than trying to do multiple steps:

    match = re.search(pattern, line)
    if match is not None:
        name = match.group(1)
        if name and validate(name):
            return name
        else:
            pass  # no valid name case
    else:
        pass  # no match case
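For completeness, the two approaches can be combined without a try/except by testing the match object before calling .group(1); a sketch assuming Python 3.8+ (find_name and the lambda validator are illustrative names, not from the thread):

```python
import re

def find_name(pattern, line, validate):
    # Test the match object for None before touching .group(1); the
    # walrus keeps it to a single condition with no exception handling.
    if (match := re.search(pattern, line)) and validate(name := match.group(1)):
        return name
    return None

# hypothetical validator, purely for illustration
result = find_name(r"name=(\w+)", "name=guido", lambda n: n.isalpha())
```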


Best,

Matěj



--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *

___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Hashes in Python3.5 for tuples and frozensets

2018-05-17 Thread Anthony Flury via Python-Dev

Chris,
I entirely agree. The same questioner also asked about the fastest data 
type to use as a key in a dictionary; and which data structure is 
fastest. I get the impression the person is very into 
micro-optimization, without profiling their application. It seems every 
choice is made based on the speed of that operation; without 
consideration of how often that operation is used.


On 17/05/18 09:16, Chris Angelico wrote:

On Thu, May 17, 2018 at 5:21 PM, Anthony Flury via Python-Dev
<python-dev@python.org> wrote:

Victor,
Thanks for the link, but to be honest it will just confuse people - neither
the link nor the related bpo entries state that the fix is limited to
strings. They simply talk about hash randomization - which in my opinion
implies ALL hash algorithms; which is why I asked the question.

I am not sure how much should be exposed about the scope of security fixes,
but you can understand my (and others') confusion.

I am aware that applications shouldn't make assumptions about the value of
any given hash - apart from some simple assumptions based on hash value
equality (i.e. if two objects have different hash values they can't be
equal).

The hash values of Python objects are calculated by the __hash__
method, so arbitrary objects can do what they like, including
degenerate algorithms such as:

class X:
    def __hash__(self):
        return 7
Agreed - I should have said the default hash algorithm. Hashes for
custom objects are entirely application dependent.
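As an aside, the degenerate __hash__ above is legal, and dicts still behave correctly with it because they fall back on equality after a hash collision; a quick sketch:

```python
class X:
    """Every instance hashes to 7 -- legal, but degenerate."""
    def __hash__(self):
        return 7

a, b = X(), X()
assert hash(a) == hash(b) == 7
# Both keys land in the same bucket; lookups still succeed because the
# dict compares keys for equality, but performance degrades towards O(n).
d = {a: 1, b: 2}
assert d[a] == 1 and d[b] == 2
```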


So it's impossible to randomize ALL hashes at the language level. Only
str and bytes hashes are randomized, because they're the ones most
likely to be exploitable - for instance, a web server will receive a
query like "http://spam.example/target?a=1&b=2&c=3" and provide a
dictionary {"a":1, "b":2, "c":3}. Similarly, a JSON decoder is always
going to create string keys in its dictionaries (JSON objects). Do you
know of any situation in which an attacker can provide the keys for a
dict/set as integers?
I was just asking the question - rather than critiquing the fault-fix. I 
am actually more concerned that the documentation relating to the fix 
doesn't make it clear that only strings have their hashes randomised.



BTW: This question was prompted by a question on a social media platform
about whether hash values are transferable across platforms. Everything I
could find stated that after Python 3.3 ALL hash values were randomized -
but that clearly isn't the case; and the original questioner identified
that some hash values are randomized and others aren't.

That's actually immaterial. Even if the hashes weren't actually
randomized, you shouldn't be making assumptions about anything
specific in the hash, save that *within one Python process*, two equal
values will have equal hashes (and therefore two objects with unequal
hashes will not be equal).
Entirely agree - I was just trying to get to the bottom of the 
difference - especially considering that the documentation I could find 
implied that all hash algorithms had been randomized.
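The invariant being discussed here -- equal values hash equal within one process, and nothing more is promised -- can be stated directly:

```python
# Within a single process, equal values must have equal hashes...
assert hash("spam") == hash("spam")
assert hash(1) == hash(1.0) == hash(True)  # equality holds across numeric types
# ...and therefore unequal hashes imply unequal values (the contrapositive).
x, y = "spam", "eggs"
if hash(x) != hash(y):
    assert x != y
```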

I did strongly suggest to the original questioner that relying on the same
hash value across different platforms wasn't a clever solution - their
original plan was to store hash values in a cross-system database to enable
quick retrieval of data (!!!). I did remind the OP that a hash value isn't
guaranteed to be unique anyway - they might come across two different
values with the same hash, with no way to distinguish between them if all
they have is the hash. Hopefully their revised design will store the key,
not the hash.

Uhh if you're using a database, let the database do the work of
being a database. I don't know what this "cross system database" would
be implemented in, but if it's a proper multi-user relational database
engine like PostgreSQL, it's already going to have way better indexing
than anything you'd do manually. I think there are WAY better
solutions than worrying about Python's inbuilt hashing.

Agreed

If you MUST hash your data for sharing and storage, the easiest
solution is to just use a cryptographic hash straight out of
hashlib.py.
As stated before - I think the original questioner was intent on micro
optimization - they had hit on the idea that storing an integer would be
quicker than storing a string - entirely ignoring both the impracticality
of trying to encode every string into a value (since hashes aren't
guaranteed not to collide), and the problem of trying to reverse that
translation once the stored key had been retrieved.
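For reference, the hashlib approach Chris suggests yields a digest that is stable across processes and platforms, unlike the per-process-salted built-in hash() for str; a minimal sketch (stable_key is an illustrative name):

```python
import hashlib

def stable_key(text: str) -> str:
    # SHA-256 is deterministic everywhere, so the digest can safely be
    # stored and compared across systems -- unlike hash(), which is
    # salted per process for str/bytes input.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

digest = stable_key("Hello World")
```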

ChrisA


Thanks for your comments :-)

--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *

Re: [Python-Dev] Hashes in Python3.5 for tuples and frozensets

2018-05-17 Thread Anthony Flury via Python-Dev

Victor,
Thanks for the link, but to be honest it will just confuse people -
neither the link nor the related bpo entries state that the fix is
limited to strings. They simply talk about hash randomization - which in
my opinion implies ALL hash algorithms; which is why I asked the question.


I am not sure how much should be exposed about the scope of security
fixes, but you can understand my (and others') confusion.


I am aware that applications shouldn't make assumptions about the value
of any given hash - apart from some simple assumptions based on hash
value equality (i.e. if two objects have different hash values they
can't be equal).


BTW: This question was prompted by a question on a social media platform
about whether hash values are transferable across platforms. Everything
I could find stated that after Python 3.3 ALL hash values were
randomized - but that clearly isn't the case; and the original
questioner identified that some hash values are randomized and others
aren't.

I did strongly suggest to the original questioner that relying on the
same hash value across different platforms wasn't a clever solution -
their original plan was to store hash values in a cross-system database
to enable quick retrieval of data (!!!). I did remind the OP that a hash
value isn't guaranteed to be unique anyway - they might come across two
different values with the same hash, with no way to distinguish between
them if all they have is the hash. Hopefully their revised design will
store the key, not the hash.



On 17/05/18 07:38, Victor Stinner wrote:

Hi,

String hash is randomized, but not the integer hash:

$ python3.5 -c 'print(hash("abc"))'
-8844814677999896014
$ python3.5 -c 'print(hash("abc"))'
-7757160699952389646

$ python3.5 -c 'print(hash(1))'
1
$ python3.5 -c 'print(hash(1))'
1

frozenset hash is combined from values of the set. So it's only
randomized if values hashes are randomized.

The denial of service is more likely to occur with strings as keys,
than with integers.
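Victor's point can be checked directly: small ints hash to themselves, so container hashes built from them carry no per-process salt (the exact tuple hash value still varies between CPython versions, so only determinism within a process is asserted here):

```python
# Small integers hash to themselves -- no randomization involved.
assert hash(1) == 1
assert hash(300) == 300
# Container hashes are derived from the members' hashes, so tuples and
# frozensets of ints are deterministic too.
assert hash((1, 9)) == hash((1, 9))
assert hash(frozenset({300, 301})) == hash(frozenset({300, 301}))
```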

See the following link for more information:
http://python-security.readthedocs.io/vuln/cve-2012-1150_hash_dos.html

Victor

2018-05-16 17:48 GMT-04:00 Anthony Flury via Python-Dev <python-dev@python.org>:

This may be known but I wanted to ask this esteemed body first.

I understand that from Python3.3 there was a security fix to ensure that
different python processes would generate different hash values for the same
input - to prevent denial of service attacks based on crafted hash conflicts.

I opened two python REPLs on my Linux 64bit PC and did the following

Terminal 1:

>>> hash('Hello World')
-1010252950208276719

>>> hash( frozenset({1,9}) )
-7625378979602737914
>>> hash(frozenset({300,301}))
-8571255922896611313

>>> hash((1,9))
3713081631926832981
>>> hash((875,932))
3712694086932196356



Terminal 2:

>>> hash('Hello World')
-8267767374510285039

>>> hash( frozenset({1,9}) )
-7625378979602737914
>>> hash(frozenset({300,301}))
-8571255922896611313

>>> hash((1,9))
3713081631926832981
>>> hash((875,932))
3712694086932196356

As can be seen - taking a hash of a string does indeed create a different
value between the two processes (as expected).

However the frozenset hash is the same in both cases, as is the hash of the
tuples - suggesting that the vulnerability resolved in Python 3.3 wasn't
resolved across all potentially hashable values. I even used different
large numbers to ensure that the integers weren't being interned.

I can imagine that frozensets aren't used frequently as hash keys - but I
would think that tuples are regularly used. Since their hashes are not
salted, does the vulnerability still exist in some form?

--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*




--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury <https://twitter.com/TonyFlury/>*



[Python-Dev] Hashes in Python3.5 for tuples and frozensets

2018-05-16 Thread Anthony Flury via Python-Dev

This may be known but I wanted to ask this esteemed body first.

I understand that from Python3.3 there was a security fix to ensure that
different python processes would generate different hash values for the
same input - to prevent denial of service attacks based on crafted hash
conflicts.


I opened two python REPLs on my Linux 64bit PC and did the following

Terminal 1:

>>> hash('Hello World')
-1010252950208276719

>>> hash( frozenset({1,9}) )
-7625378979602737914
>>> hash(frozenset({300,301}))
-8571255922896611313

>>> hash((1,9))
3713081631926832981
>>> hash((875,932))
3712694086932196356



Terminal 2:

>>> hash('Hello World')
-8267767374510285039

>>> hash( frozenset({1,9}) )
-7625378979602737914
>>> hash(frozenset({300,301}))
-8571255922896611313

>>> hash((1,9))
3713081631926832981
>>> hash((875,932))
3712694086932196356

As can be seen - taking a hash of a string does indeed create a 
different value between the two processes (as expected).


However the frozenset hash is the same in both cases, as is the hash of
the tuples - suggesting that the vulnerability resolved in Python 3.3
wasn't resolved across all potentially hashable values. I even used
different large numbers to ensure that the integers weren't being interned.


I can imagine that frozensets aren't used frequently as hash keys - but
I would think that tuples are regularly used. Since their hashes are not
salted, does the vulnerability still exist in some form?


--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



[Python-Dev] Review of Pull Request 5974 please

2018-04-29 Thread Anthony Flury via Python-Dev

All,

Can someone please review Pull Request 5974 on Python3.8 - the pull
request was submitted on 4th March and is associated with bpo-32933.


To summarize the point of this pull request:

It fixes a bug of omission within mock_open (part of unittest.mock).


The functionality of mock_open enables test code to mock a file being
opened with some data which can be read. Importantly, mock_open has a
read_data attribute which can be used to specify the data to be read
from the file.


The mocked file which is opened correctly supports file.read(), 
file.readlines(), file.readline(). These all make use of the read_data 
as expected, and the mocked file also supports being opened as a context 
manager.


But the mock_open file does not support iteration - so pythonic code
which uses a for loop to iterate over the file content will only ever
appear to iterate over an empty file, regardless of the read_data
attribute set when the mock_open is created.


So non-pythonic methods of iterating over the file contents - such as
this:

    data = opened_file.readlines()
    for line in data:
        process_line(line)

and this:

    line = opened_file.readline()
    while line:
        process_line(line)
        line = opened_file.readline()

Can both be tested with the mocked file containing simulated data (using 
the read_data attribute) as expected.


But this code (which by any standard is the 'correct' way to iterate 
around the file content of a text file):


    for line in opened_file:
        process_line(line)

Will only ever appear to iterate around an empty file when tested using 
mock_open.
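The mismatch is easy to reproduce; a sketch ('fake.txt' is a placeholder name, nothing is read from disk). On interpreters without the fix, `iterated` comes back empty while `explicit` contains the data:

```python
from unittest.mock import mock_open, patch

m = mock_open(read_data="line1\nline2\n")
with patch("builtins.open", m):
    with open("fake.txt") as f:
        explicit = f.readlines()   # uses read_data as expected
    with open("fake.txt") as f:
        iterated = list(f)         # empty on versions without the fix
```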


I would like this to be reviewed so it can be back-ported into Python3.7
and 3.6 if at all possible. I know that the bug has existed since the
original version of mock_open, but it does seem strange that code under
test which uses a pythonic code structure can't be fully tested using
the standard library.


--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] assignment expressions: an alternative proposal

2018-04-24 Thread Anthony Flury via Python-Dev

On 24/04/18 17:11, Yury Selivanov wrote:

On Tue, Apr 24, 2018 at 12:03 PM, Ethan Furman  wrote:
[..]

But I do write this:

   def wrapper(func, some_value):
       value_I_want = process(some_value)
       def wrapped(*args, **kwds):
           if value_I_want == 42:
               ...

But this pattern is rarer than comparing local variables. That's the
point I'm trying to make. Besides, to make it an assignment expression
under my proposal you would need to use parens. Which makes it even less
likely that you confuse '=' and '=='.


Just because you wrap a set of characters in parens doesn't mean that you
won't potentially mistype what you should type inside the parens. The
failure mode in C:


    if (a = 3)
        do_something_with_a(a);

is incredibly common even with very experienced developers - so much so
that most linters flag it as a likely error, and I think gcc has an
option to flag it as a warning - even though it is valid and very
occasionally useful.
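Python sidesteps the C failure mode entirely today, because a bare '=' in a conditional is a syntax error -- exactly the protection the proposed '( NAME = expr )' form would weaken:

```python
# The classic C slip cannot even compile in current Python.
try:
    compile("if (a = 3): pass", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
assert rejected
```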


Also, many developers who come to Python from languages such as C will
still place parens around conditionals - this means that a typo which
would cause a SyntaxError in current versions would instead cause a
potentially subtle bug under your implementation (unless you maintain
the rule that you can't rebind currently bound names - which renders the
whole idea useless in loops, as already discussed at length).


I also still can't think of a single other Python construct where the
semantics of an operator are explicitly modified by syntactic elements
outside the operator. For mathematical operators, the surrounding parens
modify the grouping of the operators but not their semantics (* means *;
it is just the operands which potentially change).


You could argue that your proposal overloads the semantics of the parens
(similar to how braces are overloaded to implement dictionary and set
literals), but I don't think that overloading the semantics of parens is
a good idea.


If Python is going to do assignment expressions we shouldn't overload 
parens in my opinion - we should have a separate operator - doing this 
avoids needing to exclude rebinding, and makes such expressions 
considerably more useful.


--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] assignment expressions: an alternative proposal

2018-04-24 Thread Anthony Flury via Python-Dev

On 24/04/18 14:50, Yury Selivanov wrote:

On Tue, Apr 24, 2018 at 9:46 AM, Nick Coghlan  wrote:

On 24 April 2018 at 23:38, Yury Selivanov  wrote:

I propose to use the following syntax for assignment expressions:

 ( NAME = expr )

I know that it was proposed before and this idea was rejected, because
accidentally using '=' in place of '==' is a pain point in
C/C++/JavaScript.

That said, I believe we can still use this syntax as long as we impose
the following three restrictions on it:

1. Only NAME token is allowed as a single target.

2. Parenthesis are required.

3. Most importantly: it is *not* allowed to mask names in the current
local scope.

While I agree this would be unambiguous to a computer, I think for
most humans it would be experienced as a confusing set of arcane and
arbitrary rules about what "=" means in Python.

I respectfully disagree.  There are no "arcane and confusing rules"
about "=", it's rather simple:

"=" is always an assignment.

But it isn't - in your proposed syntax:

 * NAME = expr is an assignment with no return value
 * ( NAME = expr ) is an assignment with a returned value

So '=' is not simply always an assignment; it is an assignment with extra
semantics depending on the surrounding syntax.


As discussed previously by others on this exact proposal, you now have
the issue of confusion when using keyword arguments: my_func(a = b) -
clearly that is a call to 'my_func' where argument a has the value of b;
but if you want to use an assignment expression when calling the
function, you now have to write my_func((a = b)) - which frankly looks
messy in my opinion. You get the same issue when you want to use
assignment expressions in tuples.


Using a different operator for assignments which return values avoids
the potentially messy multiple levels of brackets, and means that the
semantics of an operator depend only on that operator and not on syntax
elements before and after it.


--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] PEP 572: Assignment Expressions

2018-04-21 Thread Anthony Flury via Python-Dev

On 21/04/18 11:18, Chris Angelico wrote:

But you haven't answered anything about what "readable" means. Does it
mean "if I look at this code, I can predict what dis.dis() would
output"? Or does it mean "this code clearly expresses an algorithm and
the programmer's intent"? Frequently I hear people complain that
something is unreadable because it fails the former check. I'm much
more interested in the latter check. For instance, this line of code
expresses the concept "generate the squares of odd numbers":

[x*x for x in range(100) if x % 2]

But it doesn't clearly express the disassembly. Is that a problem? Are
list comprehensions a bad feature for that reason? I don't think so.

ChrisA


For what it worth - readability for me is all about understanding the 
intent. I don't care (most of the time) about how the particular code 
construct is actually implemented. When I am maintaining code (or trying 
to) I need to understand what the developer intended (or in the case of 
a bug, the gap between the outcome and the intention).


One of the challenges about readability is that it partially depends on
skill level - for a beginner the comprehension may well be baffling,
whereas someone with more skill would understand it almost intuitively.
As an example: I have been using Python for 7 years, and comprehensions
with more than one for loop are still not intuitive for me; I can't read
them without thinking deeply about how the loops work together.
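Chris's comprehension above can be checked against its expanded loop form, which is what a reader ultimately has to reconstruct mentally:

```python
# "generate the squares of odd numbers" -- the one-liner...
squares = [x * x for x in range(100) if x % 2]

# ...and the loop it abbreviates.
expanded = []
for x in range(100):
    if x % 2:
        expanded.append(x * x)

assert squares == expanded
```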


--
--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] PEP 572: Assignment Expressions

2018-04-21 Thread Anthony Flury via Python-Dev

On 21/04/18 08:46, Chris Angelico wrote:

doubled_items = [x for x in (items := get_items()) if x * 2 in items]

This will leak 'items' into the surrounding scope (but not 'x').
At the risk of stating the obvious - wasn't there work in Python 3 to
prevent leakage from comprehensions?

[x for x in x if x] # This works
[x for y in x if x := y] # UnboundLocalError


The standard library example given earlier notwithstanding, I can see no
benefit in using the same name for both the iterable and the loop
variable. To be honest I have trouble parsing that first version and
keeping track of which x is which (especially which x is being used in
the conditional clause): surely this would be better: [x_item for x_item
in x if x_item]


Your 2nd example makes no sense to me as to the intention of the code - 
the re-use of the name x is confusing at best.
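For the record, the behaviour PEP 572 eventually settled on (Python 3.8+) is exactly this split: the loop variable stays local to the comprehension, while a ':=' target deliberately binds in the enclosing scope. A sketch:

```python
# The loop variable does not leak out of the comprehension...
_ = [loop_var * 2 for loop_var in range(3)]
try:
    loop_var
    leaked = True
except NameError:
    leaked = False
assert not leaked

# ...but a := target does (by design, per PEP 572).
total = 0
running = [total := total + x for x in range(4)]
assert total == 6
assert running == [0, 1, 3, 6]
```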



--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] PEP 572: Assignment Expressions

2018-04-20 Thread Anthony Flury via Python-Dev
I am entirely new to this list, but if I may I would like to share my
comments:


 * I do think this proposal (NAME := expression) has merit in my
   opinion; it does make some code more readable.

 * I think readability is only improved if:

 * the target is restricted to a simple name - I don't see a benefit
   in more complex targets
 * chaining is not allowed - I think the construct:

            while (line := input.read_row()) is not None:
                process_line(line)

        Is readable, but:

            while (current_line := line := input.read_row()) is not None:
                line = process_line(line)

        is not obvious - and certainly isn't any more obvious than:

            while (line := input.read_row()) is not None:
                current_line = line
                line = process_line(line)

 * The current expectations of how comprehensions work should also be
   honored; I don't claim to have fully followed all of the discussions
   around this, but it seems to me that comprehensions work in a
   particular way because of a concerted effort (especially in Python
   3) to make them that way. They are self-contained and don't leak
   values into their containing scope. Similarly I think that setting
   variables within a comprehension is just for the benefit of readable
   code within the comprehension - i.e.:

       stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

   can become:

       stuff = [[y := f(x), x/y] for x in range(5)]
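Both spellings of that last example run today once f is given a concrete body (the f below is a hypothetical stand-in; the := form needs Python 3.8+):

```python
def f(x):
    # hypothetical stand-in for the f(x) in the example above
    return x + 1

# the old spelling: a single-element inner loop just to name f(x)
stuff_old = [[y, x / y] for x in range(5) for y in [f(x)]]
# the assignment-expression spelling
stuff_new = [[y := f(x), x / y] for x in range(5)]
assert stuff_old == stuff_new
```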

So - overall from me a conditional +1 - conditions as above; if they are 
not possible then -1 from me.


--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *



Re: [Python-Dev] Timeline for Pull request reviews in 3.8

2018-04-06 Thread Anthony Flury via Python-Dev

All,

The three pull requests are :

Python 2.7 - doc string fix : https://github.com/python/cpython/pull/6015

Python 3.8 - documentation fix : https://github.com/python/cpython/pull/5982

Python 3.8 - Small bug fix on unittest.mock.mock_open : 
https://github.com/python/cpython/pull/5974


The Py2.7 change does not need to be rolled forward to Python3 documentation

The two Py3.8 fixes could/should/can? be backported to earlier versions

These are all trivial with no conflicts with their target branch (or at 
least there wasn't when I made the requests).



--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *


[Python-Dev] Timeline for Pull request reviews in 3.8

2018-04-05 Thread Anthony Flury via Python-Dev
Can anyone enlighten me on what the expected time-line is for reviewing 
pull requests made on 3.8.


I made a few simple fixes in early March - and I understand everyone is
busy.


What are the time-lines and cut-off dates for backports to 3.7 and 3.6?

I also made a documentation change (based on an open bug report) into
2.7, and I am keen to understand the planned time-line for that too.


--
Anthony Flury
email : *anthony.fl...@btinternet.com*
Twitter : *@TonyFlury *


[Python-Dev] Git hub : CLA Not Signed label

2018-03-10 Thread Anthony Flury via Python-Dev

All,
I submitted two Pull Requests last Sunday, only a few hours after I 
signed the CLA.


I understand why the 'Knights who say Ni' bot applied the 'CLA Not
Signed' label at the time I submitted the pull requests, but I was
wondering when the labels get reset.


How often (if at all) does the bot look at old pull requests ?

Thanks for any help you can give, I am sorry if the question sounds basic.