On 7/3/2018 5:37 PM, Serhiy Storchaka wrote:
I like programming languages in which everything is an expression (including
function declarations, branching and loops) and you can use an
assignment at any point, but Python is built in another way, and I like
Python too. PEP 572 looks like it violates several Python design principles.
Python looks like a simple language, and this is its strong side. I believe
most Python users are not professional programmers -- they are
sysadmins, scientists, hobbyists and kids -- but Python is suitable for
them because of its clear syntax and its encouragement of good programming
style. In particular, mutating and non-mutating operations are separated.
The assignment expression breaks this. There should be very good reasons
for doing so. But it looks to me that all the examples for PEP 572 can be
written better without the walrus operator.
I appreciate your showing alternatives I can use now. Even once
implemented, one cannot use assignment expressions until one no longer
cares about 3.7 compatibility. Even then, there will still be a choice.
results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
results = [(x, y, x/y) for x in input_data for y in [f(x)] if y > 0]
Would (f(x),) be faster?
import timeit as ti
print(ti.timeit('for y in {x}: pass', 'x=1'))
print(ti.timeit('for y in [x]: pass', 'x=1'))
print(ti.timeit('for y in (x,): pass', 'x=1'))
# prints
0.13765254499999996 # seconds per 1_000_000 = microseconds each.
0.10321274000000003
0.09492473300000004
Yes, but not enough to pay for adding the ',' (and sometimes forgetting it).
stuff = [[y := f(x), x/y] for x in range(5)]
stuff = [[y, x/y] for x in range(5) for y in [f(x)]]
Creating a leaky name binding appears to be about 5x faster than
iterating a temporary singleton.
print(ti.timeit('y=x', 'x=1'))
print(ti.timeit('y=x; del y', 'x=1'))
#
0.017357778999999907
0.021115051000000107
If one adds 'del y' to make the two equivalent, the number of characters
typed is about the same. To me, the choice amounts to a subjective
preference. Even with y := x available, I would write the expansion as
res = []
for x in range(5):
    y = f(x)
    res.append((y, x/y))
rather than use the assignment expression in the tuple. It creates a
'hitch' in thought.
Does this idiom look unusual to you? But it is legal Python syntax, and
it is no more unusual than the new walrus operator. This idiom is not
commonly used because there is very little need for the above examples
in real code. And I'm sure that the walrus operator in comprehensions
will be very rare unless PEP 572 encourages writing complicated
comprehensions. Most users prefer to write an explicit loop.
I want to remind you that PEP 572 started from a discussion on
Python-ideas which proposed a syntax for writing the following code as a
comprehension:
smooth_signal = []
average = initial_value
for xt in signal:
    average = (1-decay)*average + decay*xt
    smooth_signal.append(average)
Using the "for in []" idiom this can be written (if you prefer
comprehensions) as:
smooth_signal = [average
                 for average in [initial_value]
                 for x in signal
                 for average in [(1-decay)*average + decay*x]]
Try now to write this using PEP 572. The walrus operator turns out to be
less suitable for solving the original problem because it doesn't help
to initialize the starting value.
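For comparison, here is a minimal sketch of how the walrus version might look
under PEP 572 (the values for signal, decay and initial_value are invented for
illustration). It shows the point above: the accumulator must still be bound
by a plain assignment statement before the comprehension.

```python
# Invented example data for illustration only.
signal = [1.0, 1.0, 1.0]
decay = 0.5
initial_value = 0.0

# The walrus cannot replace this line -- the initialization stays outside.
average = initial_value
smooth_signal = [average := (1 - decay) * average + decay * x for x in signal]
print(smooth_signal)  # [0.5, 0.75, 0.875]
```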
Examples from PEP 572:
# Loop-and-a-half
while (command := input("> ")) != "quit":
    print("You entered:", command)
The straightforward way:
while True:
    command = input("> ")
    if command == "quit": break
    print("You entered:", command)
The clever way:
for command in iter(lambda: input("> "), "quit"):
    print("You entered:", command)
The 2-argument form of iter is under-remembered and under-used. The
length difference is 8 characters.
while (command := input("> ")) != "quit":
for command in iter(lambda: input("> "), "quit"):
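For readers who have not met it, the two-argument form iter(callable, sentinel)
calls the callable repeatedly and stops as soon as the return value equals the
sentinel, which is never yielded. A minimal sketch with an invented data source:

```python
# Simulate a callable data source with next() on a list iterator.
source = iter([3, 2, 1, 0, 5])

# Calls the lambda until it returns the sentinel 0; the 0 is discarded
# and the trailing 5 is never read.
collected = list(iter(lambda: next(source), 0))
print(collected)  # [3, 2, 1]
```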
I like the iter version, but the for-loop machinery and extra function
call make a minimal loop about half a millisecond slower.
import timeit as ti
def s():
    it = iter(10000*'0' + '1')

def w():
    it = iter(10000*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break

def f():
    it = iter(10000*'0' + '1')
    for command in iter(lambda: next(it), '1'): pass
print(ti.timeit('s()', 'from __main__ import s', number=1000))
print(ti.timeit('w()', 'from __main__ import w', number=1000))
print(ti.timeit('f()', 'from __main__ import f', number=1000))
#
0.0009702129999999975
0.9365254250000001
1.5913117949999998
Of course, with added processing of 'command' the time difference
disappears. Printing (in IDLE) is an extreme case.
def wp():
    it = iter(100*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break
        print('w', command)

def fp():
    it = iter(100*'0' + '1')
    for command in iter(lambda: next(it), '1'):
        print('f', command)
print(ti.timeit('wp()', 'from __main__ import wp', number=1))
print(ti.timeit('fp()', 'from __main__ import fp', number=1))
#
0.48
0.47
# Capturing regular expression match objects
# See, for instance, Lib/pydoc.py, which uses a multiline spelling
# of this effect
if match := re.search(pat, text):
    print("Found:", match.group(0))
# The same syntax chains nicely into 'elif' statements, unlike the
# equivalent using assignment statements.
elif match := re.search(otherpat, text):
    print("Alternate found:", match.group(0))
elif match := re.search(third, text):
    print("Fallback found:", match.group(0))
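Without the walrus, this elif chain need not become nested if/else blocks
either: a loop over (label, pattern) pairs covers the same ground. The text
and patterns below are invented stand-ins for the PEP's pat, otherpat and
third; only the shape of the code is the point.

```python
import re

text = "a beta test"
# Invented stand-ins for the PEP example's pat, otherpat, third;
# the labels mirror the PEP's print messages.
searches = [("Found", re.compile(r"alpha")),
            ("Alternate found", re.compile(r"beta")),
            ("Fallback found", re.compile(r"\btest\b"))]

# First pattern that matches wins, preserving the elif chain's priority.
for label, pat in searches:
    match = pat.search(text)
    if match:
        print("%s: %s" % (label, match.group(0)))
        break
else:
    print("No match")
```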
It may be more efficient to use a single regular expression which
consists of multiple or-ed patterns marked as different groups.
My attempt resulted in a slowdown. Duplicating the dominance of pat
over otherpat over third requires, I believe, negative lookahead assertions.
---
import re
import timeit as ti
##print(ti.timeit('for y in {x}: pass', 'x=1'))
##print(ti.timeit('for y in [x]: pass', 'x=1'))
##print(ti.timeit('for y in (x,): pass', 'x=1'))
##
##print(ti.timeit('y=x', 'x=1'))
##print(ti.timeit('y=x; del y', 'x=1'))
pat1 = re.compile('1')
pat2 = re.compile('2')
pat3 = re.compile('3')
pat123 = re.compile('1|2(?!.*1)|3(?!.*(1|2))')
# I think most people would prefer to use the 3 simple patterns.
def ifel(text):
    match = re.search(pat1, text)
    if match: return match.group()
    match = re.search(pat2, text)
    if match: return match.group()
    match = re.search(pat3, text)
    if match: return match.group()

def mach(text):
    match = re.search(pat123, text)
    return match.group()
print([ifel('321'), ifel('32x'), ifel('3xx')] == ['1', '2', '3'])
print([mach('321'), mach('32x'), mach('3xx')] == ['1', '2', '3'])
# True, True
text = '0'*10000 + '321'
print(ti.timeit('ifel(text)', "from __main__ import ifel, text",
number=100000))
print(ti.timeit('mach(text)', "from __main__ import mach, text",
number=100000))
# 0.77, 7.22
When I put parens around 1, 2, 3 in pat123, the 2nd timeit continued
until I restarted Shell. Maybe you can do better.
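One possible way to do better, sketched here rather than taken from the
thread: or the three alternatives together as plain groups and resolve the
pat1-over-pat2-over-pat3 priority in Python, avoiding the negative
lookaheads entirely.

```python
import re

# Plain or-ed groups; priority is handled below, not in the pattern.
pat_groups = re.compile('(1)|(2)|(3)')

def first_by_priority(text):
    best = None
    for mo in pat_groups.finditer(text):
        idx = mo.lastindex          # 1, 2 or 3; lower means higher priority
        if best is None or idx < best[0]:
            best = (idx, mo.group())
            if idx == 1:            # cannot beat the top-priority pattern
                break
    return best[1] if best else None

print([first_by_priority(t) for t in ('321', '32x', '3xx')])  # ['1', '2', '3']
```

This trades one pass over all matches for a pattern with no lookaheads, so
it sidesteps the pathological backtracking described above.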
For example see the cute regex-based tokenizer in gettext.py:
_token_pattern = re.compile(r"""
        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
        (?P<NAME>n\b)                              | # only n is allowed
        (?P<PARENTHESIS>[()])                      |
        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
                                                     # <=, >=, ==, !=, &&, ||,
                                                     # ? :
                                                     # unary and bitwise ops
                                                     # not allowed
        (?P<INVALID>\w+|.)                           # invalid token
    """, re.VERBOSE|re.DOTALL)
def _tokenize(plural):
    for mo in re.finditer(_token_pattern, plural):
        kind = mo.lastgroup
        if kind == 'WHITESPACES':
            continue
        value = mo.group(kind)
        if kind == 'INVALID':
            raise ValueError('invalid token in plural form: %s' % value)
        yield value
    yield ''
I have not found any code similar to the PEP 572 example in pydoc.py. It
has different code:
pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
                     r'RFC[- ]?(\d+)|'
                     r'PEP[- ]?(\d+)|'
                     r'(self\.)?(\w+))')
...
start, end = match.span()
results.append(escape(text[here:start]))
all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
    url = escape(all).replace('"', '&quot;')
    results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
    url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
    results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
    ...
It doesn't look like a sequence of re.search() calls. It is clearer and
more efficient, and using the assignment expression would not make it
better.
# Reading socket data until an empty string is returned
while data := sock.recv():
    print("Received data:", data)

for data in iter(sock.recv, b''):
    print("Received data:", data)
if pid := os.fork():
    # Parent code
else:
    # Child code

pid = os.fork()
if pid:
    # Parent code
else:
    # Child code
It looks to me like there is no real use case for PEP 572. It just makes
Python worse.
--
Terry Jan Reedy