On 7/3/2018 5:37 PM, Serhiy Storchaka wrote:
I like programming languages in which everything is an expression (including function declarations, branching, and loops) and you can use an assignment at any point, but Python is built in a different way, and I like Python too. PEP 572 looks like it violates several Python design principles. Python looks like a simple language, and this is its strong side. I believe most Python users are not professional programmers -- they are sysadmins, scientists, hobbyists, and kids -- but Python is suitable for them because of its clear syntax and its encouragement of good programming style. In particular, mutating and non-mutating operations are separated. The assignment expression breaks this separation. There should be very good reasons for doing so. But it looks to me that all the examples for PEP 572 can be written better without the walrus operator.
I appreciate you showing alternatives I can use now. Even once implemented, one cannot use assignment expressions until one no longer cares about 3.7 compatibility. Even then, there will still be a choice.

results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]

     results = [(x, y, x/y) for x in input_data for y in [f(x)] if y > 0]

Would (f(x),) be faster?

import timeit as ti

print(ti.timeit('for y in {x}: pass', 'x=1'))
print(ti.timeit('for y in [x]: pass', 'x=1'))
print(ti.timeit('for y in (x,): pass', 'x=1'))

# prints
0.13765254499999996  # seconds per 1_000_000 = microseconds each.
0.10321274000000003
0.09492473300000004

Yes, but not by enough to pay for typing the ',' and sometimes forgetting it.

stuff = [[y := f(x), x/y] for x in range(5)]

     stuff = [[y, x/y] for x in range(5) for y in [f(x)]]

Creating a leaky name binding appears to be about 5 x faster than iterating over a temporary singleton.

print(ti.timeit('y=x', 'x=1'))
print(ti.timeit('y=x; del y', 'x=1'))
#
0.017357778999999907
0.021115051000000107

If one adds 'del y' to make the two equivalent, the number of characters typed is about the same. To me, the choice amounts to subjective preference. Even with y:=x available, I would write the expansion as

res = []
for x in range(5):
    y = f(x)
    res.append((y, x/y))

rather than use the assignment expression in the tuple. It creates a 'hitch' in thought.
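One more difference worth noting, beyond speed (my reading of PEP 572, not tested against a released interpreter): the walrus in a comprehension binds its target in the containing scope, while the "for y in [f(x)]" idiom keeps y local to the comprehension. A minimal sketch, reusing f from the examples above:

# for-in-[] keeps y local to the comprehension; it does not leak.
stuff = [[y, x/y] for x in range(5) for y in [f(x)]]
# print(y)  # would raise NameError here

# Per PEP 572, the walrus binds y in the enclosing scope instead,
# so y remains visible (bound to its last value) afterwards.
stuff = [[y := f(x), x/y] for x in range(5)]
print(y)  # the f(4) value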

Does this idiom look unusual to you? It is legal Python syntax, and it is no more unusual than the new walrus operator. This idiom is not commonly used because there is very little need for the above examples in real code. And I'm sure that the walrus operator in comprehensions will be very rare unless PEP 572 encourages writing complicated comprehensions. Most users prefer to write an explicit loop.

Remember that PEP 572 started from a discussion on Python-ideas which proposed a syntax for writing the following code as a comprehension:

     smooth_signal = []
     average = initial_value
     for xt in signal:
         average = (1-decay)*average + decay*xt
         smooth_signal.append(average)

Using the "for in []" idiom this can be written (if you prefer comprehensions) as:

     smooth_signal = [average
                      for average in [initial_value]
                      for x in signal
                      for average in [(1-decay)*average + decay*x]]

Now try to write this using PEP 572. The walrus operator turns out to be less suitable for solving the original problem, because it does not help with initializing the starting value.
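For the record, the closest PEP 572 spelling I can construct (a sketch, assuming the semantics in the current draft) still needs a separate statement for the initialization:

average = initial_value  # the walrus cannot do this part
smooth_signal = [average := (1-decay)*average + decay*x
                 for x in signal]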


Examples from PEP 572:

# Loop-and-a-half
while (command := input("> ")) != "quit":
    print("You entered:", command)

The straightforward way:

     while True:
         command = input("> ")
         if command == "quit": break
         print("You entered:", command)

The clever way:

     for command in iter(lambda: input("> "), "quit"):
         print("You entered:", command)

The 2-argument form of iter is under-remembered and under-used. The length difference between the two loop headers is 8 characters.
    while (command := input("> ")) != "quit":
    for command in iter(lambda: input("> "), "quit"):
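For those who don't remember it: iter(func, sentinel) returns an iterator that calls func() repeatedly until the result equals the sentinel, which is discarded rather than yielded. A rough pure-Python equivalent (a sketch, not CPython's actual implementation):

def iter2(func, sentinel):
    # Call func() until it returns a value equal to sentinel;
    # the sentinel itself is consumed, not yielded.
    while True:
        value = func()
        if value == sentinel:
            return
        yield value

for command in iter2(lambda: input("> "), "quit"):
    print("You entered:", command)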

I like the iter version, but the for-loop machinery and the extra function call make a minimal loop (10000 iterations here) about half a millisecond slower.

import timeit as ti

def s():  # baseline: setup cost only, no loop
    it = iter(10000*'0' + '1')

def w():
    it = iter(10000*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break

def f():
    it = iter(10000*'0' + '1')
    for command in iter(lambda: next(it), '1'): pass

print(ti.timeit('s()', 'from __main__ import s', number=1000))
print(ti.timeit('w()', 'from __main__ import w', number=1000))
print(ti.timeit('f()', 'from __main__ import f', number=1000))
#
0.0009702129999999975
0.9365254250000001
1.5913117949999998

Of course, with added processing of 'command' the time difference disappears. Printing (in IDLE) is an extreme case.

def wp():
    it = iter(100*'0' + '1')
    while True:
        command = next(it)
        if command == '1': break
        print('w', command)

def fp():
    it = iter(100*'0' + '1')
    for command in iter(lambda: next(it), '1'):
        print('f', command)

print(ti.timeit('wp()', 'from __main__ import wp', number=1))
print(ti.timeit('fp()', 'from __main__ import fp', number=1))
#
0.48
0.47

# Capturing regular expression match objects
# See, for instance, Lib/pydoc.py, which uses a multiline spelling
# of this effect
if match := re.search(pat, text):
    print("Found:", match.group(0))
# The same syntax chains nicely into 'elif' statements, unlike the
# equivalent using assignment statements.
elif match := re.search(otherpat, text):
    print("Alternate found:", match.group(0))
elif match := re.search(third, text):
    print("Fallback found:", match.group(0))

It may be more efficient to use a single regular expression which consists of multiple or-ed patterns marked as different groups.

My attempt resulted in a slowdown. Duplicating the dominance of pat over otherpat over third requires, I believe, negative lookahead assertions.
---

import re
import timeit as ti

pat1 = re.compile('1')
pat2 = re.compile('2')
pat3 = re.compile('3')
pat123 = re.compile('1|2(?!.*1)|3(?!.*(1|2))')
# I think most people would prefer to use the 3 simple patterns.

def ifel(text):
    match = re.search(pat1, text)
    if match: return match.group()
    match = re.search(pat2, text)
    if match: return match.group()
    match = re.search(pat3, text)
    if match: return match.group()

def mach(text):
    match = re.search(pat123, text)
    return match.group()

print([ifel('321'), ifel('32x'), ifel('3xx')] == ['1', '2', '3'])
print([mach('321'), mach('32x'), mach('3xx')] == ['1', '2', '3'])
# True, True

text = '0'*10000 + '321'
print(ti.timeit('ifel(text)', "from __main__ import ifel, text", number=100000))
print(ti.timeit('mach(text)', "from __main__ import mach, text", number=100000))
# 0.77, 7.22


When I put parens around the 1, 2, and 3 in pat123, the second timeit call ran until I restarted the Shell. Maybe you can do better.
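If the goal of those extra parens was to tell which alternative matched, named groups plus match.lastgroup may be a safer route (untested against the hang above; whether it avoids the slowdown is a guess). This is the same lastgroup trick the gettext tokenizer below relies on:

pat123n = re.compile('(?P<one>1)|(?P<two>2(?!.*1))|(?P<three>3(?!.*(?:1|2)))')

m = pat123n.search('0'*10000 + '321')
print(m.lastgroup, m.group())  # one 1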


For example see the cute regex-based tokenizer in gettext.py:

_token_pattern = re.compile(r"""
        (?P<WHITESPACES>[ \t]+)                    | # spaces and horizontal tabs
        (?P<NUMBER>[0-9]+\b)                       | # decimal integer
        (?P<NAME>n\b)                              | # only n is allowed
        (?P<PARENTHESIS>[()])                      |
        (?P<OPERATOR>[-*/%+?:]|[><!]=?|==|&&|\|\|) | # !, *, /, %, +, -, <, >,
                                                     # <=, >=, ==, !=, &&, ||,
                                                     # ? :
                                                     # unary and bitwise ops
                                                     # not allowed
        (?P<INVALID>\w+|.)                           # invalid token
    """, re.VERBOSE|re.DOTALL)

def _tokenize(plural):
    for mo in re.finditer(_token_pattern, plural):
        kind = mo.lastgroup
        if kind == 'WHITESPACES':
            continue
        value = mo.group(kind)
        if kind == 'INVALID':
            raise ValueError('invalid token in plural form: %s' % value)
        yield value
    yield ''
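To make the mechanics concrete, here is what the tokenizer yields for a typical plural-forms expression (my own example input, not from gettext.py):

print(list(_tokenize('n % 10 == 1 && n % 100 != 11')))
# ['n', '%', '10', '==', '1', '&&', 'n', '%', '100', '!=', '11', '']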

I have not found any code similar to the PEP 572 example in pydoc.py. Its code is different:

pattern = re.compile(r'\b((http|ftp)://\S+[\w/]|'
                        r'RFC[- ]?(\d+)|'
                        r'PEP[- ]?(\d+)|'
                        r'(self\.)?(\w+))')
...
start, end = match.span()
results.append(escape(text[here:start]))

all, scheme, rfc, pep, selfdot, name = match.groups()
if scheme:
    url = escape(all).replace('"', '&quot;')
    results.append('<a href="%s">%s</a>' % (url, url))
elif rfc:
    url = 'http://www.rfc-editor.org/rfc/rfc%d.txt' % int(rfc)
    results.append('<a href="%s">%s</a>' % (url, escape(all)))
elif pep:
...

It doesn't look like a sequence of re.search() calls. It is clearer and more efficient, and using the assignment expression would not make it better.

# Reading socket data until an empty string is returned
while data := sock.recv():
    print("Received data:", data)

     for data in iter(sock.recv, b''):
         print("Received data:", data)

if pid := os.fork():
    # Parent code
else:
    # Child code

     pid = os.fork()
     if pid:
         # Parent code
     else:
         # Child code


It looks to me like there is no real use case for PEP 572. It just makes Python worse.



--
Terry Jan Reedy

