[Python-Dev] Re: Deferred, coalescing, and other very recent reference counting optimization

2020-09-03 Thread Larry Hastings


On 9/2/20 8:50 PM, Jim J. Jewett wrote:

I suspect that splitting the reference count away from the object itself could 
also be profitable, as it means the cache won't have to be dirtied (and 
flushed) on read access, and can keep Copy-On-Write from duplicating pages.


I had a patch from Thomas Wouters doing that for the Gilectomy. Last 
time I tried it, it was a performance wash.



/arry

___
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/RTRVQJNOJPGKBTJVPD62Z7LTKR7QI7FB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Travis CI migrated from the legacy GitHub API to the new GitHub Apps integration

2020-09-03 Thread Victor Stinner
Hi,

tl;dr: Travis CI issues are now resolved thanks to Ernest!

During the last 3 months, the Travis CI job failed randomly to report
the build status to GitHub pull requests:
https://github.com/python/core-workflow/issues/371

I discovered that travis-ci.org uses the legacy GitHub API, whereas
the new travis-ci.com website uses the new GitHub Apps integration.
The migration started in May 2018, is still in beta, and must be done
manually:

* 
https://blog.travis-ci.com/2018-05-02-open-source-projects-on-travis-ci-com-with-github-apps
* 
https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com/#existing-open-source-repositories-on-travis-ciorg

Two weeks ago, Ernest W. Durbin III migrated the GitHub "python"
organization from travis-ci.org to travis-ci.com, and also migrated
the cpython project to travis-ci.com (each project must be migrated
individually).

Since the migration, I haven't seen the "Travis CI doesn't report the
build status" issue on any new pull request. Great!

Thanks Ernest!

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/[email protected]/message/ZABQWCQKUNKNO5XUCNIMB7GELSXTA646/


[Python-Dev] Python logging with a custom clock

2020-09-03 Thread N. Benes
Dear list,

log records in the Python logging library always use timestamps provided
by `time.time()`, i.e. the usual system clock (UTC, CLOCK_REALTIME).

This time is used as absolute timestamp in log records and for
timestamps relative to the load of the library (Lib/logging/__init__.py:
`_startTime`).

I would like to propose, and receive feedback on, the attached (and
possibly incomplete) patch, which allows the programmer to provide a
custom callable to obtain the time instead of `time.time()`.
For example, this could be:

clock = lambda: time.clock_gettime(time.CLOCK_TAI)

and the callable could be provided during initial logging setup

logging.basicConfig(clock=clock, ...)

There is a similar approach in log4j to specify a custom clock [0].
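For comparison, a similar effect can already be approximated today without
patching the library, by swapping the LogRecord factory. A minimal sketch,
using `time.monotonic` as a stand-in clock purely so it runs everywhere (on
Linux one could substitute `lambda: time.clock_gettime(time.CLOCK_TAI)`):

```python
import logging
import time

# Stand-in for the proposed `clock` callable; on Linux a TAI clock would be
# clock = lambda: time.clock_gettime(time.CLOCK_TAI)
clock = time.monotonic

_old_factory = logging.getLogRecordFactory()

def _clock_factory(*args, **kwargs):
    # Build the record normally, then overwrite its timestamp fields
    # with values from the custom clock.
    record = _old_factory(*args, **kwargs)
    record.created = clock()
    record.msecs = (record.created - int(record.created)) * 1000
    return record

logging.setLogRecordFactory(_clock_factory)
```

Note that this workaround does not touch `relativeCreated` or the module's
`_startTime`, which is part of why a built-in `clock` parameter would be
cleaner.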


This change enables the use of non-UTC clocks, e.g. `CLOCK_TAI` or
`CLOCK_MONOTONIC`, which are unaffected by leap seconds and count SI
seconds. (In fact, logging's use of differences of UTC timestamps could
make users believe that the obtained duration reflects SI seconds, which
it doesn't in all cases.)

Combining a custom absolute clock such as `CLOCK_TAI` with custom log
formatters allows users to /store or transfer/ log records with TAI
timestamps, and /display/ them with UTC timestamps (e.g. properly
converted from TAI to UTC, with a seconds field of "60" during an
active leap second). This resolves the ambiguity when analysing and
correlating logs from different machines, even across leap seconds.


Attached is a simple example showing the different timestamps based on
UTC and TAI (assuming the current offset of +37 seconds [1] is properly
configured on the host, e.g. through PTP or `adjtimex()` with `ADJ_TAI`).

$ export TZ=GMT
$ date --iso-8601=seconds  && python3 example.py
2020-09-02T14:34:14+00:00
2020-09-02T14:34:51+0000 INFO message

According to the documentation, `time.CLOCK_TAI` was introduced in Python
3.9 [2], but the underlying system constant can already be used today
(e.g. on Debian Buster, Linux 4.19.0, Python 3.7.3 it is 11).
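On older Pythons the constant can be supplied by hand. A sketch assuming a
Linux kernel, where the clockid for CLOCK_TAI is 11 (other platforms may not
support it at all):

```python
import time

# Use time.CLOCK_TAI where available (Python 3.9+), else fall back to the
# raw clockid mentioned above. Clockid values are platform-specific:
# 11 is only valid on Linux.
CLOCK_TAI = getattr(time, "CLOCK_TAI", 11)

try:
    now_tai = time.clock_gettime(CLOCK_TAI)
except OSError:
    # Unsupported platform (or kernel without TAI support).
    now_tai = None
```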

The two patches provided are for Python 3.7.3 (Debian Buster) and Python
3.8.5 (python.org). In the latter case, it may need to be considered how
changing the Python logging clock works: it probably should fail if
handlers are already configured, unless `force` is also provided and
handlers are reset.

Kind regards,
-- nicolas benes


[0] `log4j.Clock` in
https://logging.apache.org/log4j/log4j-2.8/manual/configuration.html
[1] https://www.timeanddate.com/worldclock/other/tai
[2] https://docs.python.org/dev/library/time.html#time.CLOCK_TAI


--
Nicolas Benes                   [email protected]
Software Engineer
European Southern Observatory   https://www.eso.org
Karl-Schwarzschild-Strasse 2
D-85748 Garching b. Muenchen
Germany
--

From e19413a025d7807f68fc83f17cbb5da147d869f5 Mon Sep 17 00:00:00 2001
From: Nicolas Benes 
Date: Tue, 1 Sep 2020 18:33:38 +0200
Subject: [PATCH] Logging with custom clock

---
 Lib/logging/__init__.py | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/Lib/logging/__init__.py b/Lib/logging/__init__.py
index 2761509..54ccf9e 100644
--- a/Lib/logging/__init__.py
+++ b/Lib/logging/__init__.py
@@ -49,10 +49,15 @@ __date__   = "07 February 2010"
 #   Miscellaneous module data
 #---------------------------------------------------------------------------
 
+#
+#_clock_gettime is a callable to get the current time in seconds
+#
+_clock_gettime = time.time
+
 #
 #_startTime is used as the base when calculating the relative time of events
 #
-_startTime = time.time()
+_startTime = _clock_gettime()
 
 #
 #raiseExceptions is used to see if exceptions during handling should be
@@ -295,7 +300,7 @@ class LogRecord(object):
         """
         Initialize a logging record with interesting information.
         """
-        ct = time.time()
+        ct = _clock_gettime()
         self.name = name
         self.msg = msg
         #
@@ -494,8 +499,8 @@ class Formatter(object):
     %(lineno)d          Source line number where the logging call was issued
                         (if available)
     %(funcName)s        Function name
-    %(created)f         Time when the LogRecord was created (time.time()
-                        return value)
+    %(created)f         Time when the LogRecord was created (by default
+                        time.time() return value)
     %(asctime)s         Textual time when the LogRecord was created
     %(msecs)d           Millisecond portion of the creation time
     %(relativeCreated)d Time in milliseconds when the LogRecord was created,
@@ -1862,6 +1867,8 @@ def basicConfig(**kwargs):
               handlers, which will be added to the root handler. Any handler
               in the list which does not have a formatter assigned will be
               assigned the formatter created

[Python-Dev] Buildbot migrated to a new server

2020-09-03 Thread Victor Stinner
Hi,

tl;dr: Buildbots were unstable for 3 weeks, but the issue is mostly resolved.


Since last January, the disk of the buildbot server filled up every 2
weeks and The Night’s Watch had to fix it in the darkness for you
(usually by removing JUnit files and restarting the server). The old
machine only had 8 GB for the whole system and all data, whereas
buildbot workers produce large JUnit (XML) files (around 5 MB per file).

Three weeks ago, Ernest W. Durbin III provided us with a new machine
with a larger disk (60 GB) and a PostgreSQL database (whereas SQLite
was used previously). He automated the installation of the machine,
and also (great new feature!) automated reloading the Buildbot server
whenever a new configuration is pushed to the Git repository. The
configuration is public and maintained at:
https://github.com/python/buildmaster-config/

The migration was really smooth, except that last week we noticed
that workers started to be disconnected every minute, and then filled
their temporary directories with compiler files leaked by interrupted
builds. Buildbot owners have to update their worker configuration and
manually remove the temporary files:
https://mail.python.org/archives/list/[email protected]/thread/SZR2OLH67OYXSSADSM65HJYOIMFF44JZ/

Most buildbot worker configurations have been updated and the issue is
mostly resolved.

There is another minor issue: HTTPS connections are also closed after
1 minute, and so the web page is automatically refreshed every minute.
The load balancer configuration should be adjusted:
https://bugs.python.org/issue41701

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.
Message archived at https://mail.python.org/archives/list/[email protected]/message/DYUX5EEDAX3IO66QOICPK3VNEENSEIIQ/


[Python-Dev] Re: Deferred, coalescing, and other very recent reference counting optimization

2020-09-03 Thread Tim Peters
I'm surprised nobody has mentioned this:  there are no "unboxed" types
in CPython - in effect, every object user code creates is allocated
from the heap. Even, e.g., integers and floats. So even non-contrived
code can create garbage at a ferocious rate. For example, think about
this simple function, which naively computes the Euclidean distance
between two n-vectors:

```python
def dist(xs, ys):
    from math import sqrt
    if len(xs) != len(ys):
        raise ValueError("inputs must have same length")
    return sqrt(sum((x - y)**2 for x, y in zip(xs, ys)))
```

In general, `len(xs)` and `len(ys)` each create a new integer object,
which both become trash the instant `!=` completes.  Say the common
length is `n`.

`zip` then creates `n` 2-tuple objects, each of which lives only long
enough to be unpacked into `x` and `y`. Then the result of `x - y` lives
only long enough to be squared, and the result of that lives only long
enough to be added into `sum()`'s internal accumulator. Finally, the
grand total lives only long enough to extract its square root.
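The claim about `len()` is easy to observe directly. A small CPython-specific
demonstration (it relies on values falling outside the small-int cache, which
only spans -5..256):

```python
xs = [0] * 1000
ys = [0] * 1000

# Each len() call materializes a fresh int object on the heap, so the two
# temporaries below are distinct objects even though they compare equal.
print(len(xs) is len(ys))   # typically False on CPython
print(len(xs) == len(ys))   # True
```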

With "immediate" reclamation of garbage via refcounting, memory use is
trival regardless of how large `n` is, as CPython reuses the same heap
space over & over & over, ASAP.  The space for each 2-tuple is
reclaimed before `x-y` is computed, the space for that is reclaimed
when the square completes, and the space for the square is reclaimed
right after `sum()` folds it in.  It's memory-efficient and
cache-friendly "in the small".

Of course that's _assuming_, e.g., that `(x-y).__pow__(2)` doesn't
save its arguments away somewhere that outlives the method call, but
the compiler has no way to know whether it does.  The only thing it
can assume about the element types is that they support the methods
the code invokes.  In fact, the compiler has no idea whether the
result type of `x-y` is even the same as the type of `x` or of `y`.
Message archived at https://mail.python.org/archives/list/[email protected]/message/NU4T5TFPDVYDCR5ADY6KKJ6USWVFD3TZ/


[Python-Dev] Re: Deferred, coalescing, and other very recent reference counting optimization

2020-09-03 Thread Brandt Bucher
Tim Peters wrote:
> `zip` then creates `n` 2-tuple objects, each of which lives only long enough 
> to be unpacked into `x` and `y`... With "immediate" reclamation of garbage 
> via refcounting, memory use is trival regardless of how large `n` is, as 
> CPython reuses the same heap space over & over & over, ASAP.  The space for 
> each 2-tuple is reclaimed before `x-y` is computed...

It's also worth noting that the current refcounting scheme allows for some 
pretty sneaky optimizations under-the-hood. In your example, `zip` only ever 
creates one 2-tuple, and keeps reusing it over and over:

https://github.com/python/cpython/blob/c96d00e88ead8f99bb6aa1357928ac4545d9287c/Python/bltinmodule.c#L2623

This is thanks to the fact that most `zip` usage looks exactly like yours,
where the tuple is only around long enough to be unpacked. If `zip.__next__`
knows the tuple is no longer referenced anywhere else, it is free to
mutate(!) it in place. I believe `PyUnicode_Append` does something similar
for string concatenation as well.
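The reuse is observable from Python. A small sketch (this is CPython
implementation-specific behavior, not a language guarantee):

```python
it = zip(range(3), "abc")

t1 = next(it)
first_id = id(t1)

# Drop our reference: now only zip's internal result slot holds the tuple,
# so its refcount is 1 and the next __next__ call may recycle it in place.
del t1
t2 = next(it)
print(id(t2) == first_id)   # typically True on current CPython
```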
Message archived at https://mail.python.org/archives/list/[email protected]/message/MJ4BL42YSMP5BUGWT7EEK3EKNVGBDH35/


[Python-Dev] Re: PEP 622 version 2 (Structural Pattern Matching)

2020-09-03 Thread Rob Cliffe via Python-Dev




On 30/07/2020 00:34, Nick Coghlan wrote:
the proposed name binding syntax inherently conflicts with the 
existing assignment statement lvalue syntax in two areas:


* dotted names (binds an attribute in assignment, looks up a 
constraint value in a match case)
* underscore targets (binds in assignment, wildcard match without 
binding in a match case)



The former syntactic conflict presents a bigger problem, though, as it 
means that we'd be irrevocably committed to having two different 
lvalue syntaxes for the rest of Python's future as a language.




+1
Message archived at https://mail.python.org/archives/list/[email protected]/message/WLNMH7OFURYPL2E7YT5JRYXW7RLDGIH6/