Re: [Python-Dev] Python startup time

2018-10-10 Thread Ronald Oussoren via Python-Dev


> On 9 Oct 2018, at 23:02, Gregory Szorc  wrote:
> 
> 
> 
> While we're here, CPython might want to look into getdirentriesattr() as
> a replacement for readdir(). We switched to it in Mercurial several
> years ago to make `hg status` operations significantly faster [2]. I'm
> not sure if it will yield a speedup on APFS though. But it's worth a
> try. (If it does, you could probably make
> os.listdir()/os.scandir()/os.walk() significantly faster on macOS.)

Note that getdirentriesattr() is deprecated as of macOS 10.10;
getattrlistbulk() is the non-deprecated replacement (introduced in 10.10).

Ronald
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
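
A note for readers following along in Python: the appeal of bulk calls such as
getattrlistbulk() is that a single call returns names together with
attributes, instead of a readdir() followed by one stat() per entry.
os.scandir() exposes a similar idea at the Python level. The sketch below only
contrasts the two access patterns; the helper names are made up for this note,
and it makes no claim about which system calls CPython issues on macOS.

```python
import os
import stat


def kinds_via_listdir(path):
    """List the names first, then issue one stat() call per entry."""
    kinds = {}
    for name in os.listdir(path):
        mode = os.stat(os.path.join(path, name)).st_mode
        kinds[name] = stat.S_ISDIR(mode)
    return kinds


def kinds_via_scandir(path):
    """Let the directory scan itself report the entry type where it can."""
    with os.scandir(path) as entries:
        return {entry.name: entry.is_dir() for entry in entries}
```

Both helpers return the same mapping of name to "is a directory"; the second
one can usually avoid the per-entry stat() because the directory scan already
carries the type information on most platforms.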


Re: [Python-Dev] Python startup time

2018-10-09 Thread Antoine Pitrou


Hi,

On Tue, 9 Oct 2018 14:02:02 -0700
Gregory Szorc  wrote:
> 
> Python 3.7 doesn't exhibit as much of a problem. But it is still there.
> A brief audit of the importer code and call stacks confirms it is the
> same problem - just less prevalent. Wall time execution of the test
> harness from Python 2.7 to Python 3.7 drops from ~37:43s to ~20:39.
> Overall kernel CPU time drops from ~75% to ~19%. And that wall time
> improvement is despite Python 3's slower process startup. So locking in
> the kernel is really a killer on Python 2.7.

Thanks for the detailed feedback.

> I hope someone finds this information useful to further improving
> [startup] performance. (And given that Python 3.7 is substantially
> faster by avoiding excessive readdir(), I wouldn't be surprised if this
> problem is already known!)

The macOS problem wasn't known, but the general problem of filesystem
calls was (in relation with e.g. networked filesystems).

Significant work went into improving Python 3 in that regard after the
import mechanism was rewritten in pure Python.  Nowadays Python caches
the contents of all sys.path directories, so (once the cache is primed)
it's mostly a single stat() call per directory to check whether the
cache is up-to-date.  This is not entirely free, but massively better
than what Python 2 did, which was to stat() many filename patterns in
each sys.path directory.

(of course, the fact that Python 3 imports many more modules at startup
mitigates the end result a bit)
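
A rough sketch of the caching idea described above, for illustration only: the
directory listing is cached and a single stat() of the directory decides
whether the cache is still valid. The class and attribute names are
hypothetical, and this is not the importlib implementation; in particular it
ignores the mtime-granularity problem a real finder has to deal with.

```python
import os


class DirectoryCache:
    """Cache directory listings, revalidated by one stat() per lookup."""

    def __init__(self):
        self._entries = {}      # directory -> (mtime_ns, frozenset of names)
        self.listdir_calls = 0  # counts how often the listing was refreshed

    def contains(self, directory, name):
        mtime = os.stat(directory).st_mtime_ns  # the only call on a cache hit
        cached = self._entries.get(directory)
        if cached is None or cached[0] != mtime:
            # Cache miss or stale cache: re-read the directory once.
            self.listdir_calls += 1
            cached = (mtime, frozenset(os.listdir(directory)))
            self._entries[directory] = cached
        return name in cached[1]
```

Looking up many candidate file names in the same directory then costs one
stat() each, rather than one stat() per candidate file name pattern.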


As a sidenote, I was always shocked with how the Mercurial test suite
was architected.  You're wasting so much time launching processes that
I wonder why you kept it that way for so long :-)

Regards

Antoine.




Re: [Python-Dev] Python startup time

2018-10-09 Thread Gregory Szorc
On 5/1/2018 8:26 PM, Gregory Szorc wrote:
> On 7/19/2017 12:15 PM, Larry Hastings wrote:
>>
>>
>> On 07/19/2017 05:59 AM, Victor Stinner wrote:
>>> Mercurial startup time is already 45.8x slower than Git whereas tested
>>> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
>>> developers, with a startup time 2x - 3x slower...
>>
>> When Matt Mackall spoke at the Python Language Summit some years back, I
>> recall that he specifically complained about Python startup time.  He
>> said Python 3 "didn't solve any problems for [them]"--they'd already
>> solved their Unicode hygiene problems--and that Python's slow startup
>> time was already a big problem for them.  Python 3 being /even slower/
>> to start was absolutely one of the reasons why they didn't want to upgrade.
>>
>> You might think "what's a few milliseconds matter".  But if you run
>> hundreds of commands in a shell script it adds up.  git's speed is one
>> of the few bright spots in its UX, and hg's comparative slowness here is
>> a palpable disadvantage.
>>
>>
>>> So please continue efforts to make Python startup even faster to beat
>>> all other programming languages, and finally convince Mercurial to
>>> upgrade ;-)
>>
>> I believe Mercurial is, finally, slowly porting to Python 3.
>>
>> https://www.mercurial-scm.org/wiki/Python3
>>
>> Nevertheless, I can't really be annoyed or upset at them moving slowly
>> to adopt Python 3, as Matt's objections were entirely legitimate.
> 
> I just now found this thread when searching the archive for
> threads about startup time. And I was searching for threads about
> startup time because Mercurial's startup time has been getting slower
> over the past few months and this is causing substantial pain.
> 
> As I posted back in 2014 [1], CPython's startup overhead was >10% of the
> total CPU time in Mercurial's test suite. And when you factor in the
> time to import modules that get Mercurial to a point where it can run
> commands, it was more like 30%!
> 
> Mercurial's full test suite currently runs `hg` ~25,000 times. Using
> Victor's startup time numbers of 6.4ms for 2.7 and 14.5ms for
> 3.7/master, Python startup overhead contributes ~160s on 2.7 and ~360s
> on 3.7/master. Even if you divide this by the number of available CPU
> cores, we're talking dozens of seconds of wall time just waiting for
> CPython to get to a place where Mercurial's first bytecode can execute.
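
The arithmetic behind these figures can be checked in a few lines. The
invocation count and per-start timings are the ones quoted above; the core
count is an assumption made only for the wall-time estimate.

```python
INVOCATIONS = 25_000                             # `hg` runs in the test suite
STARTUP_MS = {"2.7": 6.4, "3.7/master": 14.5}    # Victor's startup numbers
ASSUMED_CORES = 8                                # assumption, not from the thread


def startup_overhead_seconds(invocations, startup_ms):
    """Total CPU time spent purely on interpreter startup."""
    return invocations * startup_ms / 1000.0


for version, ms in STARTUP_MS.items():
    total = startup_overhead_seconds(INVOCATIONS, ms)
    per_core = total / ASSUMED_CORES
    print(f"{version}: {total:.1f}s CPU, about {per_core:.0f}s wall "
          f"on {ASSUMED_CORES} cores")
```

That gives 160 s and 362.5 s of CPU time, which is where the "~160s" and
"~360s" come from, and roughly 20 s and 45 s of wall time under the assumed
core count.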
> 
> And the problem is worse when you factor in the time it takes to import
> Mercurial's own modules.
> 
> As a concrete example, I recently landed a Mercurial patch [2] that
> stubs out zope.interface to prevent the import of 9 modules on every
> `hg` invocation. This "only" saved ~6.94ms for a typical `hg`
> invocation. But this decreased the CPU time required to run the test
> suite on my i7-6700K from ~4450s to ~3980s (~89.5% of original) - a
> reduction of almost 8 minutes of CPU time (and over 1 minute of wall time)!
> 
> By the time CPython gets Mercurial to a point where we can run useful
> code, we've already blown most of or past the time budget where humans
> perceive an action/command as instantaneous. If you ignore startup
> overhead, Mercurial's performance compares quite well to Git's for many
> operations. But the reality is that CPython startup overhead makes it
> look like Mercurial is non-instantaneous before Mercurial even has the
> opportunity to execute meaningful code!
> 
> Mercurial provides a `chg` program that essentially spins up a daemon
> `hg` process running a "command server" so the `chg` program [written in
> C - no startup overhead] can dispatch commands to an already-running
> Python/`hg` process and avoid paying the startup overhead cost. When you
> run Mercurial's test suite using `chg`, it completes *minutes* faster.
> `chg` exists mainly as a workaround for slow startup overhead.
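
To make the command-server idea concrete, here is a deliberately tiny sketch:
a long-lived process listens on a Unix socket and a thin client sends it a
command line. This is not chg's actual protocol; the tab-separated wire format,
the COMMANDS table and the function names are all invented for illustration,
and it is Unix-only.

```python
import socket
import socketserver
import threading

# Hypothetical commands served by the already-running process.
COMMANDS = {
    "version": lambda args: "demo 0.1",
    "echo": lambda args: " ".join(args),
}


class CommandHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # One request per connection: "name<TAB>arg<TAB>arg\n".
        line = self.rfile.readline().decode("utf-8").rstrip("\n")
        if not line:
            return
        name, *args = line.split("\t")
        func = COMMANDS.get(name)
        reply = func(args) if func else "unknown command: " + name
        self.wfile.write(reply.encode("utf-8") + b"\n")


def serve(path):
    """Start the command server in a background thread and return it."""
    server = socketserver.UnixStreamServer(path, CommandHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def run_command(path, name, *args):
    """Thin client: no imports of the application, just a socket round trip."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall("\t".join((name,) + args).encode("utf-8") + b"\n")
        with sock.makefile("rb") as reply:
            return reply.readline().decode("utf-8").rstrip("\n")
```

The client pays only for a socket connection, while the expensive imports
happen once in the server. A real implementation also has to forward stdio,
the environment, the working directory and signals.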
> 
> Changing gears, my day job is maintaining Firefox's build system. We use
> Python heavily in the build system. And again, Python startup overhead
> is problematic. I don't have numbers offhand, but we invoke likely a few
> hundred Python processes as part of building Firefox. It should be
> several thousand. But, we've had to "hack" parts of the build system to
> "batch" certain build actions in single process invocations in order to
> avoid Python startup overhead. This undermines the ability of some build
> tools to formulate a reasonable understanding of the DAG, and it causes a
> bit of pain for build system developers. It also makes it difficult to
> achieve "no-op" and fast incremental builds: we're always invoking
> certain Python processes because we've had to move DAG awareness out of
> the build backend and into Python. At some point, we'll
> likely replace Python code with Rust so the build system is more "pure"
> and easier to maintain and reason about.
> 
> I've seen posts in this thread and elsewhere in the CPython development
> universe that challenge whether milliseconds in 

Re: [Python-Dev] Python startup time

2018-05-14 Thread M.-A. Lemburg

On 14.05.2018 18:26, Chris Barker via Python-Dev wrote:
> 
> 
> On Fri, May 11, 2018 at 11:05 AM, Ryan Gonzalez wrote:
> 
>  https://refi64.com/uprocd/ 
> 
> 
> very cool -- but *nix only, of course :-(
> 
> But it seems that there is a demand for this sort of thing, and a few
> major projects are rolling their own. So maybe it makes sense to put
> something into the standard library that everyone could contribute to
> and use.
> 
> With regard to forking -- is there another way? I don't have the
> expertise to have any idea if this is possible, but:
> 
> start up python
> 
> capture the entire runtime image as a single binary blob.
> 
> could that blob be simply loaded into memory and run?
> 
> (hmm -- probably not -- memory addresses would be hard-coded then, yes?)
> or is memory virtualized enough these days?

You might want to look into combining this with PyRun:

https://www.egenix.com/products/python/PyRun/

which takes care of mmap'ing the byte code of the stdlib into
memory.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


::: We implement business ideas - efficiently in both time and costs :::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
  http://www.malemburg.com/



Re: [Python-Dev] Python startup time

2018-05-14 Thread Oleg Broytman
On Mon, May 14, 2018 at 12:26:19PM -0400, Chris Barker via Python-Dev wrote:
> With regard to forking -- is there another way? I don't have the expertise
> to have any idea if this is possible, but:
> 
> start up python
> 
> capture the entire runtime image as a single binary blob.
> could that blob be simply loaded into memory and run?

   Like emacs unexec? https://www.google.com/search?q=emacs+unexec

> -CHB
> 
> -- 
> 
> Christopher Barker, Ph.D.
> Oceanographer
> 
> Emergency Response Division
> NOAA/NOS/OR(206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115   (206) 526-6317   main reception
> 
> chris.bar...@noaa.gov

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Python startup time

2018-05-14 Thread INADA Naoki
On Tue, May 15, 2018 at 2:17, Antoine Pitrou wrote:

>
> On 14/05/2018 at 19:12, INADA Naoki wrote:
> > I'm sorry, the word *will* may have been stronger than I intended.
> >
> > I meant that if a memory image dumped on disk is used casually,
> > it may make it easier to introduce a security hole.
> >
> > For example, if an `hg` memory image is reused and it can be leaked in
> > some way, `hg serve` will be vulnerable to hash DoS.
>
> This discussion subthread is not about having a memory image dumped on
> disk, but a daemon utility that preloads a new Python process when you
> first start up your CLI application.  Each time a new process is
> preloaded, it will by construction use a new hash seed.
>

My reply was to:

> capture the entire runtime image as a single binary blob.
> could that blob be simply loaded into memory and run?

So I was thinking about the same memory image being reused an
indeterminate number of times.

Of course, prefork is much safer because the hash initialization vector
exists only in process RAM.

Regards,


Re: [Python-Dev] Python startup time

2018-05-14 Thread Antoine Pitrou

On 14/05/2018 at 19:12, INADA Naoki wrote:
> I'm sorry, the word *will* may have been stronger than I intended.
> 
> I meant that if a memory image dumped on disk is used casually,
> it may make it easier to introduce a security hole.
> 
> For example, if an `hg` memory image is reused and it can be leaked in
> some way, `hg serve` will be vulnerable to hash DoS.

This discussion subthread is not about having a memory image dumped on
disk, but a daemon utility that preloads a new Python process when you
first start up your CLI application.  Each time a new process is
preloaded, it will by construction use a new hash seed.

(by contrast, the Node.js CVE issue you linked to is about having the
same hash seed across a Node.js version; that's disastrous)

You can also add a reuse limit to ensure that the hash seed is rotated (e.g.
every 100 invocations).
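
The reuse limit can be sketched as a simple counter around whatever spawns the
preloaded process. Everything below is hypothetical: the class name is made up
and `_spawn()` only stands in for starting a fresh interpreter, which in a real
daemon is what would draw a new hash seed.

```python
import secrets


class RotatingWorker:
    """Hand out a preloaded worker, replacing it after `reuse_limit` uses."""

    def __init__(self, reuse_limit=100):
        self.reuse_limit = reuse_limit
        self.generation = 0
        self._spawn()

    def _spawn(self):
        # Stand-in for launching a new interpreter process. The token only
        # models "this worker has its own secret"; it is not Python's seed.
        self.secret = secrets.token_hex(16)
        self.uses = 0
        self.generation += 1

    def acquire(self):
        """Return the generation number of the worker serving this call."""
        if self.uses >= self.reuse_limit:
            self._spawn()  # rotate: the old worker is retired
        self.uses += 1
        return self.generation
```

With `reuse_limit=3`, seven calls to `acquire()` are served by generations
1, 1, 1, 2, 2, 2, 3.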

Regards

Antoine.


Re: [Python-Dev] Python startup time

2018-05-14 Thread INADA Naoki
I'm sorry, the word *will* may have been stronger than I intended.

I meant that if a memory image dumped on disk is used casually,
it may make it easier to introduce a security hole.

For example, if an `hg` memory image is reused and it can be leaked in
some way, `hg serve` will be vulnerable to hash DoS.

I don't deny that it's useful and safe when it's used carefully.

Regards,

On Tue, May 15, 2018 at 1:58 AM Antoine Pitrou  wrote:

> On Tue, 15 May 2018 01:33:18 +0900
> INADA Naoki  wrote:
> >
> > It will break hash randomization.
> >
> > See also: https://www.cvedetails.com/cve/CVE-2017-11499/

> I don't know why it would.  The mechanism of pre-initializing a process
> which is re-used across many requests is how most server applications
> of Python already work (you don't want to bear the cost of spawning
> a new interpreter for each request, as antiquated CGI does). I have not
> heard that it breaks hash randomization, so a similar mechanism on the
> CLI side shouldn't break it either.

> Regards

> Antoine.





-- 
INADA Naoki  


Re: [Python-Dev] Python startup time

2018-05-14 Thread Antoine Pitrou
On Tue, 15 May 2018 01:33:18 +0900
INADA Naoki  wrote:
> 
> It will break hash randomization.
> 
> See also: https://www.cvedetails.com/cve/CVE-2017-11499/

I don't know why it would.  The mechanism of pre-initializing a process
which is re-used across many requests is how most server applications
of Python already work (you don't want to bear the cost of spawning
a new interpreter for each request, as antiquated CGI does). I have not
heard that it breaks hash randomization, so a similar mechanism on the
CLI side shouldn't break it either.
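
The per-process nature of the hash seed is easy to observe. The sketch below
starts fresh interpreters and asks each for the hash of the same string; the
helper name is made up. With the PYTHONHASHSEED environment variable unset,
each new process picks its own seed, so the values will normally differ between
runs; pinning the variable makes them repeatable.

```python
import os
import subprocess
import sys


def string_hash_in_new_process(text, seed=None):
    """Return hash(text) as computed by a freshly started interpreter."""
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)      # default: a random seed per process
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)  # pin the seed explicitly
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)
```

A frozen memory image reused for every invocation behaves like the pinned-seed
case, which is the concern raised earlier in the thread; a preforked process
that is started fresh behaves like the default case.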

Regards

Antoine.




Re: [Python-Dev] Python startup time

2018-05-14 Thread Chris Barker via Python-Dev
On Mon, May 14, 2018 at 12:33 PM, INADA Naoki 
wrote:

> It will break hash randomization.
>
> See also: https://www.cvedetails.com/cve/CVE-2017-11499/


I'm not enough of a security expert to know how much that matters in this
case, but I suppose one could do a bit of post-processing on the image to
randomize the hashes? Or is that just insane?

Also -- I wasn't thinking it would be a pre-built binary blob that everyone
used -- but one built on the fly on an individual system, maybe once per
reboot, or once per shell instance even. So if you are running, e.g., hg a
bunch of times in a shell, does it matter that the instances are all
identical?

-CHB


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Python startup time

2018-05-14 Thread INADA Naoki
On Tue, May 15, 2018 at 1:29 AM Chris Barker via Python-Dev <
python-dev@python.org> wrote:


> On Fri, May 11, 2018 at 11:05 AM, Ryan Gonzalez  wrote:

>>  https://refi64.com/uprocd/ 


> very cool -- but *nix only, of course :-(
>
> But it seems that there is a demand for this sort of thing, and a few
> major projects are rolling their own. So maybe it makes sense to put
> something into the standard library that everyone could contribute to and
> use.
>
> With regard to forking -- is there another way? I don't have the
> expertise to have any idea if this is possible, but:
>
> start up python
>
> capture the entire runtime image as a single binary blob.
>
> could that blob be simply loaded into memory and run?
>
> (hmm -- probably not -- memory addresses would be hard-coded then, yes?)
> or is memory virtualized enough these days?
>
> -CHB


It will break hash randomization.

See also: https://www.cvedetails.com/cve/CVE-2017-11499/

Regards,

-- 
Inada Naoki


Re: [Python-Dev] Python startup time

2018-05-14 Thread Chris Barker via Python-Dev
On Fri, May 11, 2018 at 11:05 AM, Ryan Gonzalez  wrote:

>  https://refi64.com/uprocd/ 


very cool -- but *nix only, of course :-(

But it seems that there is a demand for this sort of thing, and a few major
projects are rolling their own. So maybe it makes sense to put something
into the standard library that everyone could contribute to and use.

With regard to forking -- is there another way? I don't have the expertise
to have any idea if this is possible, but:

start up python

capture the entire runtime image as a single binary blob.

could that blob be simply loaded into memory and run?

(hmm -- probably not -- memory addresses would be hard-coded then, yes?) or
is memory virtualized enough these days?

-CHB

-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Python startup time - daemon

2018-05-11 Thread Guido van Rossum
On Fri, May 11, 2018 at 11:57 PM, Barry Warsaw  wrote:

> On May 11, 2018, at 12:23, Guido van Rossum  wrote:
> >
> > Indeed, we have an implementation of this specific to mypy.
>
> Is there anything in mypy’s implementation that can be generalized into a
> library?
>

Not sure, here's the code:
https://github.com/python/mypy/blob/master/mypy/dmypy.py
https://github.com/python/mypy/blob/master/mypy/dmypy_server.py
(also dmypy_util.py there)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python startup time - daemon

2018-05-11 Thread Barry Warsaw
On May 11, 2018, at 12:23, Guido van Rossum  wrote:
> 
> Indeed, we have an implementation of this specific to mypy.

Is there anything in mypy’s implementation that can be generalized into a 
library?

-Barry





Re: [Python-Dev] Python startup time - daemon

2018-05-11 Thread Guido van Rossum
Indeed, we have an implementation of this specific to mypy.

On Fri, May 11, 2018 at 11:34 AM, Antoine Pitrou 
wrote:

>
> Yes, you don't want this to be a generic utility, rather a helper
> library that people can integrate into their command-line applications
> to enable such startup caching.
>
> Regards
>
> Antoine.
>
>
> On Fri, 11 May 2018 17:27:35 +0200
> Oleg Broytman  wrote:
> > On Fri, May 11, 2018 at 07:38:05AM -0700, Chris Barker - NOAA Federal
> > via Python-Dev wrote:
> > > Could one make a little startup utility that, when invoked the first
> > > time, starts up a raw python interpreter, keeps it running somewhere,
> > > and then forks it to run the actual python code.
> > >
> > > Then every invocation after that would make a new fork.
> >
> > This has been implemented (and discussed on this list) many times. Just
> > a few examples:
> >
> > http://readyexec.sourceforge.net/
> > https://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/
> >
> > It proved to be hard and never gained any traction.
> >
> > a) you don't want the daemon to import all possible modules, so you need
> >    to run a separate copy of the daemon for every Python version, every
> >    user and every client program;
> > b) you need to find "your" daemon - using TCP? unix sockets? named pipes?
> > c) you need to redirect stdio to/from the daemon;
> > d) you need to redirect signals and exceptions;
> > e) you have problems with elevated privileges (how do you elevate the
> >    daemon if the client was started with `sudo -H`?);
> > f) it is not portable (there is a popular GUI that cannot fork).
> >
> > > -CHB
> > > Sent from my iPhone
> >
> > Oleg.
>
>
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python startup time - daemon

2018-05-11 Thread Antoine Pitrou

Yes, you don't want this to be a generic utility, rather a helper
library that people can integrate into their command-line applications
to enable such startup caching.

Regards

Antoine.


On Fri, 11 May 2018 17:27:35 +0200
Oleg Broytman  wrote:
> On Fri, May 11, 2018 at 07:38:05AM -0700, Chris Barker - NOAA Federal via 
> Python-Dev  wrote:
> > Could one make a little startup utility that, when invoked the first
> > time, starts up a raw python interpreter, keeps it running somewhere,
> > and then forks it to run the actual python code.
> > 
> > Then every invocation after that would make a new fork.  
> 
> This has been implemented (and discussed on this list) many times. Just
> a few examples:
> 
> http://readyexec.sourceforge.net/
> https://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/
> 
> It proved to be hard and never gained any traction.
> 
> a) you don't want the daemon to import all possible modules, so you need
>    to run a separate copy of the daemon for every Python version, every
>    user and every client program;
> b) you need to find "your" daemon - using TCP? unix sockets? named pipes?
> c) you need to redirect stdio to/from the daemon;
> d) you need to redirect signals and exceptions;
> e) you have problems with elevated privileges (how do you elevate the
>    daemon if the client was started with `sudo -H`?);
> f) it is not portable (there is a popular GUI that cannot fork).
> 
> > -CHB
> > Sent from my iPhone  
> 
> Oleg.





Re: [Python-Dev] Python startup time - daemon

2018-05-11 Thread Oleg Broytman
On Fri, May 11, 2018 at 07:38:05AM -0700, Chris Barker - NOAA Federal via 
Python-Dev  wrote:
> Could one make a little startup utility that, when invoked the first
> time, starts up a raw python interpreter, keeps it running somewhere,
> and then forks it to run the actual python code.
> 
> Then every invocation after that would make a new fork.

   This has been implemented (and discussed on this list) many times. Just
a few examples:

http://readyexec.sourceforge.net/
https://blogs.gnome.org/johan/2007/01/18/introducing-python-launcher/

   It proved to be hard and never gained any traction.

a) you don't want the daemon to import all possible modules, so you need
   to run a separate copy of the daemon for every Python version, every
   user and every client program;
b) you need to find "your" daemon - using TCP? unix sockets? named pipes?
c) you need to redirect stdio to/from the daemon;
d) you need to redirect signals and exceptions;
e) you have problems with elevated privileges (how do you elevate the daemon
   if the client was started with `sudo -H`?);
f) it is not portable (there is a popular GUI that cannot fork).
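
The first two points of the list above (one daemon per Python version, user
and client program, and the question of how to find it) can be illustrated
with a naming scheme for a Unix socket. This is only one possible convention,
not something any of the projects above is known to use; the function name,
the "pydaemon" prefix and the choice of XDG_RUNTIME_DIR are assumptions.

```python
import os
import sys
import tempfile


def daemon_socket_path(program, base_dir=None):
    """Build a socket path unique per user, Python version and program."""
    if base_dir is None:
        # Assumption: prefer the per-user runtime directory when it exists.
        base_dir = os.environ.get("XDG_RUNTIME_DIR") or tempfile.gettempdir()
    version = "%d.%d" % sys.version_info[:2]
    uid = os.getuid() if hasattr(os, "getuid") else 0
    name = "pydaemon-%s-py%s-%s.sock" % (uid, version, os.path.basename(program))
    return os.path.join(base_dir, name)
```

A client would compute the same path and try to connect; if the connection
fails, it starts the daemon itself. The remaining points (stdio, signals,
privileges, portability) are not addressed by this at all.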

> -CHB
> Sent from my iPhone

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Python startup time

2018-05-11 Thread Ryan Gonzalez

 https://refi64.com/uprocd/ 


On May 11, 2018 9:39:28 AM Chris Barker - NOAA Federal via Python-Dev wrote:

> Inspired by chg:
>
> Could one make a little startup utility that, when invoked the first
> time, starts up a raw python interpreter, keeps it running somewhere,
> and then forks it to run the actual python code.
>
> Then every invocation after that would make a new fork. I presume
> forking is a LOT faster than re-invoking the entire startup.
>
> I suspect that many of the cases where startup time really matters are
> when a command line utility is likely to be invoked many times -- often
> in the same shell instance.
>
> So having a “pre-built” warm interpreter ready to go could really help.
>
> This is way past my technical expertise to know if it’s possible, or
> to try to prototype it, but I’m sure many of you would know.
>
> -CHB
>
> Sent from my iPhone
>
>> On May 7, 2018, at 12:28 PM, Neil Schemenauer wrote:
>>
>> On 2018-05-03, Lukasz Langa wrote:
>>>> On May 2, 2018, at 8:57 PM, INADA Naoki wrote:
>>>> * Add lazy compiling API or flag in `re` module.  The pattern is compiled
>>>> when first used.
>>>
>>> How about going the other way and allowing compiling at Python
>>> *compile*-time? That would actually make things faster instead of
>>> just moving the time spent around.
>>
>> Lisp has a special form 'eval-when'.  It can be used to cause
>> evaluation of the body expression at compile time.
>>
>> In Carl's "A fast startup patch" post, he talks about getting rid of
>> the unmarshal step and storing objects in the heap segment of the
>> executable.  Those would be the objects necessary to evaluate code.
>> The marshal module has a limited number of types that it handles.
>> I believe they are: bool, bytes, code objects, complex, Ellipsis,
>> float, frozenset, int, None, tuple and str.
>>
>> If the same mechanism could handle more types, rather than storing
>> the code to be evaluated, we could store the objects created after
>> evaluation of the top-level module body.  Or, have a mechanism to
>> mark which code should be evaluated at compile time (much like the
>> eval-when form).
>>
>> For the re.compile example, the compiled regex could be what is
>> stored after compiling the Python module (i.e. the re.compile gets
>> run at compile time).  The objects created by re.compile (e.g.
>> SRE_Pattern) would have to be something that the heap dumper could
>> handle.
>>
>> Traditionally, Python has had the model "there is only runtime".
>> So, starting to do things at compile time complicates that model.
>>
>> Regards,
>>
>>  Neil


Re: [Python-Dev] Python startup time

2018-05-11 Thread Chris Barker - NOAA Federal via Python-Dev
Inspired by chg:

Could one make a little startup utility that, when invoked the first
time, starts up a raw python interpreter, keeps it running somewhere,
and then forks it to run the actual python code.

Then every invocation after that would make a new fork. I presume
forking is a LOT faster than re-invoking the entire startup.
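
The presumption that forking is much cheaper than a full startup can be
checked with a small measurement. The sketch below is Unix-only (it relies on
os.fork()), the function names are made up, and the numbers depend heavily on
the machine and on how much the parent process has already imported.

```python
import os
import subprocess
import sys
import time


def time_fork(repeat=20):
    """Average seconds to fork a child that exits immediately."""
    start = time.perf_counter()
    for _ in range(repeat):
        pid = os.fork()
        if pid == 0:
            os._exit(0)        # child: leave without running cleanup handlers
        os.waitpid(pid, 0)     # parent: reap the child
    return (time.perf_counter() - start) / repeat


def time_fresh_interpreter(repeat=5):
    """Average seconds to start a new interpreter that does nothing."""
    start = time.perf_counter()
    for _ in range(repeat):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / repeat
```

Comparing `time_fork()` with `time_fresh_interpreter()` on the target machine
gives a first impression of how much a warm, preloaded interpreter could save
per invocation.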

I suspect that many of the cases where startup time really matters are
when a command line utility is likely to be invoked many times — often
in the same shell instance.

So having a “pre-built” warm interpreter ready to go could really help.

This is way past my technical expertise to know if it’s possible, or
to try to prototype it, but I’m sure many of you would know.

-CHB

Sent from my iPhone

> On May 7, 2018, at 12:28 PM, Neil Schemenauer  wrote:
>
> On 2018-05-03, Lukasz Langa wrote:
>>> On May 2, 2018, at 8:57 PM, INADA Naoki  wrote:
>>> * Add lazy compiling API or flag in `re` module.  The pattern is compiled
>>> when first used.
>>
>> How about going the other way and allowing compiling at Python
>> *compile*-time? That would actually make things faster instead of
>> just moving the time spent around.
>
> Lisp has a special form 'eval-when'.  It can be used to cause
> evaluation of the body expression at compile time.
>
> In Carl's "A fast startup patch" post, he talks about getting rid of
> the unmarshal step and storing objects in the heap segment of the
> executable.  Those would be the objects necessary to evaluate code.
> The marshal module has a limited number of types that it handles.
> I believe they are: bool, bytes, code objects, complex, Ellipsis,
> float, frozenset, int, None, tuple and str.
>
> If the same mechanism could handle more types, rather than storing
> the code to be evaluated, we could store the objects created after
> evaluation of the top-level module body.  Or, have a mechanism to
> mark which code should be evaluated at compile time (much like the
> eval-when form).
>
> For the re.compile example, the compiled regex could be what is
> stored after compiling the Python module (i.e. the re.compile gets
> run at compile time).  The objects created by re.compile (e.g.
> SRE_Pattern) would have to be something that the heap dumper could
> handle.
>
> Traditionally, Python has had the model "there is only runtime".
> So, starting to do things at compile time complicates that model.
>
> Regards,
>
>  Neil


Re: [Python-Dev] Python startup time

2018-05-07 Thread Neil Schemenauer
On 2018-05-03, Lukasz Langa wrote:
> > On May 2, 2018, at 8:57 PM, INADA Naoki  wrote:
> > * Add lazy compiling API or flag in `re` module.  The pattern is compiled
> > when first used.
> 
> How about go the other way and allow compiling at Python
> *compile*-time? That would actually make things faster instead of
> just moving the time spent around.

Lisp has a special form 'eval-when'.  It can be used to cause
evaluation of the body expression at compile time.

In Carl's "A fast startup patch" post, he talks about getting rid of
the unmarshal step and storing objects in the heap segment of the
executable.  Those would be the objects necessary to evaluate code.
The marshal module has a limited number of types that it handles.
I believe they are: bool, bytes, code objects, complex, Ellipsis,
float, frozenset, int, None, tuple and str.

If the same mechanism could handle more types, rather than storing
the code to be evaluated, we could store the objects created after
evaluation of the top-level module body.  Or, have a mechanism to
mark which code should be evaluated at compile time (much like the
eval-when form).

For the re.compile example, the compiled regex could be what is
stored after compiling the Python module (i.e. the re.compile gets
run at compile time).  The objects created by re.compile (e.g.
SRE_Pattern) would have to be something that the heap dumper could
handle.

Traditionally, Python has had the model "there is only runtime".
So, starting to do things at compile time complicates that model.
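
To make the marshal limitation concrete, here is a small sketch (my own
illustration, not code from Carl's patch).  It shows marshal accepting
the constants that end up in a .pyc and rejecting a compiled pattern.
pickle does accept the pattern, but as far as I know it only records the
source string and flags, so loading it means compiling again:

```python
import marshal
import pickle
import re

pattern = re.compile(r"number = \d+")

# marshal round-trips the simple constants a code object refers to...
roundtrip = marshal.loads(marshal.dumps((1, 2.0, "text", b"bytes", None)))

# ...but it refuses a compiled pattern object.
try:
    marshal.dumps(pattern)
    marshal_ok = True
except ValueError:
    marshal_ok = False

# pickle accepts the pattern, but it stores the source and flags rather
# than the compiled state, so unpickling goes back through re.
restored = pickle.loads(pickle.dumps(pattern))
```

So a heap dumper that wanted to store the result of re.compile would
need to handle the pattern object's internal state directly.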

Regards,

  Neil


Re: [Python-Dev] Python startup time

2018-05-03 Thread Chris Jerdonek
FYI, a lot of these ideas were discussed back in September and October of
2017 on this list if you search the subject lines for "startup" e.g.
starting here and here:
https://mail.python.org/pipermail/python-dev/2017-September/149150.html
https://mail.python.org/pipermail/python-dev/2017-October/149670.html

At the end Guido kicked (at least part of) the discussion back to
python-ideas.

--Chris


On Thu, May 3, 2018 at 5:55 PM, Chris Angelico  wrote:

> On Fri, May 4, 2018 at 10:43 AM, Gregory P. Smith  wrote:
> > I'd also like to see this concept somehow extended to decorators so that the
> > results of the decoration can be captured in the compiled pyc rather than
> > requiring execution at import time.  I realize that limits what decorators
> > can do, but the evil things they could do that this would eliminate are
> > things they just shouldn't be doing in most situations.  meaning: there
> > would probably be two types of decorators... colons seem to be all the rage
> > these days so we could add an @: operator for that. :P ... Along with a from
> > __future__ import to change the behavior of all decorators in a file from
> > runtime to compile time by default.
> >
> > from __future__ import compile_time_decorators  # we'd be unlikely to ever
> > change the default and break things, __future__ seems wrong
> >
> > @this_happens_at_compile_time(3)
> > def ...
> >
> > @:this_waits_until_runtime(5)
> > def ...
> >
> > Just a not-so-wild idea, no idea if this should become a PEP for 3.8.  (the
> > : syntax is a joke - i'd prefer @@ so it looks like eyeballs)
>
> At this point, we're squarely in python-ideas territory, but there are
> some possibilities. Imagine popping this line of code at the bottom of
> your file:
>
> import importlib; importlib.freeze_module()
>
> as a declaration that the dictionary for this module is now locked in
> and can be dumped out in whatever form is most efficient. Effectively,
> you're stating that you do not need any sort of dynamism (that call
> could be easily disabled for testing), and that, if the optimization
> breaks anything, you accept responsibility for it.
>
> How this would be implemented, I'm not sure, but that's no different
> from the @: idea.
>
> ChrisA


Re: [Python-Dev] Python startup time

2018-05-03 Thread Chris Angelico
On Fri, May 4, 2018 at 10:43 AM, Gregory P. Smith  wrote:
> I'd also like to see this concept somehow extended to decorators so that the
> results of the decoration can be captured in the compiled pyc rather than
> requiring execution at import time.  I realize that limits what decorators
> can do, but the evil things they could do that this would eliminate are
> things they just shouldn't be doing in most situations.  meaning: there
> would probably be two types of decorators... colons seem to be all the rage
> these days so we could add an @: operator for that. :P ... Along with a from
> __future__ import to change the behavior of all decorators in a file from
> runtime to compile time by default.
>
> from __future__ import compile_time_decorators  # we'd be unlikely to ever
> change the default and break things, __future__ seems wrong
>
> @this_happens_at_compile_time(3)
> def ...
>
> @:this_waits_until_runtime(5)
> def ...
>
> Just a not-so-wild idea, no idea if this should become a PEP for 3.8.  (the
> : syntax is a joke - i'd prefer @@ so it looks like eyeballs)

At this point, we're squarely in python-ideas territory, but there are
some possibilities. Imagine popping this line of code at the bottom of
your file:

import importlib; importlib.freeze_module()

as a declaration that the dictionary for this module is now locked in
and can be dumped out in whatever form is most efficient. Effectively,
you're stating that you do not need any sort of dynamism (that call
could be easily disabled for testing), and that, if the optimization
breaks anything, you accept responsibility for it.

How this would be implemented, I'm not sure, but that's no different
from the @: idea.

ChrisA


Re: [Python-Dev] Python startup time

2018-05-03 Thread Gregory P. Smith
On Thu, May 3, 2018 at 5:22 PM, Lukasz Langa  wrote:

>
> > On May 2, 2018, at 8:57 PM, INADA Naoki  wrote:
> >
> > Recently, I reported how stdlib slows down `import requests`.
> > https://github.com/requests/requests/issues/4315#issuecomment-385584974
> >
> > For Python 3.8, my ideas for faster startup time are:
> >
> > * Add lazy compiling API or flag in `re` module.  The pattern is compiled
> > when first used.
>
> How about go the other way and allow compiling at Python *compile*-time?
> That would actually make things faster instead of just moving the time
> spent around.
>
> I do see value in being less eager in Python but I think the real wins are
> hiding behind ahead-of-time compilation.
>

Agreed in concept.  We've got a lot of unused letters that could be new
string prefixes... (ugh)

I'd also like to see this concept somehow extended to decorators so that
the results of the decoration can be captured in the compiled pyc rather
than requiring execution at import time.  I realize that limits what
decorators can do, but the evil things they could do that this would
eliminate are things they just shouldn't be doing in most situations.
meaning: there would probably be two types of decorators... colons seem to
be all the rage these days so we could add an @: operator for that. :P ...
Along with a from __future__ import to change the behavior of all
decorators in a file from runtime to compile time by default.

from __future__ import compile_time_decorators  # we'd be unlikely to ever
change the default and break things, __future__ seems wrong

@this_happens_at_compile_time(3)
def ...

@:this_waits_until_runtime(5)
def ...

Just a not-so-wild idea, no idea if this should become a PEP for 3.8.  (the
: syntax is a joke - i'd prefer @@ so it looks like eyeballs)

If this were done to decorators, you can imagine extending that concept to
something similar to allow compile time re.compile calls as some form of
assignment decorator:

GREGS_RE = @re.compile(r'A regex compiled at compile time\. number = \d+')

-gps


> - Ł


Re: [Python-Dev] Python startup time

2018-05-03 Thread Lukasz Langa

> On May 2, 2018, at 8:57 PM, INADA Naoki  wrote:
> 
> Recently, I reported how stdlib slows down `import requests`.
> https://github.com/requests/requests/issues/4315#issuecomment-385584974
> 
> For Python 3.8, my ideas for faster startup time are:
> 
> * Add lazy compiling API or flag in `re` module.  The pattern is compiled
> when first used.

How about going the other way and allowing compiling at Python *compile*-time?
That would actually make things faster instead of just moving the time spent
around.

I do see value in being less eager in Python but I think the real wins are 
hiding behind ahead-of-time compilation.

- Ł


Re: [Python-Dev] Python startup time

2018-05-03 Thread Ray Donnelly
On Wed, May 2, 2018 at 6:55 PM, Nathaniel Smith  wrote:
> On Wed, May 2, 2018, 09:51 Gregory Szorc  wrote:
>>
>> Correct me if I'm wrong, but aren't there downsides with regards to C
>> extension compatibility to not having a shared libpython? Or does all the
>> packaging tooling "just work" without a libpython? (It's possible I have my
>> wires crossed up with something else regarding a statically linked Python.)
>
>
> IIRC, the rule on Linux is that if you build an extension on a statically
> built python, then it can be imported on a shared python, but not
> vice-versa. Manylinux wheels are therefore always built on a static python
> so that they'll work everywhere. (We should probably clean this up upstream
> at some point, but there's not a lot of appetite for touching this stuff –
> very obscure, very easy to break things without realizing it, not much
> upside.)
>
> On Windows I don't think there is such a thing as a static build, because
> extensions have to link to the python dll to work at all. And on MacOS I'm
> not sure, though from knowing how their linker works my guess is that all
> extensions act like static extensions do on Linux.

Yes, on Windows there's always a python?.dll.

macOS is an interesting one. For Anaconda 5.0 I read somewhere (how's that
for a useless reference - and perhaps I got the wrong end of the stick)
that Python for all Unixen should use a statically linked interpreter so I
happily went ahead and did that. Of course I tested it against a good few
wheels at the time and everything seemed fine (well, no worse than the
usual binary compatibility woes at least) so I went ahead with it.

Now that Python 3.7 is around the corner we have a chance to re-evaluate
this decision. We have received no binary compat. bugs whatsoever due to
this change (we got a few bugs where people used python-config incorrectly
either directly or via swig or CMake), were we just lucky?

Anyway, it is obviously safer for us to do what upstream does and I will
try to post some benchmarks of static vs shared to the list so we can
discuss it. I guess it is a little late in the release schedule to propose
any such change for 3.7? If not I will try to prepare something. I will
discuss it in depth with the rest of the AD team soon too.



Re: [Python-Dev] Python startup time

2018-05-03 Thread Gregory P. Smith
On Wed, May 2, 2018 at 2:13 PM, Barry Warsaw  wrote:

> Thanks for bringing this topic up again.  At $day_job, this is a highly
> visible and important topic, since the majority of our command line tools
> are written in Python (of varying versions from 2.7 to 3.6).  Some of those
> tools can take upwards of 5 seconds or more just to respond to --help, which
> causes lots of pain for developers, who complain (rightly so) up the
> management chain. ;)
>
> We’ve done a fair bit of work to bring those numbers down without super
> radical workarounds.  Often there are problems not strictly related to the
> Python interpreter that contribute to this.  Python gets blamed, but it’s
> not always the interpreter’s fault.  Common issues include:
>
> * Modules that have import-time side effects, such as network access or
> expensive creation of data structures.  Python 3.7’s `-X importtime` switch
> is a really wonderful way to identify the worst offenders.  Once 3.7 is
> released, I do plan to spend some time using this to collect data
> internally so we can attack our own libraries, and perhaps put automated
> performance testing into our build stack, to identify start up time
> regressions.
>
> * pkg_resources.  When you have tons of entries on sys.path, pkg_resources
> does a lot of work at import time, and because of common patterns which
> tend to use pkg_resources namespace package support in __init__.py files,
> this just kills start up times.  Of course, pkg_resources has other uses
> too, so even in a purely Python 3 world (where your namespace packages can
> omit the __init__.py), you’ll often get clobbered as soon as you want to
> use the Basic Resource Access API.  This is also pretty common, and it’s
> the main reason why Brett and I created importlib.resources for 3.7 (with a
> standalone API-compatible library for older Pythons).  That’s one less
> reason to use pkg_resources, but it doesn’t address the __init__.py use.
> Brett and I have been talking about addressing that for 3.8.
>
> * pex - which we use as our single file zipapp tool.  Especially the
> interaction between pex and pkg_resources introduces pretty significant
> overhead.  My colleague Loren Carvalho created a tool called shiv which
> requires at least Python 3.6, avoids the use of pkg_resources, and
> implements other tricks to be much more performant than pex.   Shiv is now
> open source and you can find it on RTD and GitHub.
>
> The switch to shiv and importlib.resources can shave 25-50% off of warm
> cache start up times for zipapp style executables.
>
> Another thing we’ve done, although I’m much less sanguine about them as a
> general approach, is to move imports into functions, but we’re trying to
> only use that trick on the most critical cases.
>
> Some import time effects can’t be changed.  Decorators come to mind, and
> click is a popular library for CLIs that provides some great features, but
> decorators do prevent a lazy loading approach.
>
> > On May 1, 2018, at 20:26, Gregory Szorc  wrote:
>
> >> You might think "what's a few milliseconds matter".  But if you run
> >> hundreds of commands in a shell script it adds up.  git's speed is one
> >> of the few bright spots in its UX, and hg's comparative slowness here is
> >> a palpable disadvantage.
>
> Oh, for command line tools, milliseconds absolutely matter.
>
> > As a concrete example, I recently landed a Mercurial patch [2] that
> > stubs out zope.interface to prevent the import of 9 modules on every
> > `hg` invocation.
>
> I have a similar dastardly plan to provide a pkg_resources stub :).
>
> > Mercurial provides a `chg` program that essentially spins up a daemon
> > `hg` process running a "command server" so the `chg` program [written in
> > C - no startup overhead] can dispatch commands to an already-running
> > Python/`hg` process and avoid paying the startup overhead cost. When you
> > run Mercurial's test suite using `chg`, it completes *minutes* faster.
> > `chg` exists mainly as a workaround for slow startup overhead.
>
> A couple of our developers demoed a similar approach for one of our CLIs
> that almost everyone uses.  It’s a big application with lots of
> dependencies, so particularly vulnerable to pex and pkg_resources
> overhead.  While it was just a prototype, it was darn impressive to see
> subsequent invocations produce output almost immediately.  It’s unfortunate
> that we have to utilize all these tricks to get even moderately performant
> Python CLIs.
>
>
Note that this kind of "trick" is not unique to Python.  I see it used by
large Java tools at work.  In effect emacs has done similar things for many
decades with its saved core-dump at build time. It saves a snapshot of
initialized elisp interpreter state and loads that into memory instead of
rerunning initialization to reproduce the state.

I don't know if anyone has looked at making a similar concept of saved
post-startup interpreter state for rapid 

Re: [Python-Dev] Python startup time

2018-05-03 Thread Nathaniel Smith
On Wed, May 2, 2018, 20:59 INADA Naoki  wrote:

> Recently, I reported how stdlib slows down `import requests`.
> https://github.com/requests/requests/issues/4315#issuecomment-385584974

[...]

> * Add faster and simpler http.parser (maybe, based on h11 [1]) and avoid
> using email module in http module.
>

It's always risky making predictions, but hopefully by the time 3.8 is out,
requests will have switched to using h11 directly instead of the http
module. (Kenneth wants the big headline feature for the next major requests
release to be async support, and that pretty much requires switching to
something like h11.)

I don't know how fast importing h11 is though... It does currently compile
a bunch of regexps at import time :-).
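
For anyone who wants to check, the `-X importtime` switch mentioned
elsewhere in this thread (Python 3.7+) gives a per-module breakdown.  A
rough sketch of driving it from a script follows; `json` is only a
stand-in here since h11 may not be installed, and the parsing assumes
the "self | cumulative | name" column layout that 3.7 prints to stderr:

```python
import subprocess
import sys

# Run a child interpreter that imports one module with import timing on.
result = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json"],
    stderr=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)

# Keep only the data rows; the header row has text in the numeric columns.
rows = [
    line
    for line in result.stderr.splitlines()
    if line.startswith("import time:")
    and line.split("|")[1].strip().isdigit()
]

# Sort by the cumulative column (microseconds) to find the worst offenders.
slowest = sorted(rows, key=lambda line: int(line.split("|")[1]), reverse=True)[:5]
for line in slowest:
    print(line)
```

Swapping "import json" for "import h11" would show how much of its
import time goes to `re`.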

-n


Re: [Python-Dev] Python startup time

2018-05-03 Thread Brett Cannon
On Thu, 3 May 2018 at 07:31 Nick Coghlan  wrote:

> On 3 May 2018 at 15:56, Glenn Linderman  wrote:
>
>> On 5/2/2018 8:56 PM, Gregory Szorc wrote:
>>
>> Nobody in the project is seriously talking about a complete rewrite in
>> Rust. Contributors to the project have varying opinions on how aggressively
>> Rust should be utilized. People who contribute to the C code, low-level
>> primitives (like storage, deltas, etc), and those who care about
>> performance tend to want more Rust. One thing we almost universally agree
>> on is that we want to rewrite all of Mercurial's C code in Rust. I
>> anticipate that figuring out the balance between Rust and Python in
>> Mercurial will be an ongoing conversation/process for the next few years.
>>
>> Have you considered simply rewriting CPython in Rust?
>>
>
> FWIW, I'd actually like to see Rust approved as a language for writing
> stdlib extension modules, but actually ever making that change in policy
> would require a concrete motivating use case.
>

Eric Snow, Barry Warsaw, and I have actually discussed this as part of our
weekly open source office hours at work where we tend to talk about massive
ideas that would take multiple people full-time to accomplish. :)


>
>
>> And yes, the 4th word in that question was intended to produce peals of
>> shocked laughter. But why Rust? Why not Go?
>>
>
> Trying to get two different garbage collection engines to play nice with
> each other is a recipe for significant pain, since you can easily end up
> with uncollectable cycles that neither GC system has complete visibility
> into (all it needs is a loop from PyObject A -> Go Object B -> back to
> PyObject A).
>
> Combining Python and Rust can still get into that kind of trouble when
> using reference counting on the Rust side, but it's a lot easier to avoid
> than it is in runtimes with mandatory GC.
>

Rust supports RAII so it shouldn't be that bad.


Re: [Python-Dev] Python startup time

2018-05-03 Thread Nick Coghlan
On 3 May 2018 at 15:56, Glenn Linderman  wrote:

> On 5/2/2018 8:56 PM, Gregory Szorc wrote:
>
> Nobody in the project is seriously talking about a complete rewrite in
> Rust. Contributors to the project have varying opinions on how aggressively
> Rust should be utilized. People who contribute to the C code, low-level
> primitives (like storage, deltas, etc), and those who care about
> performance tend to want more Rust. One thing we almost universally agree
> on is that we want to rewrite all of Mercurial's C code in Rust. I
> anticipate that figuring out the balance between Rust and Python in
> Mercurial will be an ongoing conversation/process for the next few years.
>
> Have you considered simply rewriting CPython in Rust?
>

FWIW, I'd actually like to see Rust approved as a language for writing
stdlib extension modules, but actually ever making that change in policy
would require a concrete motivating use case.


> And yes, the 4th word in that question was intended to produce peals of
> shocked laughter. But why Rust? Why not Go?
>

Trying to get two different garbage collection engines to play nice with
each other is a recipe for significant pain, since you can easily end up
with uncollectable cycles that neither GC system has complete visibility
into (all it needs is a loop from PyObject A -> Go Object B -> back to
PyObject A).

Combining Python and Rust can still get into that kind of trouble when
using reference counting on the Rust side, but it's a lot easier to avoid
than it is in runtimes with mandatory GC.
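
As a small illustration of the visibility point (a sketch only, with a
made-up `Node` class): when both halves of a cycle are Python objects,
CPython's cycle collector can see the whole loop and reclaim it.  The
problem case above is the one where half of the loop lives in another
runtime's heap, which this cannot demonstrate:

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to build a reference cycle."""


a = Node()
b = Node()
a.other = b  # A -> B
b.other = a  # B -> back to A

probe = weakref.ref(a)  # lets us observe whether A was reclaimed
del a, b                # refcounts stay above zero because of the cycle
gc.collect()            # the cycle collector sees both ends and frees them

collected = probe() is None
```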

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2018-05-03 Thread Ryan Gonzalez
I'm hardly an expert, but AFAIK CPython's start-up issues are more due to a 
mix of architectural issues and the fact that it's hard to optimize imports 
while maintaining backwards compatibility with Python's dynamism.


--
Ryan (ライアン)
Yoko Shimomura, ryo (supercell/EGOIST), Hiroyuki Sawano >> everyone else
https://refi64.com/



On May 3, 2018 1:37:57 AM Glenn Linderman  wrote:


On 5/2/2018 8:56 PM, Gregory Szorc wrote:

Nobody in the project is seriously talking about a complete rewrite in
Rust. Contributors to the project have varying opinions on how
aggressively Rust should be utilized. People who contribute to the C
code, low-level primitives (like storage, deltas, etc), and those who
care about performance tend to want more Rust. One thing we almost
universally agree on is that we want to rewrite all of Mercurial's C
code in Rust. I anticipate that figuring out the balance between Rust
and Python in Mercurial will be an ongoing conversation/process for
the next few years.

Have you considered simply rewriting CPython in Rust?

And yes, the 4th word in that question was intended to produce peals of
shocked laughter. But why Rust? Why not Go? http://esr.ibiblio.org/?p=7724





Re: [Python-Dev] Python startup time

2018-05-03 Thread Glenn Linderman

On 5/2/2018 8:56 PM, Gregory Szorc wrote:
Nobody in the project is seriously talking about a complete rewrite in 
Rust. Contributors to the project have varying opinions on how 
aggressively Rust should be utilized. People who contribute to the C 
code, low-level primitives (like storage, deltas, etc), and those who 
care about performance tend to want more Rust. One thing we almost 
universally agree on is that we want to rewrite all of Mercurial's C 
code in Rust. I anticipate that figuring out the balance between Rust 
and Python in Mercurial will be an ongoing conversation/process for 
the next few years.

Have you considered simply rewriting CPython in Rust?

And yes, the 4th word in that question was intended to produce peals of 
shocked laughter. But why Rust? Why not Go? http://esr.ibiblio.org/?p=7724


Re: [Python-Dev] Python startup time

2018-05-02 Thread Gregory Szorc
On Wed, May 2, 2018 at 8:26 PM, Benjamin Peterson wrote:

>
>
> On Wed, May 2, 2018, at 09:42, Gregory Szorc wrote:
> > The direction Mercurial is going in is that `hg` will likely become a Rust
> > binary (instead of a #!python script) that will use an embedded Python
> > interpreter. So we will have low-level control over the interpreter via the
> > C API. I'd also like to see us distribute a copy of Python in our official
> > builds. This will allow us to take various shortcuts, such as not having to
> > probe various sys.path entries since certain packages can only exist in one
> > place. I'd love to get to the state Google is at where they have
> > self-contained binaries with ELF sections containing Python modules. But
> > that requires a bit of very low-level hacking. We'll likely have a Rust
> > binary (that possibly static links libpython) and a separate JAR/zip-like
> > file containing resources.
>
> I'm curious about the rust binary. I can see that would give you startup
> time benefits similar to the ones you could get hacking the interpreter
> directly; e.g., you can use a zipfile for everything and not have site.py.
> But it seems like the Python-side wins would stop there. Is this all a
> prelude to incrementally rewriting hg in rust? (Mercuric oxide?)
>

The plans are recorded at https://www.mercurial-scm.org/wiki/OxidationPlan.
tl;dr we want to write some low-level bits in Rust but we anticipate the
bulk of the application logic remaining in Python.

Nobody in the project is seriously talking about a complete rewrite in
Rust. Contributors to the project have varying opinions on how aggressively
Rust should be utilized. People who contribute to the C code, low-level
primitives (like storage, deltas, etc), and those who care about
performance tend to want more Rust. One thing we almost universally agree
on is that we want to rewrite all of Mercurial's C code in Rust. I
anticipate that figuring out the balance between Rust and Python in
Mercurial will be an ongoing conversation/process for the next few years.


Re: [Python-Dev] Python startup time

2018-05-02 Thread Terry Reedy

On 5/2/2018 12:42 PM, Gregory Szorc wrote:

I know this kinda/sorta exists with zipimporter. But zipimporter uses 
zlib (slow) and only allows .py/.pyc files. And I think some Python 
application distribution tools have also solved this problem. I'd 
*really* like to see a proper/robust solution in Python itself. Along 
that vein, it would be really nice if the "standalone Python 
application" story were a bit more formalized. From my perspective, it 
is insanely difficult to package and distribute an application that 
happens to use Python. It requires vastly different solutions for 
different platforms. I want to declare a minimal boilerplate somewhere 
(perhaps in setup.py) and run a command that produces an 
as-self-contained-as-possible application complete with platform-native 
installers.


A few years ago I helped my wife create a tutorial in the Renpy visual 
storytelling engine.  It is free and open source.

https://www.renpy.org
It is written in Python, while users write scripts in both Python and a 
custom scripting language.


When we were done, we pressed a button and it generated self-contained 
zip files for Windows, Linux, and Mac.  This can be done from any of 
the three platforms.  After we tested all three files, she created a web 
page with links to the three files for download.  There have been no 
complaints so far. Perhaps the file generators could be adapted to 
packaging a project directory into a self-contained app.


--
Terry Jan Reedy



Re: [Python-Dev] Python startup time

2018-05-02 Thread INADA Naoki
Recently, I reported how stdlib slows down `import requests`.
https://github.com/requests/requests/issues/4315#issuecomment-385584974

For Python 3.8, my ideas for faster startup time are:

* Add lazy compiling API or flag in `re` module.  The pattern is compiled
  when first used.
* Add IntEnum and IntFlag alternative in C, like PyStructSequence for
  namedtuple.  It will make importing the `socket` and `ssl` modules much
  faster.  (Both modules have huge enums/flags.)
* Add special casing for UTF-8 and ASCII in TextIOWrapper.  When an
  application uses only UTF-8 or ASCII, we can skip importing codecs and
  the encodings package entirely.
* Add faster and simpler http.parser (maybe, based on h11 [1]) and avoid
  using email module in http module.

[1]: https://h11.readthedocs.io/en/latest/

I don't have a solid estimate of how much faster they would make
`import requests`, but I believe most of these ideas are worthwhile.
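
To illustrate the first idea, here is a minimal sketch of what lazy
compiling could look like in user code today.  `LazyPattern` is a
hypothetical name, not an existing or proposed `re` API:

```python
import re


class LazyPattern:
    """Hypothetical wrapper: compile the regex on first use, not at import."""

    def __init__(self, source, flags=0):
        self._source = source
        self._flags = flags
        self._compiled = None

    def _get(self):
        # Compile once, on demand, and remember the result.
        if self._compiled is None:
            self._compiled = re.compile(self._source, self._flags)
        return self._compiled

    def __getattr__(self, name):
        # Only reached for names not set in __init__, i.e. the pattern
        # methods such as match(), search() and sub().
        return getattr(self._get(), name)


NUMBER = LazyPattern(r"number = (\d+)")      # cheap: nothing compiled yet
compiled_before_use = NUMBER._compiled is not None
match = NUMBER.search("number = 42")         # first use triggers compile
compiled_after_use = NUMBER._compiled is not None
```

A module with many module-level patterns would then pay the compile cost
only for the patterns it actually uses.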

Regards,


Re: [Python-Dev] Python startup time

2018-05-02 Thread Benjamin Peterson


On Wed, May 2, 2018, at 09:42, Gregory Szorc wrote:
> The direction Mercurial is going in is that `hg` will likely become a Rust
> binary (instead of a #!python script) that will use an embedded Python
> interpreter. So we will have low-level control over the interpreter via the
> C API. I'd also like to see us distribute a copy of Python in our official
> builds. This will allow us to take various shortcuts, such as not having to
> probe various sys.path entries since certain packages can only exist in one
> place. I'd love to get to the state Google is at where they have
> self-contained binaries with ELF sections containing Python modules. But
> that requires a bit of very low-level hacking. We'll likely have a Rust
> binary (that possibly static links libpython) and a separate JAR/zip-like
> file containing resources.

I'm curious about the rust binary. I can see that would give you startup time 
benefits similar to the ones you could get hacking the interpreter directly; 
e.g., you can use a zipfile for everything and not have site.py. But it seems 
like the Python-side wins would stop there. Is this all a prelude to 
incrementally rewriting hg in rust? (Mercuric oxide?)
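
For reference, the "zipfile for everything" part already works with the
stock zipimport machinery.  A small sketch (the archive and module names
are made up for the example) that writes an uncompressed archive, so no
zlib work is needed on import, and imports from it:

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive holding one module.  ZipFile defaults to
# ZIP_STORED, i.e. members are stored without compression.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("bundled_demo.py", "ANSWER = 42\n")

# Putting the archive on sys.path lets the normal import system find it.
sys.path.insert(0, archive)
importlib.invalidate_caches()
module = importlib.import_module("bundled_demo")

print(module.ANSWER, module.__file__)
```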


Re: [Python-Dev] Python startup time

2018-05-02 Thread Barry Warsaw
On May 2, 2018, at 15:24, Gregory Szorc  wrote:
> 
> FWIW, Google has a patched glibc that implements dlopen_with_offset().
> It allows you to do things like memory map the current binary and then
> dlopen() a shared library embedded in an ELF section.
> 
> I've seen the code in the branch at
> https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/google/grte/v4-2.19/master.
> It likely exists elsewhere. An attempt to upstream it occurred at
> https://sourceware.org/bugzilla/show_bug.cgi?id=11767. It is probably
> well worth someone's time to pick up the torch and get this landed in
> glibc so everyone can be a massive step closer to self-contained, single
> binary applications. Of course, it will take years before you can rely
> on a glibc version with this API being deployed universally. But the
> sooner this lands...

Oh, I’m well aware of the history of this patch. :)  I’d love to see it 
available on the platforms I use, and agree it’s well worth someone’s time to 
continue to shepherd this through the processes to make that happen.  Even if 
it did take years to roll out, Python could use it with the proper compile-time 
checks.

-Barry





Re: [Python-Dev] Python startup time

2018-05-02 Thread Gregory Szorc
On 5/2/18 2:24 PM, Barry Warsaw wrote:
> On May 2, 2018, at 09:42, Gregory Szorc  wrote:
> 
>> As for things Python could do to make things better, one idea is for 
>> "package bundles." Instead of using .py, .pyc, .so, etc files as separate 
>> files on the filesystem, allow Python packages to be distributed as 
>> standalone "archive" files.
> 
> Of course, .so files have to be extracted to the file system, because we have 
> to live with dlopen()’s API.  In our first release of shiv, we had a loader 
> that did exactly that for just .so files.  We ended up just doing .pyz file 
> unpacking unconditionally, ignoring zip-safe, mostly because too many 
> packages still use __file__, which doesn’t work in a zipapp.

FWIW, Google has a patched glibc that implements dlopen_with_offset().
It allows you to do things like memory map the current binary and then
dlopen() a shared library embedded in an ELF section.

I've seen the code in the branch at
https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/google/grte/v4-2.19/master.
It likely exists elsewhere. An attempt to upstream it occurred at
https://sourceware.org/bugzilla/show_bug.cgi?id=11767. It is probably
well worth someone's time to pick up the torch and get this landed in
glibc so everyone can be a massive step closer to self-contained, single
binary applications. Of course, it will take years before you can rely
on a glibc version with this API being deployed universally. But the
sooner this lands...

> 
> I’ll plug shiv and importlib.resources (and the standalone 
> importlib_resources) again here. :)
> 
>> If you go this route, please don't require the use of zlib for file 
>> compression, as zlib is painfully slow compared to alternatives like lz4 and 
>> zstandard.
> 
> shiv works in a similar manner to pex, although it’s a completely new 
> implementation that doesn’t suffer from huge sys.paths or the use of 
> pkg_resources.  shiv + importlib.resources saves us 25-50% of warm cache 
> startup time.  That makes things better but still not ideal.  Ultimately 
> though that means we don’t suffer from the slowness of zlib since we don’t 
> count cold cache times (i.e. before the initial pyz unpacking operation).
> 
> Cheers,
> -Barry
> 
> 
> 



Re: [Python-Dev] Python startup time

2018-05-02 Thread Barry Warsaw
On May 2, 2018, at 09:42, Gregory Szorc  wrote:

> As for things Python could do to make things better, one idea is for "package 
> bundles." Instead of using .py, .pyc, .so, etc files as separate files on the 
> filesystem, allow Python packages to be distributed as standalone "archive" 
> files.

Of course, .so files have to be extracted to the file system, because we have 
to live with dlopen()’s API.  In our first release of shiv, we had a loader 
that did exactly that for just .so files.  We ended up just doing .pyz file 
unpacking unconditionally, ignoring zip-safe, mostly because too many packages 
still use __file__, which doesn’t work in a zipapp.

I’ll plug shiv and importlib.resources (and the standalone importlib_resources) 
again here. :)

> If you go this route, please don't require the use of zlib for file 
> compression, as zlib is painfully slow compared to alternatives like lz4 and 
> zstandard.

shiv works in a similar manner to pex, although it’s a completely new 
implementation that doesn’t suffer from huge sys.paths or the use of 
pkg_resources.  shiv + importlib.resources saves us 25-50% of warm cache 
startup time.  That makes things better but still not ideal.  Ultimately though 
that means we don’t suffer from the slowness of zlib since we don’t count cold 
cache times (i.e. before the initial pyz unpacking operation).

Cheers,
-Barry





Re: [Python-Dev] Python startup time

2018-05-02 Thread Barry Warsaw
Thanks for bringing this topic up again.  At $day_job, this is a highly visible 
and important topic, since the majority of our command line tools are written 
in Python (of varying versions from 2.7 to 3.6).  Some of those tools can take
upwards of 5 seconds just to respond to --help, which causes lots of
pain for developers, who complain (rightly so) up the management chain. ;)

We’ve done a fair bit of work to bring those numbers down without super radical 
workarounds.  Often there are problems not strictly related to the Python 
interpreter that contribute to this.  Python gets blamed, but it’s not always 
the interpreter’s fault.  Common issues include:

* Modules that have import-time side effects, such as network access or 
expensive creation of data structures.  Python 3.7’s `-X importtime` switch is 
a really wonderful way to identify the worst offenders.  Once 3.7 is released, 
I do plan to spend some time using this to collect data internally so we can 
attack our own libraries, and perhaps put automated performance testing into 
our build stack, to identify start up time regressions.

* pkg_resources.  When you have tons of entries on sys.path, pkg_resources does 
a lot of work at import time, and because of common patterns which tend to use 
pkg_resources namespace package support in __init__.py files, this just kills 
start up times.  Of course, pkg_resources has other uses too, so even in a 
purely Python 3 world (where your namespace packages can omit the __init__.py), 
you’ll often get clobbered as soon as you want to use the Basic Resource Access 
API.  This is also pretty common, and it’s the main reason why Brett and I 
created importlib.resources for 3.7 (with a standalone API-compatible library 
for older Pythons).  That’s one less reason to use pkg_resources, but it 
doesn’t address the __init__.py use.  Brett and I have been talking about 
addressing that for 3.8.

* pex - which we use as our single file zipapp tool.  Especially the 
interaction between pex and pkg_resources introduces pretty significant 
overhead.  My colleague Loren Carvalho created a tool called shiv which 
requires at least Python 3.6, avoids the use of pkg_resources, and implements 
other tricks to be much more performant than pex.   Shiv is now open source and 
you can find it on RTD and GitHub.

The switch to shiv and importlib.resources can shave 25-50% off of warm cache 
start up times for zipapp style executables.
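
For readers unfamiliar with the zipapp format, here is a minimal sketch using only the standard library's `zipapp` module. It is not how pex or shiv build their archives; the function name `build_and_run_pyz` and the file names are made up for illustration.

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp


def build_and_run_pyz():
    """Build a tiny single-file .pyz application and run it once."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical application directory containing only __main__.py.
        src = pathlib.Path(tmp) / "app"
        src.mkdir()
        (src / "__main__.py").write_text('print("hello from a zipapp")\n')

        # Pack the directory into one archive the interpreter can execute.
        target = pathlib.Path(tmp) / "app.pyz"
        zipapp.create_archive(src, target)

        result = subprocess.run(
            [sys.executable, str(target)],
            stdout=subprocess.PIPE,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_pyz())
```

Everything inside the archive is imported through the zip importer, which is why packages that rely on `__file__` pointing at a real file tend to break in this layout.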

Another thing we’ve done, although I’m much less sanguine about it as a
general approach, is to move imports into functions, but we’re trying to
use that trick only in the most critical cases.
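
A small sketch of the "move imports into functions" trick, assuming a hypothetical CLI helper named `show_diff`; `difflib` merely stands in for any module that is expensive to import and only needed on some code paths.

```python
def show_diff(old_lines, new_lines):
    """Return a unified diff; difflib is imported only when this runs."""
    # Deferred import: commands that never diff anything (e.g. --help)
    # do not pay for loading difflib at startup.
    import difflib

    return "\n".join(
        difflib.unified_diff(old_lines, new_lines, lineterm="")
    )


if __name__ == "__main__":
    print(show_diff(["old"], ["new"]))
```

The cost is that the import statement runs on every call (a cheap `sys.modules` lookup after the first), and import errors surface late rather than at startup.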

Some import time effects can’t be changed.  Decorators come to mind, and click 
is a popular library for CLIs that provides some great features, but decorators 
do prevent a lazy loading approach.

> On May 1, 2018, at 20:26, Gregory Szorc  wrote:

>> You might think "what's a few milliseconds matter".  But if you run
>> hundreds of commands in a shell script it adds up.  git's speed is one
>> of the few bright spots in its UX, and hg's comparative slowness here is
>> a palpable disadvantage.

Oh, for command line tools, milliseconds absolutely matter.

> As a concrete example, I recently landed a Mercurial patch [2] that
> stubs out zope.interface to prevent the import of 9 modules on every
> `hg` invocation.

I have a similar dastardly plan to provide a pkg_resources stub :).

> Mercurial provides a `chg` program that essentially spins up a daemon
> `hg` process running a "command server" so the `chg` program [written in
> C - no startup overhead] can dispatch commands to an already-running
> Python/`hg` process and avoid paying the startup overhead cost. When you
> run Mercurial's test suite using `chg`, it completes *minutes* faster.
> `chg` exists mainly as a workaround for slow startup overhead.

A couple of our developers demoed a similar approach for one of our CLIs that 
almost everyone uses.  It’s a big application with lots of dependencies, so 
particularly vulnerable to pex and pkg_resources overhead.  While it was just a 
prototype, it was darn impressive to see subsequent invocations produce output 
almost immediately.  It’s unfortunate that we have to utilize all these tricks 
to get even moderately performant Python CLIs.

A few of us spent some time at last year’s core Python dev sprint talking about other 
things we could do to improve Python’s start up time, not just with the 
interpreter itself, but within the larger context of the Python ecosystem.  
Many ideas seem promising until you dive into the details, so it’s definitely 
hard to imagine maintaining all of Python’s dynamic semantics and still making 
it an order of magnitude faster to start up.  But that’s not an excuse to give 
up, and I’m hoping we can continue to attack the problem, both in the micro and 
the macro, for 3.8 and beyond, because the alternative is that Python becomes 
less 

Re: [Python-Dev] Python startup time

2018-05-02 Thread Neil Schemenauer
Antoine:
> The overhead of importing is not in trying too many names, but in
> loading the module and executing its bytecode.

That was my conclusion as well when I did some profiling last fall
at the Python core sprint.  My lazy execution experiments are an
attempt to solve this:

https://github.com/python/cpython/pull/6194

I expect that Mercurial is already doing a lot of tricks to make
execution more lazy.  They have a lazy module import hook but they
probably do other things to not execute more bytecode at startup
than is needed.  My lazy execution idea is that this could happen
more automatically.  I.e. don't pay for something you don't use.
Right now, with eager module imports, you usually pay a price for
every bit of bytecode that your program potentially uses.

Another idea, suggested to me by Carl Shapiro, is to store
unmarshalled Python data in the heap section of the executable (or
in DLLs).  Then, the OS page fault handling would take care of only
loading the data into RAM that is actually being used.  The linker
would take care of fixing up pointer references.  There are a lot of
details to work out with this idea but I have heard that Jeethu Rao
(Carl's colleague at Instagram) has a prototype implementation that
shows promise.

Regards,

  Neil


Re: [Python-Dev] Python startup time

2018-05-02 Thread Nathaniel Smith
On Wed, May 2, 2018, 09:51 Gregory Szorc  wrote:

> Correct me if I'm wrong, but aren't there downsides with regards to C
> extension compatibility to not having a shared libpython? Or does all the
> packaging tooling "just work" without a libpython? (It's possible I have my
> wires crossed up with something else regarding a statically linked Python.)
>

IIRC, the rule on Linux is that if you build an extension on a statically
built python, then it can be imported on a shared python, but not
vice-versa. Manylinux wheels are therefore always built on a static python
so that they'll work everywhere. (We should probably clean this up upstream
at some point, but there's not a lot of appetite for touching this stuff –
very obscure, very easy to break things without realizing it, not much
upside.)

On Windows I don't think there is such a thing as a static build, because
extensions have to link to the python dll to work at all. And on MacOS I'm
not sure, though from knowing how their linker works my guess is that all
extensions act like static extensions do on Linux.

-n


Re: [Python-Dev] Python startup time

2018-05-02 Thread Gregory Szorc
On Tue, May 1, 2018 at 11:55 PM, Ray Donnelly wrote:

> Is your Python interpreter statically linked? The Python 3 ones from the
> anaconda distribution (use Miniconda!) are for Linux and macOS and that
> roughly halved our startup times.

My Python interpreters use a shared library. I'll definitely investigate
the performance of a statically-linked interpreter.
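
One rough way to check which kind of interpreter you are running is the `Py_ENABLE_SHARED` build variable exposed by `sysconfig`. This is only a sketch: the helper name is made up, and on Windows the variable is typically unset even though the interpreter always uses a DLL, so treat a False result there with suspicion.

```python
import sys
import sysconfig


def built_with_shared_libpython():
    """Return True if this interpreter was configured with --enable-shared."""
    # Py_ENABLE_SHARED is 1 for shared builds and 0 for static builds on
    # POSIX platforms; it may be None elsewhere, which we report as False.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    kind = "shared" if built_with_shared_libpython() else "static (or unknown)"
    print(f"{sys.executable}: libpython is {kind}")
```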

Correct me if I'm wrong, but aren't there downsides with regards to C
extension compatibility to not having a shared libpython? Or does all the
packaging tooling "just work" without a libpython? (It's possible I have my
wires crossed up with something else regarding a statically linked Python.)

On Wed, May 2, 2018 at 2:26 AM, Victor Stinner  wrote:

> What do you propose to make Python startup faster?
>

That's a very good question. I'm not sure I'm able to answer it because I
haven't dug into CPython's internals much further than what is
required to implement C extensions. But I can share insight from what the
Mercurial project has collectively learned.


>
> As I wrote in my previous emails, many Python core developers care of
> the startup time and we are working on making it faster.
>
> INADA Naoki added -X importtime to identify slow imports and
> understand where Python spent its startup time.
>

-X importtime is a great start! For a follow-up enhancement, it would be
useful to see what aspects of import are slow. Is it finding modules
(involves filesystem I/O)? Is it unmarshaling pyc files? Is it executing
the module code? If executing code, what part is slow? Inline
statements/expressions? Compiling types? Printing the microseconds it takes
to import a module is useful. But it only gives me a general direction: I
want to know what parts of the import made it slow so I know if I should be
focusing on code running during module import, slimming down the size of a
module, eliminating the module import from fast paths, pursuing alternative
module importers, etc.
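
Until something like that exists, a crude breakdown can be approximated by hand. The sketch below assumes a pure-Python module whose loader implements `get_code()`; the function name `time_import_phases` and the phase labels are mine, and the split is much coarser than what is being asked for above.

```python
import importlib.util
import sys
import time


def time_import_phases(name):
    """Time finding, loading the code object, and executing a module."""
    t0 = time.perf_counter()
    spec = importlib.util.find_spec(name)        # path search (filesystem I/O)
    t1 = time.perf_counter()
    code = spec.loader.get_code(name)            # read + unmarshal .pyc, or compile
    t2 = time.perf_counter()
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module                   # register before executing
    exec(code, module.__dict__)                  # run the module body
    t3 = time.perf_counter()
    return {"find": t1 - t0, "load_code": t2 - t1, "exec": t3 - t2}


if __name__ == "__main__":
    for phase, seconds in time_import_phases("keyword").items():
        print(f"{phase:>9}: {seconds * 1e6:8.1f} us")
```

Note that the "exec" figure includes any nested imports the module performs, so it needs to be read alongside `-X importtime` output rather than instead of it.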


>
> Recent example: Barry Warsaw identified that pkg_resources is slow and
> added importlib.resources to Python 3.7:
> https://docs.python.org/dev/library/importlib.html#module-importlib.resources
>
> Brett Cannon is also working on a standard solution for lazy imports
> since many years:
> https://pypi.org/project/modutil/
> https://snarky.ca/lazy-importing-in-python-3-7/
>

Mercurial has used lazy module imports for years. On 2.7.14, it reduces `hg
version` from ~160ms to ~55ms (~34% of original). On Python 3, we're using
`importlib.util.LazyLoader` and it reduces `hg version` on 3.7 from ~245ms
to ~120ms (~49% of original). I'm not sure why Python 3's built-in module
importer doesn't yield the speedup that our custom Python 2 importer does.
One explanation is our custom importer is more advanced than importlib.
Another is that Python 3's import mechanism is slower (possibly due to
being written in Python instead of C). We haven't yet spent much time
optimizing Mercurial for Python 3: our immediate goal is to get it working
first. Given the startup performance problem on Python 3, it is only a
matter of time before we dig into this further.
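
For reference, a minimal lazy import built on `importlib.util.LazyLoader`, modeled on the recipe in the importlib documentation. This is not Mercurial's importer; `lazy_import` is an illustrative name and `colorsys` is just a convenient small module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)   # only arranges for deferred execution
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")        # module body has not run yet
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # first access triggers the load
```

The module is still located on disk eagerly by `find_spec()`; only reading and executing its code is postponed.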

It's worth noting that lazy module importing can be undone via common
patterns. Most commonly, `from foo import X`. It's *really* difficult to
implement a proper object proxy. Mercurial's lazy importer gives up in this
case and imports the module and exports the symbol. (But if the imported
module is a package, we detect that and make the module exports proxies to
a lazy module.)

Another common way to undermine the lazy importer is code that accesses a
module attribute while another module is being executed at import time, e.g.:

```
import foo

class myobject(foo.Foo):
    pass
```

Mercurial goes out of its way to avoid these patterns so modules can be
delay imported as much as possible. As long as import times are
problematic, it would be helpful if the standard library adopted similar
patterns. Although I recognize there are backwards compatibility concerns
that tie your hands a bit.


> Nick Coghlan is working on the C API to configure Python startup: PEP
> 432. When it will be ready, maybe Mercurial could use a custom Python
> optimized for its use case.
>

That looks great!

The direction Mercurial is going in is that `hg` will likely become a Rust
binary (instead of a #!python script) that will use an embedded Python
interpreter. So we will have low-level control over the interpreter via the
C API. I'd also like to see us distribute a copy of Python in our official
builds. This will allow us to take various shortcuts, such as not having to
probe various sys.path entries since certain packages can only exist in one
place. I'd love to get to the state Google is at where they have
self-contained binaries with ELF sections containing Python modules. But
that requires a bit of very low-level hacking. We'll likely have a Rust
binary (that possibly static links libpython) and a separate JAR/zip-like
file 

Re: [Python-Dev] Python startup time

2018-05-02 Thread Antoine Pitrou
On Wed, 2 May 2018 11:26:35 +0200
Victor Stinner  wrote:
> 
> Brett Cannon is also working on a standard solution for lazy imports
> since many years:
> https://pypi.org/project/modutil/
> https://snarky.ca/lazy-importing-in-python-3-7/

AFAIK, Mercurial already has its own lazy importer.

> Nick Coghlan is working on the C API to configure Python startup: PEP
> 432. When it will be ready, maybe Mercurial could use a custom Python
> optimized for its use case.
> 
> IMHO Python import system is inefficient. We try too many alternative names.

The overhead of importing is not in trying too many names, but in
loading the module and executing its bytecode.

> Why do we still check for the .pyc file outside __pycache__ directories?

Because we support sourceless distributions.

> Why do we have to check for 3 different names for .so files?

See https://bugs.python.org/issue32387

Regards

Antoine.




Re: [Python-Dev] Python startup time

2018-05-02 Thread Victor Stinner
What do you propose to make Python startup faster?

As I wrote in my previous emails, many Python core developers care about
the startup time and we are working on making it faster.

INADA Naoki added -X importtime to identify slow imports and
understand where Python spent its startup time.

Recent example: Barry Warsaw identified that pkg_resources is slow and
added importlib.resources to Python 3.7:
https://docs.python.org/dev/library/importlib.html#module-importlib.resources

Brett Cannon has also been working on a standard solution for lazy imports
for many years:
https://pypi.org/project/modutil/
https://snarky.ca/lazy-importing-in-python-3-7/

Nick Coghlan is working on the C API to configure Python startup: PEP
432. Once it is ready, maybe Mercurial could use a custom Python
optimized for its use case.

IMHO Python's import system is inefficient. We try too many alternative names.

Example with Python 3.8

$ ./python -vv:
>>> import dontexist
# trying /home/vstinner/prog/python/master/dontexist.cpython-38dm-x86_64-linux-gnu.so
# trying /home/vstinner/prog/python/master/dontexist.abi3.so
# trying /home/vstinner/prog/python/master/dontexist.so
# trying /home/vstinner/prog/python/master/dontexist.py
# trying /home/vstinner/prog/python/master/dontexist.pyc
# trying /home/vstinner/prog/python/master/Lib/dontexist.cpython-38dm-x86_64-linux-gnu.so
# trying /home/vstinner/prog/python/master/Lib/dontexist.abi3.so
# trying /home/vstinner/prog/python/master/Lib/dontexist.so
# trying /home/vstinner/prog/python/master/Lib/dontexist.py
# trying /home/vstinner/prog/python/master/Lib/dontexist.pyc
# trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.cpython-38dm-x86_64-linux-gnu.so
# trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.abi3.so
# trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.so
# trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.py
# trying /home/vstinner/prog/python/master/build/lib.linux-x86_64-3.8-pydebug/dontexist.pyc
# trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.cpython-38dm-x86_64-linux-gnu.so
# trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.abi3.so
# trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.so
# trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.py
# trying /home/vstinner/.local/lib/python3.8/site-packages/dontexist.pyc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'dontexist'
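
The list above is essentially every sys.path entry crossed with every known module suffix. A rough sketch that enumerates the same candidates (the helper name `candidate_files` is made up, and it ignores package directories such as `name/__init__.py`):

```python
import importlib.machinery
import os
import sys


def candidate_files(name, search_path=None):
    """List the file names a path-based import of `name` could match."""
    if search_path is None:
        search_path = sys.path
    # Extension-module suffixes first, then .py and .pyc.
    suffixes = importlib.machinery.all_suffixes()
    candidates = []
    for entry in search_path:
        directory = entry or os.getcwd()   # '' on sys.path means the cwd
        for suffix in suffixes:
            candidates.append(os.path.join(directory, name + suffix))
    return candidates


if __name__ == "__main__":
    for path in candidate_files("dontexist"):
        print("# would try", path)
```

This only lists names; as far as I understand, the real finder consults a cached directory listing per sys.path entry, so not every candidate costs a separate filesystem call.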

Why do we still check for the .pyc file outside __pycache__ directories?

Why do we have to check for 3 different names for .so files?

Does Mercurial need all directories of sys.path?

What's the status of the "system python" project? :-)

I would also prefer Python without the site module. Could we maybe rewrite
this module in C? Until recently, the site module was needed on
Python to create the "mbcs" encoding alias. Fortunately, that feature has
been moved into Lib/encodings/__init__.py (new private _alias_mbcs()
function).

Python 3.7b3+:

$ python3.7 -X importtime -c pass
import time: self [us] | cumulative | imported package
import time:        95 |         95 | zipimport
import time:       589 |        589 | _frozen_importlib_external
import time:        67 |         67 | _codecs
import time:       498 |        565 |   codecs
import time:       425 |        425 |   encodings.aliases
import time:       641 |       1629 | encodings
import time:       228 |        228 | encodings.utf_8
import time:       143 |        143 | _signal
import time:       335 |        335 | encodings.latin_1
import time:        58 |         58 | _abc
import time:       265 |        322 |   abc
import time:       298 |        619 | io
import time:        69 |         69 |   _stat
import time:       196 |        265 | stat
import time:       169 |        169 |   genericpath
import time:       336 |        505 | posixpath
import time:      1190 |       1190 | _collections_abc
import time:       600 |       2557 |   os
import time:       223 |        223 |   _sitebuiltins
import time:       214 |        214 |   sitecustomize
import time:        74 |         74 |   usercustomize
import time:       477 |       3544 | site
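
Output in this format is easy to post-process. Below is a sketch that runs a child interpreter with `-X importtime` (Python 3.7+) and ranks modules by self time; the helper name `import_times` and the choice of `import json` as the workload are mine.

```python
import subprocess
import sys


def import_times(code="pass"):
    """Run `code` under -X importtime; return (self_us, cumulative_us, name)."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", code],
        stderr=subprocess.PIPE,   # the report is written to stderr
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3 or not fields[0].strip().isdigit():
            continue              # skip the header row
        self_us, cumulative_us, name = fields
        rows.append((int(self_us), int(cumulative_us), name.strip()))
    return rows


if __name__ == "__main__":
    # Print the ten imports with the largest self time.
    for self_us, cumulative_us, name in sorted(import_times("import json"),
                                               reverse=True)[:10]:
        print(f"{self_us:>8} us  {name}")
```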

Victor


Re: [Python-Dev] Python startup time

2018-05-02 Thread Ray Donnelly
On Wed, May 2, 2018, 4:53 AM Gregory Szorc  wrote:

> On 7/19/2017 12:15 PM, Larry Hastings wrote:
> >
> >
> > On 07/19/2017 05:59 AM, Victor Stinner wrote:
> >> Mercurial startup time is already 45.8x slower than Git whereas tested
> >> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
> >> developers, with a startup time 2x - 3x slower...
> >
> > When Matt Mackall spoke at the Python Language Summit some years back, I
> > recall that he specifically complained about Python startup time.  He
> > said Python 3 "didn't solve any problems for [them]"--they'd already
> > solved their Unicode hygiene problems--and that Python's slow startup
> > time was already a big problem for them.  Python 3 being /even slower/
> > to start was absolutely one of the reasons why they didn't want to
> > upgrade.
> >
> > You might think "what's a few milliseconds matter".  But if you run
> > hundreds of commands in a shell script it adds up.  git's speed is one
> > of the few bright spots in its UX, and hg's comparative slowness here is
> > a palpable disadvantage.
> >
> >
> >> So please continue efforts for make Python startup even faster to beat
> >> all other programming languages, and finally convince Mercurial to
> >> upgrade ;-)
> >
> > I believe Mercurial is, finally, slowly porting to Python 3.
> >
> > https://www.mercurial-scm.org/wiki/Python3
> >
> > Nevertheless, I can't really be annoyed or upset at them moving slowly
> > to adopt Python 3, as Matt's objections were entirely legitimate.
>
> I just now found this thread when searching the archive for
> threads about startup time. And I was searching for threads about
> startup time because Mercurial's startup time has been getting slower
> over the past few months and this is causing substantial pain.
>
> As I posted back in 2014 [1], CPython's startup overhead was >10% of the
> total CPU time in Mercurial's test suite. And when you factor in the
> time to import modules that get Mercurial to a point where it can run
> commands, it was more like 30%!
>
> Mercurial's full test suite currently runs `hg` ~25,000 times. Using
> Victor's startup time numbers of 6.4ms for 2.7 and 14.5ms for
> 3.7/master, Python startup overhead contributes ~160s on 2.7 and ~360s
> on 3.7/master. Even if you divide this by the number of available CPU
> cores, we're talking dozens of seconds of wall time just waiting for
> CPython to get to a place where Mercurial's first bytecode can execute.
>
> And the problem is worse when you factor in the time it takes to import
> Mercurial's own modules.
>
> As a concrete example, I recently landed a Mercurial patch [2] that
> stubs out zope.interface to prevent the import of 9 modules on every
> `hg` invocation. This "only" saved ~6.94ms for a typical `hg`
> invocation. But this decreased the CPU time required to run the test
> suite on my i7-6700K from ~4450s to ~3980s (~89.5% of original) - a
> reduction of almost 8 minutes of CPU time (and over 1 minute of wall time)!
>
> By the time CPython gets Mercurial to a point where we can run useful
> code, we've already blown most of or past the time budget where humans
> perceive an action/command as instantaneous. If you ignore startup
> overhead, Mercurial's performance compares quite well to Git's for many
> operations. But the reality is that CPython startup overhead makes it
> look like Mercurial is non-instantaneous before Mercurial even has the
> opportunity to execute meaningful code!
>
> Mercurial provides a `chg` program that essentially spins up a daemon
> `hg` process running a "command server" so the `chg` program [written in
> C - no startup overhead] can dispatch commands to an already-running
> Python/`hg` process and avoid paying the startup overhead cost. When you
> run Mercurial's test suite using `chg`, it completes *minutes* faster.
> `chg` exists mainly as a workaround for slow startup overhead.
>
> Changing gears, my day job is maintaining Firefox's build system. We use
> Python heavily in the build system. And again, Python startup overhead
> is problematic. I don't have numbers offhand, but we invoke likely a few
> hundred Python processes as part of building Firefox. It should be
> several thousand. But, we've had to "hack" parts of the build system to
> "batch" certain build actions in single process invocations in order to
> avoid Python startup overhead. This undermines the ability of some build
> tools to formulate a reasonable understanding of the DAG and it causes a
> bit of pain for build system developers and makes it difficult to
> achieve "no-op" and fast incremental builds because we're always
> invoking certain Python processes because we've had to move DAG
> awareness out of the build backend and into Python. At some point, we'll
> likely replace Python code with Rust so the build system is more "pure"
> and easier to maintain and reason about.
>
> I've seen posts in this thread and elsewhere in the CPython 

Re: [Python-Dev] Python startup time

2018-05-01 Thread Gregory Szorc
On 7/19/2017 12:15 PM, Larry Hastings wrote:
> 
> 
> On 07/19/2017 05:59 AM, Victor Stinner wrote:
>> Mercurial startup time is already 45.8x slower than Git whereas tested
>> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
>> developers, with a startup time 2x - 3x slower...
> 
> When Matt Mackall spoke at the Python Language Summit some years back, I
> recall that he specifically complained about Python startup time.  He
> said Python 3 "didn't solve any problems for [them]"--they'd already
> solved their Unicode hygiene problems--and that Python's slow startup
> time was already a big problem for them.  Python 3 being /even slower/
> to start was absolutely one of the reasons why they didn't want to upgrade.
> 
> You might think "what's a few milliseconds matter".  But if you run
> hundreds of commands in a shell script it adds up.  git's speed is one
> of the few bright spots in its UX, and hg's comparative slowness here is
> a palpable disadvantage.
> 
> 
>> So please continue efforts for make Python startup even faster to beat
>> all other programming languages, and finally convince Mercurial to
>> upgrade ;-)
> 
> I believe Mercurial is, finally, slowly porting to Python 3.
> 
> https://www.mercurial-scm.org/wiki/Python3
> 
> Nevertheless, I can't really be annoyed or upset at them moving slowly
> to adopt Python 3, as Matt's objections were entirely legitimate.

I just now found this thread when searching the archive for
threads about startup time. And I was searching for threads about
startup time because Mercurial's startup time has been getting slower
over the past few months and this is causing substantial pain.

As I posted back in 2014 [1], CPython's startup overhead was >10% of the
total CPU time in Mercurial's test suite. And when you factor in the
time to import modules that get Mercurial to a point where it can run
commands, it was more like 30%!

Mercurial's full test suite currently runs `hg` ~25,000 times. Using
Victor's startup time numbers of 6.4ms for 2.7 and 14.5ms for
3.7/master, Python startup overhead contributes ~160s on 2.7 and ~360s
on 3.7/master. Even if you divide this by the number of available CPU
cores, we're talking dozens of seconds of wall time just waiting for
CPython to get to a place where Mercurial's first bytecode can execute.
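
The arithmetic behind those totals, as a quick check. The invocation count and per-process startup times are the figures quoted above; the variable names are only for illustration.

```python
# Figures from the paragraph above.
invocations = 25_000
startup_ms = {"2.7": 6.4, "3.7/master": 14.5}

# Total interpreter startup overhead across the whole test suite, in seconds.
overhead_s = {
    version: invocations * ms / 1000
    for version, ms in startup_ms.items()
}

if __name__ == "__main__":
    for version, seconds in overhead_s.items():
        print(f"Python {version}: ~{seconds:.0f}s of pure startup overhead")
```

That works out to about 160 seconds on 2.7 and about 362 seconds on 3.7/master, matching the rounded numbers in the text.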

And the problem is worse when you factor in the time it takes to import
Mercurial's own modules.

As a concrete example, I recently landed a Mercurial patch [2] that
stubs out zope.interface to prevent the import of 9 modules on every
`hg` invocation. This "only" saved ~6.94ms for a typical `hg`
invocation. But this decreased the CPU time required to run the test
suite on my i7-6700K from ~4450s to ~3980s (~89.5% of original) - a
reduction of almost 8 minutes of CPU time (and over 1 minute of wall time)!
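
The general technique of stubbing out a module can be sketched as follows.
This is not the Mercurial patch itself: `heavy_dependency` and `Interface`
are hypothetical names chosen for illustration, and the only mechanism relied
on is that the import statement consults `sys.modules` before searching the
filesystem.

```python
import sys
import types


def install_stub(name, **attrs):
    """Register a lightweight stand-in so `import name` skips the real module."""
    stub = types.ModuleType(name)
    stub.__dict__.update(attrs)
    sys.modules[name] = stub
    return stub


class Interface:
    """Minimal placeholder for the one attribute callers actually need."""


# Hypothetical module name; a dotted name would also need its parent packages.
install_stub("heavy_dependency", Interface=Interface)

import heavy_dependency  # resolved from sys.modules, no filesystem search

print(heavy_dependency.Interface is Interface)
```

The stub has to be installed before anything imports the real module, which
in practice usually means very early in the program's entry point.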

By the time CPython gets Mercurial to a point where we can run useful
code, we've already blown most of or past the time budget where humans
perceive an action/command as instantaneous. If you ignore startup
overhead, Mercurial's performance compares quite well to Git's for many
operations. But the reality is that CPython startup overhead makes it
look like Mercurial is non-instantaneous before Mercurial even has the
opportunity to execute meaningful code!

Mercurial provides a `chg` program that essentially spins up a daemon
`hg` process running a "command server" so the `chg` program [written in
C - no startup overhead] can dispatch commands to an already-running
Python/`hg` process and avoid paying the startup overhead cost. When you
run Mercurial's test suite using `chg`, it completes *minutes* faster.
`chg` exists mainly as a workaround for slow startup overhead.

Changing gears, my day job is maintaining Firefox's build system. We use
Python heavily in the build system. And again, Python startup overhead
is problematic. I don't have numbers offhand, but we invoke likely a few
hundred Python processes as part of building Firefox. It should be
several thousand. But, we've had to "hack" parts of the build system to
"batch" certain build actions in single process invocations in order to
avoid Python startup overhead. This undermines the ability of some build
tools to formulate a reasonable understanding of the DAG and it causes a
bit of pain for build system developers and makes it difficult to
achieve "no-op" and fast incremental builds because we're always
invoking certain Python processes because we've had to move DAG
awareness out of the build backend and into Python. At some point, we'll
likely replace Python code with Rust so the build system is more "pure"
and easier to maintain and reason about.

I've seen posts in this thread and elsewhere in the CPython development
universe that challenge whether milliseconds in startup time matter.
Speaking as a Mercurial and Firefox build system developer,
*milliseconds absolutely matter*. Going further, *fractions of
milliseconds matter*. For Mercurial's 

Re: [Python-Dev] Python startup time

2017-07-23 Thread Nick Coghlan
On 23 July 2017 at 09:35, Steve Dower  wrote:
> Yes, I’m aware of that, which is why I don’t have any specific suggestions
> off-hand. But given the differences in file systems between Windows and
> other OSs, it wouldn’t surprise me if there were a more optimal approach for
> NTFS to amortize calls better. Perhaps not, but it is still the most
> expensive part of startup that we have any ability to change, so it’s worth
> investigating.

That does remind me of a capability we haven't played with a lot recently:

$ python3 -m site
sys.path = [
'/home/ncoghlan',
'/usr/lib64/python36.zip',
'/usr/lib64/python3.6',
'/usr/lib64/python3.6/lib-dynload',
'/home/ncoghlan/.local/lib/python3.6/site-packages',
'/usr/lib64/python3.6/site-packages',
'/usr/lib/python3.6/site-packages',
]
USER_BASE: '/home/ncoghlan/.local' (exists)
USER_SITE: '/home/ncoghlan/.local/lib/python3.6/site-packages' (exists)
ENABLE_USER_SITE: True

The interpreter puts a zip file ahead of the regular unpacked standard
library on sys.path because at one point in time that was a useful
optimisation technique for reducing import costs on application
startup. It was a potentially big win with the old "multiple stat
calls" import implementation, but I'm not aware of any more recent
benchmarks relative to the current listdir-caching based import
implementation.
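
For anyone who has not used the zip-on-sys.path mechanism directly, a minimal
sketch is below. The archive name `bundle.zip` and module name `zipped_demo`
are made up for the example; it only shows that a zip archive placed on
sys.path is importable, not that it is faster.

```python
import importlib
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive containing one pure-Python module.
    archive = os.path.join(tmp, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zipped_demo.py", "ANSWER = 42\n")

    # Put the archive itself (not a directory) at the front of sys.path.
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        zipped_demo = importlib.import_module("zipped_demo")
        loader_name = type(zipped_demo.__spec__.loader).__name__
    finally:
        sys.path.remove(archive)

print(zipped_demo.ANSWER, loader_name)
```

Whether this beats the listdir-caching importer is exactly the open question;
the sketch would need to be wrapped in a timing harness to say anything about
that.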

So I think some interesting experiments to try measuring might be:

- pushing the "always imported" modules into a dedicated zip archive
- having the interpreter pre-seed sys.modules with the contents of
that dedicated archive
- freezing those modules and building them into the interpreter that way
- compiling the standalone top-level modules with Cython, and loading
them as extension modules
- compiling in the Cython generated modules as builtins (not currently
an option for packages & submodules due to [1])

The nice thing about those kinds of approaches is that they're all
fairly general purpose, and relate primarily to how the Python
interpreter is put together, rather than how the individual modules
are written in the first place.

(I'm not volunteering to run those experiments, though - just pointing
out some of the technical options we have available to us that don't
involve adding more handcrafted C extension modules to CPython)

[1] https://bugs.python.org/issue1644818

Cheers,
Nick.

P.S. Checking the current list of source modules implicitly loaded at
startup, I get:

>>> import sys
>>> sorted(k for k, m in sys.modules.items() if m.__spec__ is not None and
...        type(m.__spec__.loader).__name__ == "SourceFileLoader")
['_collections_abc', '_sitebuiltins', '_weakrefset', 'abc', 'codecs',
'encodings', 'encodings.aliases', 'encodings.latin_1',
'encodings.utf_8', 'genericpath', 'io', 'os', 'os.path', 'posixpath',
'rlcompleter', 'site', 'stat']
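
A slightly more general variant of the same check, grouping every loaded
module by the kind of loader that produced it, might look like the sketch
below. The function name `modules_by_loader` is invented here, and the
handling of class-valued loaders reflects how CPython exposes its builtin and
frozen importers; other implementations may differ.

```python
import sys
from collections import defaultdict


def modules_by_loader():
    """Map loader type name -> sorted list of module names in sys.modules."""
    groups = defaultdict(list)
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        loader = getattr(spec, "loader", None)
        if loader is None:
            kind = "(no loader)"
        elif isinstance(loader, type):
            # In CPython the builtin and frozen importers are classes,
            # not instances, so report the class name itself.
            kind = loader.__name__
        else:
            kind = type(loader).__name__
        groups[kind].append(name)
    return {kind: sorted(names) for kind, names in groups.items()}


for kind, names in sorted(modules_by_loader().items()):
    print(f"{kind}: {len(names)} modules")
```

Running it under `python3 -I` or `python3 -S` gives a view of what is loaded
before site customisation and user code add their own imports.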


-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python startup time

2017-07-23 Thread Brett Cannon
On Sun, Jul 23, 2017, 10:52 Michel Desmoulin, 
wrote:

>
>
> Le 23/07/2017 à 19:36, Brett Cannon a écrit :
> >
> >
> > On Sun, Jul 23, 2017, 00:53 Michel Desmoulin wrote:
> >
> >
> >
> > > Optimizing startup time is incredibly valuable,
> >
> > I've been reading that from the beginning of this thread but I've
> been
> > using python since the 2.4 and I never felt the burden of the
> > startup time.
> >
> > I'm guessing a lot of people are like me, they just don't express
> them
> > self because "better startup time can't be bad so let's not put a
> > barrier on this".
> >
> > I'm not against it, but since the necessity of a faster Python in
> > general has been a debate for years and is only finally catching up
> with
> > the work of Victor Stinner, can somebody explain me the deal with
> start
> > up time ?
> >
> > I understand where it can improve your lives. I just don't get why
> it's
> > suddenly such an explosion of expectations and needs.
> >
> >
> > It's actually always been something we have tried to improve, it just
> > comes in waves. For instance we occasionally re-examine what modules get
> > pulled in during startup. Importlib was optimized to help with startup.
> > This just happens to be the latest round of trying to improve the
> situation.
> >
> > As for why we care, every command-line app wants to at least appear
> > faster if not be faster because just getting to the point of being able
> > to e.g. print a version number is dominated by Python and app start-up.
>
>
> Fair enough.
>
> > And this is not guessing; I work with a team that puts out a command
> > line app and one of the biggest complaints they get is the startup time.
>
> This I don't get. When I run any command line utility in python (grin,
> ffind, pyped, django-admin.py...), they execute in a split second.
>
> I can't even SEE the difference between:
>
> python3 -c "import os; [print(x) for x in os.listdir('.')]"
>
> and
>
> ls .
>
> I'm having a hard time understanding how the Python VM startup time can
> be perceived as a barrier here. I can understand if you have an
> application firing Python 1000 times a second, like a CGI service or
> some kind of code exec service. But scripting ?
>

So you're viewing it from a single OS and single machine perspective. Stuff
varies so much that you can't compare something like this based on a single
experience.
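
One way to get numbers rather than impressions on a given machine is a small
harness like the sketch below. The helper name `startup_times_ms` and the run
count are arbitrary choices, and the measurement includes process creation and
`subprocess` overhead, so it overstates pure interpreter startup somewhat.

```python
import statistics
import subprocess
import sys
import time


def startup_times_ms(runs=20, args=("-c", "pass")):
    """Wall-clock milliseconds to start this interpreter and exit, per run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


samples = startup_times_ms()
print(f"median {statistics.median(samples):.1f} ms, max {max(samples):.1f} ms")
```

Passing `args=("-S", "-c", "pass")` skips the `site` import, which gives a
rough idea of how much of the total is site processing. On Python 3.7 and
later, `python3 -X importtime -c pass` reports per-module import times.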

I also said "appear" on purpose.  Some people just compare Python against
other languages based on benchmarks like startup when choosing a language
so part of this is optics. This also applies when people compare Python 2
to 3.


> Now I can imagine that a given Python program can be slow to start up,
> because it imports a lot of things. But not the VM itself.
>

There's also the fact that some things we might do to speed up Python's own
startup will propagate to user code and so have a bigger effect, e.g.
making namedtuple cheaper reaches into user code that uses namedtuple.

IOW based on experience this is worth the time to look into.


>
> >
> > -brett
> >
> > ___
> > Python-Dev mailing list
> > Python-Dev@python.org 
> > https://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe:
> >
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
> >
>


Re: [Python-Dev] Python startup time

2017-07-23 Thread Michel Desmoulin


Le 23/07/2017 à 19:36, Brett Cannon a écrit :
> 
> 
> On Sun, Jul 23, 2017, 00:53 Michel Desmoulin wrote:
> 
> 
> 
> > Optimizing startup time is incredibly valuable,
> 
> I've been reading that from the beginning of this thread but I've been
> using python since the 2.4 and I never felt the burden of the
> startup time.
> 
> I'm guessing a lot of people are like me, they just don't express them
> self because "better startup time can't be bad so let's not put a
> barrier on this".
> 
> I'm not against it, but since the necessity of a faster Python in
> general has been a debate for years and is only finally catching up with
> the work of Victor Stinner, can somebody explain me the deal with start
> up time ?
> 
> I understand where it can improve your lives. I just don't get why it's
> suddenly such an explosion of expectations and needs.
> 
> 
> It's actually always been something we have tried to improve, it just
> comes in waves. For instance we occasionally re-examine what modules get
> pulled in during startup. Importlib was optimized to help with startup.
> This just happens to be the latest round of trying to improve the situation.
> 
> As for why we care, every command-line app wants to at least appear
> faster if not be faster because just getting to the point of being able
> to e.g. print a version number is dominated by Python and app start-up.


Fair enough.

> And this is not guessing; I work with a team that puts out a command
> line app and one of the biggest complaints they get is the startup time.

This I don't get. When I run any command line utility in python (grin,
ffind, pyped, django-admin.py...), they execute in a split second.

I can't even SEE the difference between:

python3 -c "import os; [print(x) for x in os.listdir('.')]"

and

ls .

I'm having a hard time understanding how the Python VM startup time can
be perceived as a barrier here. I can understand if you have an
application firing Python 1000 times a second, like a CGI service or
some kind of code exec service. But scripting ?

Now I can imagine that a given Python program can be slow to start up,
because it imports a lot of things. But not the VM itself.


> 
> -brett
> 
> ___
> Python-Dev mailing list
> Python-Dev@python.org 
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
> 


Re: [Python-Dev] Python startup time

2017-07-23 Thread Brett Cannon
On Sun, Jul 23, 2017, 00:53 Michel Desmoulin, 
wrote:

>
>
> > Optimizing startup time is incredibly valuable,
>
> I've been reading that from the beginning of this thread but I've been
> using python since the 2.4 and I never felt the burden of the startup time.
>
> I'm guessing a lot of people are like me, they just don't express them
> self because "better startup time can't be bad so let's not put a
> barrier on this".
>
> I'm not against it, but since the necessity of a faster Python in
> general has been a debate for years and is only finally catching up with
> the work of Victor Stinner, can somebody explain me the deal with start
> up time ?
>
> I understand where it can improve your lives. I just don't get why it's
> suddenly such an explosion of expectations and needs.
>

It's actually always been something we have tried to improve, it just comes
in waves. For instance we occasionally re-examine what modules get pulled
in during startup. Importlib was optimized to help with startup. This just
happens to be the latest round of trying to improve the situation.

As for why we care, every command-line app wants to at least appear faster
if not be faster because just getting to the point of being able to e.g.
print a version number is dominated by Python and app start-up. And this is
not guessing; I work with a team that puts out a command line app and one
of the biggest complaints they get is the startup time.

-brett



Re: [Python-Dev] Python startup time

2017-07-23 Thread Antoine Pitrou
On Sat, 22 Jul 2017 16:35:31 -0700
Steve Dower  wrote:
> 
> Yes, I’m aware of that, which is why I don’t have any specific suggestions 
> off-hand. But given the differences in file systems between Windows and other 
> OSs, it wouldn’t surprise me if there were a more optimal approach for NTFS 
> to amortize calls better. Perhaps not, but it is still the most expensive 
> part of startup that we have any ability to change, so it’s worth 
> investigating.

Can you expand on it being "the most expensive part of startup that we
have any ability to change"?

For example, how do Nick's benchmarks above fare on Windows?

Regards

Antoine.




Re: [Python-Dev] Python startup time

2017-07-23 Thread Michel Desmoulin


> Optimizing startup time is incredibly valuable, 

I've been reading that from the beginning of this thread but I've been
using Python since 2.4 and I have never felt the burden of the startup time.

I'm guessing a lot of people are like me: they just don't express
themselves because "better startup time can't be bad so let's not put a
barrier on this".

I'm not against it, but since the necessity of a faster Python in
general has been a debate for years and is only finally catching up with
the work of Victor Stinner, can somebody explain to me the deal with
startup time?

I understand where it can improve your lives. I just don't get why it's
suddenly such an explosion of expectations and needs.


Re: [Python-Dev] Python startup time

2017-07-22 Thread Steve Dower
“Stat calls in the import system were optimized in importlib a while back”

Yes, I’m aware of that, which is why I don’t have any specific suggestions 
off-hand. But given the differences in file systems between Windows and other 
OSs, it wouldn’t surprise me if there were a more optimal approach for NTFS to 
amortize calls better. Perhaps not, but it is still the most expensive part of 
startup that we have any ability to change, so it’s worth investigating.

Cheers,
Steve

Top-posted from my Windows phone

From: Brett Cannon
Sent: Saturday, July 22, 2017 10:18
To: Steve Dower; Alex Walters
Cc: Python-Dev
Subject: Re: [Python-Dev] Python startup time


On Sat, Jul 22, 2017, 07:22 Steve Dower, <steve.do...@python.org> wrote:
I believe the trend is due to language like Python and Node.js, most of which 
aggressively discourage threading (more from the broader community than the 
core languages, but I see a lot of apps using these now), and also the higher 
reliability afforded by out-of-process tasks (that is, one crash doesn’t kill 
the entire app – e.g browser tabs).
 
Optimizing startup time is incredibly valuable, and having tried it a few times 
I believe that the import system (in essence, stat calls) is the biggest 
culprit. The tens of ms prior to the first user import can’t really go anywhere.

Stat calls in the import system were optimized in importlib a while back to be 
cached in finders so at this point you will have to remove a stat call to lower 
that cost or cache more which goes into breaking abstractions or designing new 
APIs.

-brett

 
Cheers,
Steve
 
Top-posted from my Windows phone
 
From: Alex Walters
Sent: Saturday, July 22, 2017 1:39
Cc: 'Python-Dev'

Subject: Re: [Python-Dev] Python startup time
 
> -Original Message-
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Paul Moore
> Sent: Saturday, July 22, 2017 4:14 AM
> To: David Mertz <me...@gnosis.cx>
> Cc: Barry Warsaw <ba...@python.org>; Python-Dev  d...@python.org>
> Subject: Re: [Python-Dev] Python startup time
 
 
> It's a bit of a chicken and egg problem - Windows users avoid
> excessive command line program invocation because startup time is
> high, so no-one optimises startup time because Windows users don't use
> short-lived command line programs. But I'm seeing a trend away from
> that - more and more Windows tools these days seem to be comfortable
> spawning subprocesses. I don't know what prompted that trend.
 
The programs I see that are comfortable spawning processes willy-nilly on
windows are mostly .net, which has a lot of the runtime assemblies cached by
the OS in the GAC - if you are spawning a second processes of yourself, or
something that uses the same libraries as you, the compile step on those can
be skipped.  Unless you are talking about python/non-.NET programs, in which
case, I have no answer.
> Paul
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-
> list%40sdamon.com
 


Re: [Python-Dev] Python startup time

2017-07-22 Thread Brett Cannon
On Sat, Jul 22, 2017, 07:22 Steve Dower, <steve.do...@python.org> wrote:

> I believe the trend is due to language like Python and Node.js, most of
> which aggressively discourage threading (more from the broader community
> than the core languages, but I see a lot of apps using these now), and also
> the higher reliability afforded by out-of-process tasks (that is, one crash
> doesn’t kill the entire app – e.g browser tabs).
>
>
>
> Optimizing startup time is incredibly valuable, and having tried it a few
> times I believe that the import system (in essence, stat calls) is the
> biggest culprit. The tens of ms prior to the first user import can’t really
> go anywhere.
>

Stat calls in the import system were optimized in importlib a while back to
be cached in finders so at this point you will have to remove a stat call
to lower that cost or cache more which goes into breaking abstractions or
designing new APIs.
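
The finder caching mentioned here is visible from Python code. The sketch
below uses made-up module names (`cache_demo_a`, `cache_demo_b`) in a
temporary directory; it shows the per-path-entry finder in
`sys.path_importer_cache` and the documented need to call
`importlib.invalidate_caches()` before importing a file created after the
directory was first scanned.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        with open(os.path.join(tmp, "cache_demo_a.py"), "w") as f:
            f.write("VALUE = 'a'\n")
        importlib.invalidate_caches()
        import cache_demo_a

        # Each sys.path entry gets one finder, which caches the directory
        # listing instead of issuing stat calls for every candidate file.
        finder = sys.path_importer_cache.get(tmp)
        finder_name = type(finder).__name__

        # A file created after the listing was cached may not be seen
        # until the finder caches are explicitly invalidated.
        with open(os.path.join(tmp, "cache_demo_b.py"), "w") as f:
            f.write("VALUE = 'b'\n")
        importlib.invalidate_caches()
        import cache_demo_b
    finally:
        sys.path.remove(tmp)

print(finder_name, cache_demo_a.VALUE, cache_demo_b.VALUE)
```

The cache trades a small risk of staleness for far fewer filesystem calls,
which is why reducing the cost further tends to require new APIs rather than
more tuning.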

-brett


>
> Cheers,
>
> Steve
>
>
>
> Top-posted from my Windows phone
>
>
>
> *From: *Alex Walters <tritium-l...@sdamon.com>
> *Sent: *Saturday, July 22, 2017 1:39
> *Cc: *'Python-Dev' <python-dev@python.org>
>
>
> *Subject: *Re: [Python-Dev] Python startup time
>
>
>
> > -Original Message-
>
> > From: Python-Dev [mailto:python-dev-bounces+tritium-
>
> > list=sdamon@python.org] On Behalf Of Paul Moore
>
> > Sent: Saturday, July 22, 2017 4:14 AM
>
> > To: David Mertz <me...@gnosis.cx>
>
> > Cc: Barry Warsaw <ba...@python.org>; Python-Dev 
> > d...@python.org>
>
> > Subject: Re: [Python-Dev] Python startup time
>
>
>
>
>
> > It's a bit of a chicken and egg problem - Windows users avoid
>
> > excessive command line program invocation because startup time is
>
> > high, so no-one optimises startup time because Windows users don't use
>
> > short-lived command line programs. But I'm seeing a trend away from
>
> > that - more and more Windows tools these days seem to be comfortable
>
> > spawning subprocesses. I don't know what prompted that trend.
>
>
>
> The programs I see that are comfortable spawning processes willy-nilly on
>
> windows are mostly .net, which has a lot of the runtime assemblies cached
> by
>
> the OS in the GAC - if you are spawning a second processes of yourself, or
>
> something that uses the same libraries as you, the compile step on those
> can
>
> be skipped.  Unless you are talking about python/non-.NET programs, in
> which
>
> case, I have no answer.
>
> > Paul
>
> > ___
>
> > Python-Dev mailing list
>
> > Python-Dev@python.org
>
> > https://mail.python.org/mailman/listinfo/python-dev
>
> > Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-
>
> > list%40sdamon.com
>
>
>
> ___
>
> Python-Dev mailing list
>
> Python-Dev@python.org
>
> https://mail.python.org/mailman/listinfo/python-dev
>
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/steve.dower%40python.org
>
>
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>


Re: [Python-Dev] Python startup time

2017-07-22 Thread Steve Dower
I believe the trend is due to languages like Python and Node.js, most of which
aggressively discourage threading (more from the broader community than the
core languages, but I see a lot of apps using these now), and also the higher
reliability afforded by out-of-process tasks (that is, one crash doesn’t kill
the entire app – e.g. browser tabs).

Optimizing startup time is incredibly valuable, and having tried it a few times 
I believe that the import system (in essence, stat calls) is the biggest 
culprit. The tens of ms prior to the first user import can’t really go anywhere.

Cheers,
Steve

Top-posted from my Windows phone

From: Alex Walters
Sent: Saturday, July 22, 2017 1:39
Cc: 'Python-Dev'
Subject: Re: [Python-Dev] Python startup time

> -Original Message-
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Paul Moore
> Sent: Saturday, July 22, 2017 4:14 AM
> To: David Mertz <me...@gnosis.cx>
> Cc: Barry Warsaw <ba...@python.org>; Python-Dev  d...@python.org>
> Subject: Re: [Python-Dev] Python startup time


> It's a bit of a chicken and egg problem - Windows users avoid
> excessive command line program invocation because startup time is
> high, so no-one optimises startup time because Windows users don't use
> short-lived command line programs. But I'm seeing a trend away from
> that - more and more Windows tools these days seem to be comfortable
> spawning subprocesses. I don't know what prompted that trend.

The programs I see that are comfortable spawning processes willy-nilly on
windows are mostly .net, which has a lot of the runtime assemblies cached by
the OS in the GAC - if you are spawning a second processes of yourself, or
something that uses the same libraries as you, the compile step on those can
be skipped.  Unless you are talking about python/non-.NET programs, in which
case, I have no answer.
 
> Paul
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-
> list%40sdamon.com



Re: [Python-Dev] Python startup time

2017-07-22 Thread Alex Walters
> -Original Message-
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Paul Moore
> Sent: Saturday, July 22, 2017 4:14 AM
> To: David Mertz <me...@gnosis.cx>
> Cc: Barry Warsaw <ba...@python.org>; Python-Dev  d...@python.org>
> Subject: Re: [Python-Dev] Python startup time


> It's a bit of a chicken and egg problem - Windows users avoid
> excessive command line program invocation because startup time is
> high, so no-one optimises startup time because Windows users don't use
> short-lived command line programs. But I'm seeing a trend away from
> that - more and more Windows tools these days seem to be comfortable
> spawning subprocesses. I don't know what prompted that trend.

The programs I see that are comfortable spawning processes willy-nilly on
Windows are mostly .NET, which has a lot of the runtime assemblies cached by
the OS in the GAC - if you are spawning a second process of yourself, or
something that uses the same libraries as you, the compile step on those can
be skipped.  Unless you are talking about Python/non-.NET programs, in which
case, I have no answer.
 
> Paul
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/tritium-
> list%40sdamon.com



Re: [Python-Dev] Python startup time

2017-07-22 Thread Paul Moore
On 21 July 2017 at 23:53, David Mertz  wrote:
> I would guess that Windows users don't tend to run lots of command line
> tools where startup time dominates, as *nix users do.

Well, in the sense that many Windows users don't use the command line
at all, this is true. However, startup time is a definite problem for
Windows users who *do* use the command line, because process creation
cost is a lot higher than on Unix, so starting new commands is
*already* costly, and therefore minimising additional overhead is
crucial.

It's a bit of a chicken and egg problem - Windows users avoid
excessive command line program invocation because startup time is
high, so no-one optimises startup time because Windows users don't use
short-lived command line programs. But I'm seeing a trend away from
that - more and more Windows tools these days seem to be comfortable
spawning subprocesses. I don't know what prompted that trend.

Paul


Re: [Python-Dev] Python startup time

2017-07-21 Thread David Mertz
I would guess that Windows users don't tend to run lots of command line
tools where startup time dominates, as *nix users do.

On Fri, Jul 21, 2017 at 3:21 PM, Barry Warsaw  wrote:

> On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:
>
> >That is what Emacs does, and it causes them a lot of trouble. They're
> >trying to move away from it at the moment, but the direction is not yet
> >clear. The keyword is "unexec", and it wrecks havoc with malloc.
>
> Emacs has been unexec'ing for as long as I can remember (which is longer
> than
> I can remember Python :).  I know that it's been problematic and there have
> been many efforts over the years to replace it, but I think it's been a
> fairly
> successful technique in practice, at least on platforms that support it.
> That's another problem with the approach of course; it's not universally
> possible to implement.
>
> -Barry
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: https://mail.python.org/mailman/options/python-dev/
> mertz%40gnosis.cx
>



-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.


Re: [Python-Dev] Python startup time

2017-07-21 Thread Skip Montanaro
> Emacs has been unexec'ing for as long as I can remember (which is longer
> than I can remember Python :).  I know that it's been problematic and there
> have been many efforts over the years to replace it, but I think it's been a
> fairly successful technique in practice, at least on platforms that support
> it.


I've been using Emacs far longer than Python. I remember having to invoke
temacs on something. Still, if I didn't know better, I could be convinced
you were referring to the GIL. :-)

Skip


Re: [Python-Dev] Python startup time

2017-07-21 Thread Barry Warsaw
On Jul 21, 2017, at 01:25 PM, Nikolaus Rath wrote:

>That is what Emacs does, and it causes them a lot of trouble. They're
>trying to move away from it at the moment, but the direction is not yet
>clear. The keyword is "unexec", and it wreaks havoc with malloc.

Emacs has been unexec'ing for as long as I can remember (which is longer than
I can remember Python :).  I know that it's been problematic and there have
been many efforts over the years to replace it, but I think it's been a fairly
successful technique in practice, at least on platforms that support it.
That's another problem with the approach of course; it's not universally
possible to implement.

-Barry


Re: [Python-Dev] Python startup time

2017-07-21 Thread Nikolaus Rath
On Jul 21 2017, David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process?

That is what Emacs does, and it causes them a lot of trouble. They're
trying to move away from it at the moment, but the direction is not yet
clear. The keyword is "unexec", and it wreaks havoc with malloc.

Best,
-Nikolaus
-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Python startup time

2017-07-21 Thread INADA Naoki
On Fri, Jul 21, 2017 at 4:12 PM, David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

FYI, you may be interested in a very recent Node.js security issue:
https://nodejs.org/en/blog/vulnerability/july-2017-security-releases/#node-js-specific-security-flaws


Re: [Python-Dev] Python startup time

2017-07-21 Thread Antoine Pitrou
On Fri, 21 Jul 2017 00:12:20 -0700
David Mertz  wrote:
> How implausible is it to write out the actual memory image of a loaded
> Python process? I.e. on a specific machine, OS, Python version, etc? This
> can only be overhead initially, of course, but on subsequent runs it's just
> one memory map, which is the cheapest possible operation.

You can't rely on the file being remapped at the same address when you
reload it.  So you'd have to write a relocation routine that's able to
find and fix *all* pointers inside the Python object tree and CPython's
internal structures (fixing the pointers is not necessarily difficult,
finding them without missing any is the difficult part).

Regards

Antoine.




Re: [Python-Dev] Python startup time

2017-07-21 Thread David Mertz
How implausible is it to write out the actual memory image of a loaded
Python process? I.e. on a specific machine, OS, Python version, etc? This
can only be overhead initially, of course, but on subsequent runs it's just
one memory map, which is the cheapest possible operation.

E.g.

$ python3.7 --write-image "import typing, re, os, numpy"

I imagine this creating a file like:

/tmp/__python__/python37-typing-re-os-numpy.mem

Then just terminating as if just that line had run, however long it takes
(but snapshotting before exit).

Then subsequent invocations would only restore the image to memory. Maybe:

$ pyrunner --load-image python37-typing-re-os-numpy myscript.py

The last line could be aliased of course. I suppose we'd need to check if
the relevant image file exists, and if not, fall back to ignoring the
'--load-image' flag and running plain old Python.
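
The fallback described above could be sketched roughly as follows. This is
purely illustrative: "pyrunner", "--load-image" and the ".mem" naming scheme
are the hypothetical names from this message (no such tool exists), and the
helper names image_name and build_command are made up for the sketch.

```python
import os
import sys


def image_name(modules, version=sys.version_info):
    """Build a file name such as 'python37-typing-re-os-numpy.mem'."""
    tag = "python{}{}".format(version[0], version[1])
    return "-".join([tag, *modules]) + ".mem"


def build_command(image_dir, modules, script):
    """Return the argv to run: use the image if present, else plain Python."""
    image = os.path.join(image_dir, image_name(modules))
    if os.path.exists(image):
        # 'pyrunner --load-image' is the hypothetical launcher from the text.
        return ["pyrunner", "--load-image", image, script]
    # No snapshot available: ignore the image and start the normal interpreter.
    return [sys.executable, script]
```

The point of the sketch is only that the decision is a single existence
check, so a missing or stale image never prevents the script from running.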

This helps not at all for something like AWS Lambda where each instance is
spun up fresh. But for the use-case of running many Python shell commands
at an interactive shell on one machine, it seems like that could be very
fast.

In my hypothetical, I'm supposing that some collection of modules is
pre-loaded in the image. Of course, the script may need to load others, and
it may not use some in the image. But users could decide their typical needed
modules themselves under this idea.


Re: [Python-Dev] Python startup time

2017-07-21 Thread Nick Coghlan
On 21 July 2017 at 15:30, Cesare Di Mauro  wrote:

>
>
> 2017-07-21 4:52 GMT+02:00 Nick Coghlan :
>
>> On 21 July 2017 at 12:44, Nick Coghlan  wrote:
>> > We can separately measure the cost of unmarshalling the code object:
>> >
>> > $ python3 -m perf timeit -s "import typing; from marshal import loads;
>> from
>> > importlib.util import cache_from_source; cache =
>> > cache_from_source(typing.__file__); data = open(cache,
>> 'rb').read()[12:]"
>> > "loads(data)"
>> > .
>> > Mean +- std dev: 286 us +- 4 us
>>
>> Slight adjustment here, as the cost of locating the cached bytecode
>> and reading it from disk should really be accounted for in each
>> iteration:
>>
>> $ python3 -m perf timeit -s "import typing; from marshal import loads;
>> from importlib.util import cache_from_source" "cache =
>> cache_from_source(typing.__spec__.origin); data = open(cache,
>> 'rb').read()[12:]; loads(data)"
>> .
>> Mean +- std dev: 337 us +- 8 us
>>
>> That will have a bigger impact when loading from spinning disk or a
>> network drive, but it's fairly negligible when loading from a local
>> SSD or an already primed filesystem cache.
>>
>> Cheers,
>> Nick.
>>
>> --
>> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>>
> Thanks for your tests, Nick. It's quite evident that the marshal code
> cannot improve the situation, so I withdraw my proposal.
>

It was still a good suggestion, since it made me realise I *hadn't*
actually measured the relative timings lately, so it was technically an
untested assumption that module level code execution still dominated the
overall import time.

typing is also a particularly large & complex module, and bytecode
unmarshalling represents a larger fraction of the import time for simpler
modules like abc:

$ python3 -m perf timeit -s "import abc; from marshal import loads; from
importlib.util import cache_from_source" "cache =
cache_from_source(abc.__spec__.origin); data = open(cache,
'rb').read()[12:]; loads(data)"
.
Mean +- std dev: 45.2 us +- 1.1 us

$ python3 -m perf timeit -s "import abc; loader_exec =
abc.__spec__.loader.exec_module" "loader_exec(abc)"
.
Mean +- std dev: 172 us +- 5 us

$ python3 -m perf timeit -s "import abc; from importlib import reload"
"reload(abc)"
.
Mean +- std dev: 280 us +- 14 us

And _weakrefset:

$ python3 -m perf timeit -s "import _weakrefset; from marshal import loads;
from importlib.util import cache_from_source" "cache =
cache_from_source(_weakrefset.__spec__.origin); data = open(cache,
'rb').read()[12:]; loads(data)"
.
Mean +- std dev: 57.7 us +- 1.3 us

$ python3 -m perf timeit -s "import _weakrefset; loader_exec =
_weakrefset.__spec__.loader.exec_module" "loader_exec(_weakrefset)"
.
Mean +- std dev: 129 us +- 6 us

$ python3 -m perf timeit -s "import _weakrefset; from importlib import
reload" "reload(_weakrefset)"
.
Mean +- std dev: 226 us +- 4 us

The conclusion still holds (the absolute numbers here are likely still too
small for the extra complexity of parallelising bytecode loading to pay off
in any significant way), but it also helps us set reasonable expectations
around how much of a gain we're likely to be able to get just from
precompilation with Cython.

That does actually raise a small microbenchmarking problem: for source and
bytecode imports, we can force the import system to genuinely rerun the
module or unmarshal the bytecode inside a single Python process, allowing
perf to measure it independently of CPython startup. While I'm pretty sure
it's possible to trick the import machinery into rerunning module level
init functions even for old-style extension modules (hence allowing us to
run similar tests to those above for a Cython compiled module), I don't
actually remember how to do it off the top of my head.

Cheers,
Nick.

P.S. I'll also note that in these cases where the import overhead is
proportionally significant for always-imported modules, we may want to look
at the benefits of freezing them (if they otherwise remain as pure Python
modules), or compiling them as builtin modules (if we switch them over to
Cython), in addition to looking at ways to make the modules themselves
faster. Being built directly into the interpreter binary is pretty much the
best case scenario for reducing import overhead.
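
To see which of these categories a given module currently falls into, one
can inspect its import spec. The following is a small sketch using only the
documented importlib API; the helper name import_kind and the category
labels are made up for illustration, and the results vary between Python
versions and builds.

```python
import importlib.util


def import_kind(name):
    """Classify how a module would be imported, based on its spec origin."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "missing"
    if spec.origin in ("built-in", "frozen"):
        # Compiled into (or frozen into) the interpreter binary itself.
        return spec.origin
    if spec.origin and spec.origin.endswith(".py"):
        return "source"
    return "extension or other"


for name in ("sys", "abc", "_weakrefset", "typing"):
    print(name, "->", import_kind(name))
```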

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-21 4:52 GMT+02:00 Nick Coghlan :

> On 21 July 2017 at 12:44, Nick Coghlan  wrote:
> > We can separately measure the cost of unmarshalling the code object:
> >
> > $ python3 -m perf timeit -s "import typing; from marshal import loads;
> from
> > importlib.util import cache_from_source; cache =
> > cache_from_source(typing.__file__); data = open(cache,
> 'rb').read()[12:]"
> > "loads(data)"
> > .
> > Mean +- std dev: 286 us +- 4 us
>
> Slight adjustment here, as the cost of locating the cached bytecode
> and reading it from disk should really be accounted for in each
> iteration:
>
> $ python3 -m perf timeit -s "import typing; from marshal import loads;
> from importlib.util import cache_from_source" "cache =
> cache_from_source(typing.__spec__.origin); data = open(cache,
> 'rb').read()[12:]; loads(data)"
> .
> Mean +- std dev: 337 us +- 8 us
>
> That will have a bigger impact when loading from spinning disk or a
> network drive, but it's fairly negligible when loading from a local
> SSD or an already primed filesystem cache.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
>
Thanks for your tests, Nick. It's quite evident that the marshal code
cannot improve the situation, so I withdraw my proposal.

I took a look at the typing module, and there are some small things that
can be optimized, but unfortunately that won't change the overall situation.

Code execution can be improved. :) However, it requires a massive amount of
time experimenting...

Bests,
Cesare




Re: [Python-Dev] Python startup time

2017-07-20 Thread Nick Coghlan
On 21 July 2017 at 12:44, Nick Coghlan  wrote:
> We can separately measure the cost of unmarshalling the code object:
>
> $ python3 -m perf timeit -s "import typing; from marshal import loads; from
> importlib.util import cache_from_source; cache =
> cache_from_source(typing.__file__); data = open(cache, 'rb').read()[12:]"
> "loads(data)"
> .
> Mean +- std dev: 286 us +- 4 us

Slight adjustment here, as the cost of locating the cached bytecode
and reading it from disk should really be accounted for in each
iteration:

$ python3 -m perf timeit -s "import typing; from marshal import loads;
from importlib.util import cache_from_source" "cache =
cache_from_source(typing.__spec__.origin); data = open(cache,
'rb').read()[12:]; loads(data)"
.
Mean +- std dev: 337 us +- 8 us

That will have a bigger impact when loading from spinning disk or a
network drive, but it's fairly negligible when loading from a local
SSD or an already primed filesystem cache.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2017-07-20 Thread Nick Coghlan
On 21 July 2017 at 05:38, Cesare Di Mauro  wrote:

>
>
> 2017-07-20 19:23 GMT+02:00 Victor Stinner :
>
>> 2017-07-20 19:09 GMT+02:00 Cesare Di Mauro :
>> > I assume that Python loads compiled (.pyc and/or .pyo) from the stdlib.
>> That's something that also influences the startup time (compiling source vs
>> loading pre-compiled modules).
>>
>> My benchmark was "python3 -m perf command -- python3 -c pass": I don't
>> explicitly remove .pyc files, I expect that Python uses prebuilt .pyc
>> files from __pycache__.
>>
>> Victor
>>
>
> OK, that should be the best case.
>
> An idea to improve the situation might be to find an alternative structure
> for .pyc/pyo files, which allows their loading (not execution, of course)
> to be (partially) "parallelized", or at least sped up. Maybe a GSoC
> project for some student, if no core dev has time to investigate it.
>

Unmarshalling the code object from disk generally isn't the slow part -
it's the module level execution that takes time.

Using the typing module as an example, a full reload cycle takes almost 10
milliseconds:

$ python3 -m perf timeit -s "import typing; from importlib import reload"
"reload(typing)"
.
Mean +- std dev: 9.89 ms +- 0.46 ms

(Don't try timing "import typing" directly - the sys.modules cache
amortises the cost down to being measured in nanoseconds, since you're
effectively just measuring the speed of a dict lookup)
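
A quick way to convince yourself of this, using only the standard library
(json is just an arbitrary example module here):

```python
import importlib
import sys
import timeit

import json  # the first import pays the full cost

# Any later import of the same name is answered from the sys.modules dict.
assert importlib.import_module("json") is sys.modules["json"]

runs = 100_000
cached = timeit.timeit("import json", number=runs) / runs
print("cached 'import json': {:.0f} ns per statement".format(cached * 1e9))
```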

We can separately measure the cost of unmarshalling the code object:

$ python3 -m perf timeit -s "import typing; from marshal import loads; from
importlib.util import cache_from_source; cache =
cache_from_source(typing.__file__); data = open(cache, 'rb').read()[12:]"
"loads(data)"
.
Mean +- std dev: 286 us +- 4 us

Finding the module spec:

$ python3 -m perf timeit -s "from importlib.util import find_spec"
"find_spec('typing')"
.
Mean +- std dev: 69.2 us +- 2.3 us

And actually running the module's code (this includes unmarshalling the
code object, but *not* calculating the import spec):

$ python3 -m perf timeit -s "import typing; loader_exec =
typing.__spec__.loader.exec_module" "loader_exec(typing)"
.
Mean +- std dev: 9.68 ms +- 0.43 ms
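
For anyone without the perf module installed, roughly the same split can be
approximated with the stdlib timeit module. This is only a sketch: it
compiles the source itself instead of reading the cached .pyc (the [12:]
slice above skips the 12-byte .pyc header used up to Python 3.6; Python 3.7
and later use a 16-byte header, see PEP 552), the helper name
import_cost_breakdown is made up, and colorsys is chosen only because it is
a small pure-Python module with no import-time side effects.

```python
import importlib.util
import marshal
import timeit
import types


def import_cost_breakdown(module_name, repeat=20):
    """Return best-of-N seconds for unmarshalling and for running a module."""
    spec = importlib.util.find_spec(module_name)
    source = spec.loader.get_source(module_name)
    code = compile(source, spec.origin, "exec")
    payload = marshal.dumps(code)  # the data a .pyc stores after its header

    def unmarshal():
        marshal.loads(payload)

    def execute():
        # Run the module body in a throwaway module object so the real
        # entry in sys.modules is left untouched.
        scratch = types.ModuleType(module_name)
        exec(code, scratch.__dict__)

    return {
        "unmarshal_s": min(timeit.repeat(unmarshal, number=1, repeat=repeat)),
        "execute_s": min(timeit.repeat(execute, number=1, repeat=repeat)),
    }


result = import_cost_breakdown("colorsys")
print("unmarshal: {unmarshal_s:.6f} s, execute: {execute_s:.6f} s".format(**result))
```

The numbers are much noisier than what perf reports, so treat them as an
order-of-magnitude check rather than a benchmark.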

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-20 19:23 GMT+02:00 Victor Stinner :

> 2017-07-20 19:09 GMT+02:00 Cesare Di Mauro :
> > I assume that Python loads compiled (.pyc and/or .pyo) from the stdlib.
> That's something that also influences the startup time (compiling source vs
> loading pre-compiled modules).
>
> My benchmark was "python3 -m perf command -- python3 -c pass": I don't
> explicitly remove .pyc files, I expect that Python uses prebuilt .pyc
> files from __pycache__.
>
> Victor
>

OK, that should be the best case.

An idea to improve the situation might be to find an alternative structure
for .pyc/pyo files, which allows their loading (not execution, of course)
to be (partially) "parallelized", or at least sped up. Maybe a GSoC
project for some student, if no core dev has time to investigate it.

Cesare




Re: [Python-Dev] Python startup time

2017-07-20 Thread Victor Stinner
2017-07-20 19:09 GMT+02:00 Cesare Di Mauro :
> I assume that Python loads compiled (.pyc and/or .pyo) from the stdlib. 
> That's something that also influences the startup time (compiling source vs 
> loading pre-compiled modules).

My benchmark was "python3 -m perf command -- python3 -c pass": I don't
explicitly remove .pyc files, I expect that Python uses prebuilt .pyc
files from __pycache__.
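
A rough stdlib-only approximation of that measurement, for comparison when
perf is not available. It is a sketch, not a replacement: it takes the best
of a few wall-clock runs and does none of perf's calibration or statistics.
The helper name startup_seconds is made up for this example.

```python
import subprocess
import sys
import time


def startup_seconds(*interpreter_args, runs=5):
    """Best-of-N wall time for `python <args> -c pass`."""
    cmd = [sys.executable, *interpreter_args, "-c", "pass"]
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


default = startup_seconds()
no_site = startup_seconds("-S")  # skip importing the site module
in_venv = sys.prefix != sys.base_prefix
print("default: {:.1f} ms, with -S: {:.1f} ms, virtual env: {}".format(
    default * 1e3, no_site * 1e3, in_venv))
```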

Victor


Re: [Python-Dev] Python startup time

2017-07-20 Thread Cesare Di Mauro
2017-07-19 16:26 GMT+02:00 Victor Stinner :

> 2017-07-19 15:22 GMT+02:00 Oleg Broytman :
> > On Wed, Jul 19, 2017 at 02:59:52PM +0200, Victor Stinner <
> victor.stin...@gmail.com> wrote:
> >> "Python is very slow to start on Windows 7"
> >> https://stackoverflow.com/questions/29997274/python-is-
> very-slow-to-start-on-windows-7
> >
> >However hard you are going to optimize Python you cannot fix those
> > "defenders", "guards" and "protectors". :-) This particular link can be
> > excluded from consideration.
>
> Sorry, I didn't read carefully each link I posted. Even for me knowing
> what Python does at startup, it's hard to explain why 3 people have
> different timing: 15 ms, 75 ms and 300 ms for example. In my
> experience, the following things impact Python startup:
>
> * -S option: loading or not the site module
> * Paths in sys.path: PYTHONPATH environment variable for example
> * .pth files in sys.path
> * Python running in a virtual environment or not
> * Operating system: Python loads different modules at startup
> depending on the OS. Naoki INADA just removed _osx_support from being
> imported in the site module on macOS for example.
>
> My list is likely incomplete.
>
> In the performance benchmark suite, a controlled virtual environment
> is created to have a known set of modules. FYI running Python in a
> virtual environment is slower than "system" python which runs outside
> a virtual environment...
>
> Victor
>
Hi Victor,

I assume that Python loads compiled (.pyc and/or .pyo) from the stdlib.
That's something that also influences the startup time (compiling source vs
loading pre-compiled modules).

Bests,
Cesare




Re: [Python-Dev] Python startup time

2017-07-20 Thread Nick Coghlan
On 20 July 2017 at 23:32, Stefan Behnel  wrote:
> So, before considering to write an accelerator module in C that replaces
> some existing Python module, and thus duplicating its entire source code
> with highly increased complexity, I'd like to remind you that simply
> compiling the Python module itself to C should give at least reasonable
> speed-ups *without* adding to the maintenance burden, and can be done
> optionally as part of the build process. We do that for Cython itself
> during its installation, for example.

And if folks are concerned about the potential bootstrapping issues
with this approach, the gist is that it would have to look something
like this:

Phase 0: freeze importlib
- build a CPython with only builtin and frozen module support
- use it to freeze importlib

Phase 1: traditional CPython
- build the traditional Python interpreter with no Cython accelerated modules

Phase 2: accelerated CPython
- if not otherwise available, use the traditional Python interpreter
to download & install Cython in a virtual environment
- run Cython to selectively precompile key modules (such as those
implicitly imported at startup)

Technically, phase 2 doesn't actually *change* CPython itself, since
the import system is already set up such that if an extension module
and a source module are side-by-side in the same directory, then the
extension module will take precedence. As a result, precompiling with
Cython is similar in many ways to precompiling to bytecode, it's just
that the result is native machine code with Python C API calls, rather
than CPython bytecode.
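
The precedence can be inspected on a running interpreter. Note the loud
caveat: the sketch below relies on _get_supported_file_loaders(), a private
CPython helper in importlib._bootstrap_external that is not a documented
API and may change or disappear; it is used here only to peek at the order
in which the path-based finder tries file loaders.

```python
from importlib import _bootstrap_external

# PRIVATE helper (assumption: present in this CPython version). It returns
# (loader, suffixes) pairs in the order the file finder tries them.
loaders = _bootstrap_external._get_supported_file_loaders()
order = [loader.__name__ for loader, suffixes in loaders]
print(order)

for loader, suffixes in loaders:
    print(loader.__name__, suffixes)
```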

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2017-07-20 Thread Stefan Behnel
Ivan Levkivskyi schrieb am 20.07.2017 um 13:24:
> I agree the start-up time is important. There is something that is related.
> ABCMeta is currently implemented in Python.
> This makes it slow, creation of an ABC is 2x slower than creation of a
> normal class.
> However, ABCs are used by many medium and large size projects.
> Also, both abc and _collections_abc are imported at start-up (in particular
> importlib uses several ABCs, os also needs them for environments).
> Finally, all generics in typing module and user-defined generic types are
> ABCs (to allow interoperability with collections.abc).
> 
> My idea is to re-implement ABCMeta (and ingredients it depends on, like
> WeakSet) in C.

I know that this hasn't really been an accepted option so far (and it's
actually not an option for a few really early modules during startup), but
compiling a Python module with Cython will usually speed it up quite
noticeably (often 10-30%, sometimes more if you're lucky, e.g. [1]). And
that also applies to the startup time, simply because it's pre-compiled.

So, before considering to write an accelerator module in C that replaces
some existing Python module, and thus duplicating its entire source code
with highly increased complexity, I'd like to remind you that simply
compiling the Python module itself to C should give at least reasonable
speed-ups *without* adding to the maintenance burden, and can be done
optionally as part of the build process. We do that for Cython itself
during its installation, for example.

Stefan (Cython core developer)


[1] 3x faster URL routing by compiling a single Django module with Cython:
https://us.pycon.org/2017/schedule/presentation/693/



Re: [Python-Dev] Python startup time

2017-07-20 Thread Antoine Pitrou
On Thu, 20 Jul 2017 21:29:18 +0900
INADA Naoki  wrote:
> 
> WeakSet needs special care.
> Maybe ABCMeta can be optimized first.
> 
> Currently, ABCMeta uses three WeakSets, but their creation can be delayed
> until `register` or `issubclass` is called.
> So even if WeakSet stays implemented in Python, I think ABCMeta can be much
> faster.

Simple uses of WeakSet can probably be replaced with regular sets +
weakref callbacks.  As long as you are not doing one of the delicate
things (such as iterating), it should be fine.
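
A minimal sketch of that idea, assuming only membership tests and additions
are needed (no iteration). The class name WeakRegistry and the Plugin
example are made up for illustration; this is not how ABCMeta is
implemented.

```python
import gc
import weakref


class WeakRegistry:
    """A plain set of weak references that cleans itself up via callbacks."""

    def __init__(self):
        self._refs = set()

    def add(self, obj):
        # When obj is collected, the callback receives the dead reference
        # and discards it from the set.
        self._refs.add(weakref.ref(obj, self._refs.discard))

    def __contains__(self, obj):
        # Live weak references compare and hash like their referents.
        return weakref.ref(obj) in self._refs

    def __len__(self):
        return len(self._refs)


class Plugin:
    pass


registry = WeakRegistry()
plugin = Plugin()
registry.add(plugin)
print(plugin in registry, len(registry))

del plugin
gc.collect()  # not needed on CPython's refcounting, but harmless
print(len(registry))
```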

Regards

Antoine.




Re: [Python-Dev] Python startup time

2017-07-20 Thread INADA Naoki
Hi, Ivan.

First of all, Yes, please do it!


On Thu, Jul 20, 2017 at 8:24 PM, Ivan Levkivskyi  wrote:
> I agree the start-up time is important. There is something that is related.
> ABCMeta is currently implemented in Python.
> This makes it slow, creation of an ABC is 2x slower than creation of a
> normal class.

Additionally, being an ABC spreads through inheritance.
When people use a mix-in provided by collections.abc, the resulting class is
an ABC even if it's a concrete class.

There is no documented or recommended way to inherit from an ABC class
without using ABCMeta.


> However, ABCs are used by many medium and large size projects.

Many people with a background in other languages use ABCs the way they would
use Java's interfaces or abstract classes.

So it may be worthwhile to have just Abstract, without ABC.
See https://mail.python.org/pipermail/python-ideas/2017-July/046495.html


> Also, both abc and _collections_abc are imported at start-up (in particular
> importlib uses several ABCs, os also needs them for environments).
> Finally, all generics in typing module and user-defined generic types are
> ABCs (to allow interoperability with collections.abc).
>

Yes.  Even if site.py doesn't use typing, many applications and
libraries will start using typing.
And it's much slower than collections.abc.


> My idea is to re-implement ABCMeta (and ingredients it depends on, like
> WeakSet) in C.
> I didn't find such proposal on b.p.o., I have two questions:
> * Are there some potential problems with this idea (except that it may take
> some time and effort)?

WeakSet needs special care.
Maybe ABCMeta can be optimized first.

Currently, ABCMeta uses three WeakSets, but their creation can be delayed
until `register` or `issubclass` is called.
So even if WeakSet stays implemented in Python, I think ABCMeta can be much
faster.
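
The deferral could look something like the sketch below. It is only an
illustration of creating the WeakSet on first use; the class LazyRegistry
and its methods are invented for the example and are not ABCMeta's actual
internals.

```python
from weakref import WeakSet


class LazyRegistry:
    """Bookkeeping whose WeakSet is only built when first needed."""

    __slots__ = ("_registry",)

    def __init__(self):
        # Creating the object is cheap: no WeakSet is allocated yet.
        self._registry = None

    def register(self, subclass):
        if self._registry is None:
            self._registry = WeakSet()
        self._registry.add(subclass)

    def is_registered(self, subclass):
        # An untouched registry answers without ever creating the set.
        return self._registry is not None and subclass in self._registry
```

Classes that are never passed to register() then never pay for a WeakSet,
which is the common case for most ABCs created at start-up.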

> * Is it something worth doing as an optimization?
> (If answers are no and yes, then maybe I would spend part of my vacation in
> August on it.)
>
> --
> Ivan
>
>

Bests,


Re: [Python-Dev] Python startup time

2017-07-20 Thread Ivan Levkivskyi
I agree the start-up time is important. There is something that is related.
ABCMeta is currently implemented in Python.
This makes it slow: creating an ABC is 2x slower than creating a
normal class.
However, ABCs are used by many medium and large size projects.
Also, both abc and _collections_abc are imported at start-up (in particular
importlib uses several ABCs, os also needs them for environments).
Finally, all generics in typing module and user-defined generic types are
ABCs (to allow interoperability with collections.abc).

My idea is to re-implement ABCMeta (and ingredients it depends on, like
WeakSet) in C.
I didn't find such a proposal on b.p.o. I have two questions:
* Are there some potential problems with this idea (except that it may take
some time and effort)?
* Is it something worth doing as an optimization?
(If answers are no and yes, then maybe I would spend part of my vacation in
August on it.)

--
Ivan


Re: [Python-Dev] Python startup time

2017-07-20 Thread Victor Stinner
Hi,

I applied the patch below to count the number of times that Python is
run. Running the Python test suite with "./python -m test -j0 -rW"
runs Python 2,256 times.

Honestly, I expected more. I'm running tests with Python compiled in
debug mode. And in debug mode, Python startup time is much worse:

haypo@selma$ python3 -m perf command --inherit=PYTHONPATH -v -- ./python -c pass
command: Mean +- std dev: 46.4 ms +- 2.3 ms

FYI I'm using gcc -O0 rather than -Og to make compilation even faster.

Victor

diff --git a/Lib/site.py b/Lib/site.py
index 7dc1b04..4b0c167 100644
--- a/Lib/site.py
+++ b/Lib/site.py
@@ -540,6 +540,21 @@ def execusercustomize():
                 (err.__class__.__name__, err))


+def run_counter():
+    import fcntl
+
+    fd = os.open("/home/haypo/prog/python/master/run_counter",
+                 os.O_WRONLY | os.O_CREAT | os.O_APPEND)
+    try:
+        fcntl.flock(fd, fcntl.LOCK_EX)
+        try:
+            os.write(fd, b'\x01')
+        finally:
+            fcntl.flock(fd, fcntl.LOCK_UN)
+    finally:
+        os.close(fd)
+
+
 def main():
     """Add standard site-specific directories to the module search path.

@@ -568,6 +583,7 @@ def main():
     execsitecustomize()
     if ENABLE_USER_SITE:
         execusercustomize()
+    run_counter()

 # Prevent extending of sys.path when python was started with -S and
 # site is imported later.
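
Since each run appends exactly one byte under an exclusive lock, the number
of runs is simply the size of the counter file. A standalone sketch of the
same technique with a configurable path (Unix only, because of fcntl; the
function names record_run and count_runs are made up for this example):

```python
import fcntl
import os


def record_run(path):
    """Append one byte to the counter file while holding an exclusive lock."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, b"\x01")
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def count_runs(path):
    """Return how many runs were recorded: one byte per run."""
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return 0
```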


Re: [Python-Dev] Python startup time

2017-07-20 Thread Terry Reedy

On 7/19/2017 10:05 AM, Nick Coghlan wrote:

> P.S. I'll also note that we're not *actually* limited to resolving
> such conflicts in public venues (even though I think that's a good
> default habit for us to retain): as long as we report the outcome of
> any mutual agreements about design priorities back to the relevant
> public venue (e.g. a tracker issue), there's nothing wrong with
> shifting our attempts to better understand each other's perspectives
> to private email, IRC, video chat, etc.


I expect and hope that there will be discussion of this issue at the 
core developer sprint in September, with summary reports back here on pydev.



> It can even make sense to reach out to other
> core devs for help, since it's almost always easier for someone not
> caught in the midst of an argument to see both sides of it, and
> potentially spot a core of agreement amidst various surface level
> disagreements :)


I always understood the Python development process, both for core and 
users, to be "Make it right; then make it faster", with the second 
clause conditioned on 'while keeping it right' and maybe, and especially 
for core development 'if significantly slow'.  (People can rightly work 
on speed of personal code for other reasons.)  I believe we pretty much 
agree on the principles.  The disagreement seems to be on whether a 
particular case is 'significantly slow'.  I believe that the burden of 
proof is with those who propose a change.


The burden of the proof depends on the final qualification: 'without 
adding unnecessary or extreme complexity'.  If there is no added 
complication, the burden is slight.  If not, we will likely disagree 
about complexity and its tradeoff with speed.


About 'keeping it right':  It has been mentioned that more complicated 
code *generally* makes it harder to 'see' that the code is (basically) 
correct. The second line of defense is the automated test suite.  I 
think, for instance, that someone interested in changing namedtuple (to 
a faster and presumably more complicated implementation) should check 
the coverage of the current code, with branches checked both ways. 
Then, bring the coverage up to 100% if it is not already, and carefully 
check the test for possible missing cases.


A small static set of test cases cannot cover everything.  The third 
test of an implementation is accumulated user experience.  A new 
implementation starts at 0.  One way to increase that is to test the 
implementation with third-party code.  Another, I think, is through 
randomized testing.


Proposal 1: Depending on our confidence in a new implementation, 
simulate user experience with randomized tests, perhaps running for 
hours.  Example: we develop a random (unicode) identifier generator that 
starts with any of the legal initial codepoints and continues with a 
random number of legal follow codepoints.  Then test (old) and new 
namedtuple with random class and a random number of random field names. 
A developer could also use third-party packages, like hypothesis.  Code 
and a summary could be uploaded to bpo.  A summary could even go in the 
code file.
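
A minimal sketch of what Proposal 1 might look like, using only the
standard library. The helper names random_identifier and check_namedtuple
are invented for illustration and are not part of any existing test. The
sketch normalizes names to NFKC up front, on the assumption that this
mirrors what the compiler does to identifiers, and it only exercises
positional construction and attribute access.

```python
import keyword
import random
import sys
import unicodedata
from collections import namedtuple


def random_identifier(rng, max_len=6):
    """Build a random Unicode identifier usable as a namedtuple field name."""
    while True:
        length = rng.randint(1, max_len)
        name = ""
        while len(name) < length:
            ch = chr(rng.randrange(sys.maxunicode + 1))
            # Keep the character only if the name stays a legal identifier.
            if (name + ch).isidentifier():
                name += ch
        # Assumption: normalize up front so the name is stable under NFKC.
        name = unicodedata.normalize("NFKC", name)
        if (name.isidentifier() and not keyword.iskeyword(name)
                and not name.startswith("_")):
            return name


def check_namedtuple(rng, trials=100):
    """Create random namedtuple classes and verify a few basic properties."""
    for _trial in range(trials):
        typename = random_identifier(rng)
        fields = []
        for _field in range(rng.randint(0, 5)):
            field = random_identifier(rng)
            if field not in fields:  # namedtuple rejects duplicate names
                fields.append(field)
        cls = namedtuple(typename, fields)
        values = tuple(rng.random() for _ in fields)
        obj = cls(*values)
        assert tuple(obj) == values
        assert cls._fields == tuple(fields)
        for field, value in zip(fields, values):
            assert getattr(obj, field) == value
    return trials


if __name__ == "__main__":
    # A fixed seed keeps a failing run reproducible.
    print(check_namedtuple(random.Random(12345), trials=200), "trials passed")
```

Running the same checks against both the old and a new implementation,
with the same seed, would give a direct comparison.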


Note 1: Tim Peters did something like this when developing timsort.  He 
provided a nice summary of test cases and time results.


Note 2: Randomized tests require that either a) randomized inputs are 
verified by property or predicate, rather than by hard-coded values, or 
b) inputs are generated from outputs, where either the output or inverse 
generation are randomized.  Tests of sorting can use either 
is_sorted(list(sorted(random_input))) or 
list(sorted(random_shuffle(output))) == output.
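
The two styles in Note 2 can be sketched as follows. This is only an
illustration: is_sorted, check_by_property and check_by_inverse are
made-up helper names, and the built-in sorted() stands in for whatever
implementation is under test.

```python
import random
from collections import Counter


def is_sorted(items):
    """Property: every element is <= its successor."""
    return all(a <= b for a, b in zip(items, items[1:]))


def check_by_property(rng, trials=200):
    """Style (a): random input, result verified by a predicate."""
    for _ in range(trials):
        size = rng.randint(0, 50)
        data = [rng.randint(-1000, 1000) for _ in range(size)]
        result = sorted(data)
        assert is_sorted(result)
        # Sortedness alone is not enough: the elements must be preserved.
        assert Counter(result) == Counter(data)
    return trials


def check_by_inverse(rng, trials=200):
    """Style (b): start from a known output and randomize the input."""
    for _ in range(trials):
        expected = list(range(rng.randint(0, 50)))  # already sorted
        shuffled = expected[:]
        rng.shuffle(shuffled)
        assert sorted(shuffled) == expected
    return trials


if __name__ == "__main__":
    rng = random.Random(2017)  # fixed seed so failures can be replayed
    print(check_by_property(rng), check_by_inverse(rng))
```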


Proposal 2: Add randomized tests here and there in the test suite.  Each 
randomized test x 30 buildbots x 2 runs/day x 365 days/year is about 
22000 random inputs a year.  Since each buildbot would be running a 
slightly different test, we need to act on and not ignore sporadic 
failures.  Victor Stinner's buildbot work is making this feasible.


--
Terry Jan Reedy






Re: [Python-Dev] Python startup time

2017-07-19 Thread Zero Piraeus
:

On 19 July 2017 at 21:19, Steven D'Aprano  wrote:
> But the premise is wrong too. Those hypothetical people don't turn their
> Macs on in sequence, each person turning their computer on only after
> the previous person's Mac had finished booting. They effectively boot
> them up in parallel but offset, spread out over a 24 hour period, so
> about 3472 people booting up at the same time each minute of the day.
> Time savings for parallel processes don't add in the way Jobs adds them,
> if we treat this as 1440 parallel processes (one per minute of the day)
> we save 1440 hours a year.

Ah, but the relevant unit here is person-hours, not hours: Jobs is
claiming that *each* Mac user loses X% of *their* life to boot times,
and then adds all those slices of life together into N lifetimes
(which again, are counted in person-years, not years).

It's still wrong, though: longer boot times actually increase the
proportion of your life spent in meaningful activity (e.g. going to
the canteen and talking to someone).

 -[]z.


Re: [Python-Dev] Python startup time

2017-07-19 Thread Steven D'Aprano
On Wed, Jul 19, 2017 at 04:11:24PM -0700, Chris Barker wrote:
> As long as we are talking anecdotes:
> 
> If it could save a person’s life, could you find a way to save ten seconds
> off the boot time? If there were five million people using the Mac, and it
> took ten seconds extra to turn it on every day, that added up to three
> hundred million or so hours per year people would save, which was the
> equivalent of at least one hundred lifetimes saved per year.
> 
> Steve Jobs.

And about a fifth of the time they spent standing in lines waiting to 
buy the latest unnecessary iGadget... 

But seriously, that calculation is completely bogus. Not only is Steve 
Jobs's arithmetic *completely* wrong, but the whole premise is nonsense.

Do the maths yourself: ten seconds per day is 3650 seconds in a year, 
which is slightly over an hour (3600 seconds). Multiply by five million 
users, that's about five million hours, not 300 million. So Jobs 
exaggerates the time saved by a factor of sixty.
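
For anyone who wants to redo the sum, here is the same arithmetic as a
short script. It uses only the figures from the quote (ten seconds a
day, five million users, three hundred million hours claimed); the
variable names are made up.

```python
SECONDS_PER_HOUR = 3600

seconds_per_user_per_year = 10 * 365                       # 3650 seconds
hours_per_user_per_year = seconds_per_user_per_year / SECONDS_PER_HOUR
total_hours_per_year = hours_per_user_per_year * 5_000_000

claimed_hours_per_year = 300_000_000
exaggeration = claimed_hours_per_year / total_hours_per_year

print(f"per user: {hours_per_user_per_year:.2f} hours/year")
print(f"all users: {total_hours_per_year / 1e6:.2f} million hours/year")
print(f"claimed / actual: about {exaggeration:.0f}x")
```

The ratio comes out just under sixty, which is where the "factor of
sixty" comes from.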

(Or maybe Jobs was warning that Macs crash sixty times a day...)

But the premise is wrong too. Those hypothetical people don't turn their 
Macs on in sequence, each person turning their computer on only after 
the previous person's Mac had finished booting. They effectively boot 
them up in parallel but offset, spread out over a 24 hour period, so 
about 3472 people booting up at the same time each minute of the day. 
Time savings for parallel processes don't add in the way Jobs adds them: 
if we treat this as 1440 parallel processes (one per minute of the day) 
we save 1440 hours a year.

But really, the only meaningful calculation is that each person saves 10 
seconds per day. We can't even meaningfully say they save one hour a 
year: it doesn't come nicely packaged up for you all at once, so you can 
actually do something useful with it, nor can you save those ten seconds 
from one day to the next. You only get one shot at using them. What can 
you do with ten seconds per day? By the time you decide what to do with 
the extra time, it's already gone.

There are good reasons for speeding up boot time, but this sort of 
calculation is not one of them. I think it is in particularly bad taste 
to exaggerate the significance of it by putting it in terms of saving 
lives. You want to save real lives? How about fixing the conditions in 
the sweatshops that make Apple phones? And installing suicide nets 
around the building doesn't count.



-- 
Steve


Re: [Python-Dev] Python startup time

2017-07-19 Thread Chris Barker
As long as we are talking anecdotes:

If it could save a person’s life, could you find a way to save ten seconds
off the boot time? If there were five million people using the Mac, and it
took ten seconds extra to turn it on every day, that added up to three
hundred million or so hours per year people would save, which was the
equivalent of at least one hundred lifetimes saved per year.

Steve Jobs.

(http://stevejobsdailyquote.com/2014/03/26/boot-time/)

It really does depend on how users are using Python and what for. In
general, Python has been moving more and more toward a "systems development
language" from a "scripting language". Which may make us think "scripting"
issues like startup time don't matter -- but, of course, they matter a lot
to those use cases.


-CHB




On Wed, Jul 19, 2017 at 1:35 PM, Antoine Pitrou  wrote:

> On Wed, 19 Jul 2017 15:26:47 -0400
> Ben Hoyt  wrote:
> > Yes, agreed that startup time matters for scripting. I was talking to
> > someone on the Google Cloud SDK (CLI) team recently, and they said startup
> > time is a big deal for them ... it's especially problematic for shell tab
> > completion helpers, because every time you press tab the shell has to load
> > your Python program to do the completion.
>
> And also, for the same reason, for shell prompt additions such as
> git-prompt.  Mercurial had to write a C client (chg) to make this
> usable.
>
> Regards
>
> Antoine.
>



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Python-Dev] Python startup time

2017-07-19 Thread Antoine Pitrou
On Wed, 19 Jul 2017 15:26:47 -0400
Ben Hoyt  wrote:
> Yes, agreed that startup time matters for scripting. I was talking to
> someone on the Google Cloud SDK (CLI) team recently, and they said startup
> time is a big deal for them ... it's especially problematic for shell tab
> completion helpers, because every time you press tab the shell has to load
> your Python program to do the completion.

And also, for the same reason, for shell prompt additions such as
git-prompt.  Mercurial had to write a C client (chg) to make this
usable.

Regards

Antoine.




Re: [Python-Dev] Python startup time

2017-07-19 Thread Ben Hoyt
Yes, agreed that startup time matters for scripting. I was talking to
someone on the Google Cloud SDK (CLI) team recently, and they said startup
time is a big deal for them ... it's especially problematic for shell tab
completion helpers, because every time you press tab the shell has to load
your Python program to do the completion. Even a couple dozen milliseconds
is noticeable when you're typing quickly.

-Ben

On Wed, Jul 19, 2017 at 3:15 PM, Larry Hastings  wrote:

>
>
> On 07/19/2017 05:59 AM, Victor Stinner wrote:
>
> Mercurial startup time is already 45.8x slower than Git whereas tested
> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
> developers, with a startup time 2x - 3x slower...
>
>
> When Matt Mackall spoke at the Python Language Summit some years back, I
> recall that he specifically complained about Python startup time.  He said
> Python 3 "didn't solve any problems for [them]"--they'd already solved
> their Unicode hygiene problems--and that Python's slow startup time was
> already a big problem for them.  Python 3 being *even slower* to start
> was absolutely one of the reasons why they didn't want to upgrade.
>
> You might think "what's a few milliseconds matter".  But if you run
> hundreds of commands in a shell script it adds up.  git's speed is one of
> the few bright spots in its UX, and hg's comparative slowness here is a
> palpable disadvantage.
>
>
> So please continue efforts for make Python startup even faster to beat
> all other programming languages, and finally convince Mercurial to
> upgrade ;-)
>
>
> I believe Mercurial is, finally, slowly porting to Python 3.
>
> https://www.mercurial-scm.org/wiki/Python3
>
> Nevertheless, I can't really be annoyed or upset at them moving slowly to
> adopt Python 3, as Matt's objections were entirely legitimate.
>
>
> Cheers,
>
>
> */arry*
>


Re: [Python-Dev] Python startup time

2017-07-19 Thread Larry Hastings



On 07/19/2017 05:59 AM, Victor Stinner wrote:

> Mercurial startup time is already 45.8x slower than Git whereas tested
> Mercurial runs on Python 2.7.12. Now try to sell Python 3 to Mercurial
> developers, with a startup time 2x - 3x slower...


When Matt Mackall spoke at the Python Language Summit some years back, I 
recall that he specifically complained about Python startup time.  He 
said Python 3 "didn't solve any problems for [them]"--they'd already 
solved their Unicode hygiene problems--and that Python's slow startup 
time was already a big problem for them. Python 3 being /even slower/ to 
start was absolutely one of the reasons why they didn't want to upgrade.


You might think "what's a few milliseconds matter".  But if you run 
hundreds of commands in a shell script it adds up.  git's speed is one 
of the few bright spots in its UX, and hg's comparative slowness here is 
a palpable disadvantage.




> So please continue efforts for make Python startup even faster to beat
> all other programming languages, and finally convince Mercurial to
> upgrade ;-)


I believe Mercurial is, finally, slowly porting to Python 3.

   https://www.mercurial-scm.org/wiki/Python3

Nevertheless, I can't really be annoyed or upset at them moving slowly 
to adopt Python 3, as Matt's objections were entirely legitimate.



Cheers,


/arry


Re: [Python-Dev] Python startup time

2017-07-19 Thread Victor Stinner
2017-07-19 15:22 GMT+02:00 Oleg Broytman :
> On Wed, Jul 19, 2017 at 02:59:52PM +0200, Victor Stinner 
>  wrote:
>> "Python is very slow to start on Windows 7"
>> https://stackoverflow.com/questions/29997274/python-is-very-slow-to-start-on-windows-7
>
>However hard you are going to optimize Python you cannot fix those
> "defenders", "guards" and "protectors". :-) This particular link can be
> excluded from consideration.

Sorry, I didn't carefully read each link I posted. Even for me, knowing
what Python does at startup, it's hard to explain why three people see
different timings: 15 ms, 75 ms and 300 ms, for example. In my
experience, the following things affect Python startup:

* -S option: whether the site module is loaded or not
* Paths in sys.path: PYTHONPATH environment variable for example
* .pth files in sys.path
* Python running in a virtual environment or not
* Operating system: Python loads different modules at startup
  depending on the OS. Naoki INADA just removed _osx_support from being
  imported in the site module on macOS for example.

My list is likely incomplete.
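
To see how much the first item matters on a given machine, something
like the following rough sketch could be used. It is not the method used
by the benchmark suite: time_startup is a made-up helper, the timing
includes the cost of spawning the process, and taking the best of a few
runs is only a crude way to reduce noise.

```python
import subprocess
import sys
import time


def time_startup(extra_args=(), runs=5):
    """Return the best wall-clock time (seconds) to run an empty program."""
    cmd = [sys.executable, *extra_args, "-c", "pass"]
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    variants = [
        ("default", ()),
        ("-S (no site)", ("-S",)),
        ("-I (isolated)", ("-I",)),
    ]
    for label, args in variants:
        print(f"{label:15s} {time_startup(args) * 1000:.1f} ms")
```

Running it once inside and once outside a virtual environment, or with
and without PYTHONPATH set, covers the other items in the list.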

In the performance benchmark suite, a controlled virtual environment
is created to have a known set of modules. FYI running Python in a
virtual environment is slower than "system" python which runs outside
a virtual environment...

Victor


Re: [Python-Dev] Python startup time

2017-07-19 Thread Nick Coghlan
On 19 July 2017 at 22:59, Victor Stinner  wrote:
> == CPython core developers don't care? no, they do care ==
>
> Christian Heimes, Naoki INADA, Serhiy Storchaka, Yury Selivanov, me
> (Victor Stinner) and other core developers made multiple changes in
> recent years to reduce the number of imports at startup, optimize
> importlib, etc.

I actually also care myself, since interpreter startup time feeds
directly into cost of execution when running in environments like AWS
Lambda which charge by the "gigabyte second" (i.e. you allocate a
certain amount of RAM to a particular command, and then get charged
for that RAM for the amount of time it takes to run, as measured with
subsecond precision - if you exceed the limits of the free tier,
anything you're losing to language runtime startup in such an
environment translates almost directly to higher costs).
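
As a rough illustration of the "gigabyte second" model described above,
the sketch below multiplies the pieces together. Every number in it is a
made-up placeholder, including the price, and startup_cost is a
hypothetical helper; real pricing has to come from the provider.

```python
def startup_cost(memory_gb, startup_seconds, invocations, price_per_gb_second):
    """Cost attributable to interpreter startup under per-GB-second billing."""
    return memory_gb * startup_seconds * invocations * price_per_gb_second


# Placeholder figures only: 0.5 GB allocated, 10 ms of startup per call,
# 100 million calls, and an invented price per GB-second.
cost = startup_cost(
    memory_gb=0.5,
    startup_seconds=0.010,
    invocations=100_000_000,
    price_per_gb_second=0.00002,
)
print(f"startup overhead: {cost:.2f} currency units")
```

The point is only that the cost scales linearly with startup time, so
halving startup halves this part of the bill.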

In aggregate, shaving time off CPython startup saves *scary* amounts
of collective compute time around the world - even though most runtime
environments don't track that as closely in financial terms as Lambda
does, we're still nudging the power & cooling requirements of data
centers slightly higher than they would otherwise be. So even when the
per-invocation impact of a performance improvement is small, it's
worth keeping in mind that CPython gets invoked a *lot*, whether it's
to respond to a web request, run a test, run a build, deploy another
application, analyse some data, etc :)

However, I'm also of the view that module & API maintainers *do* have
the authority to set the design priorities for the parts of the
standard library that they're personally responsible for, and if we'd
like them to change their minds based on information we have that they
don't, then reopening enhancement requests that they already closed is
*not* the way to go about it (as while the issue tracker is an
excellent venue for figuring out the technical details of a change, or
deciding whether or not an RFE is a good idea given a common
understanding of the relevant design priorities, it's almost always a
*terrible* venue for resolving outright disagreements as to what the
most relevant design priorities actually are).

Rather, the best available way to publicly request reconsideration is
the way Antoine did when he escalated the namedtuple question to
python-dev: by explicitly acknowledging that there's a conflict in
design priorities between core developers, and asking for a collective
discussion (and potentially a determination from Guido) as to the
right way forward for the project as a whole.

Cheers,
Nick.

P.S. I'll also note that we're not *actually* limited to resolving
such conflicts in public venues (even though I think that's a good
default habit for us to retain): as long as we report the outcome of
any mutual agreements about design priorities back to the relevant
public venue (e.g. a tracker issue), there's nothing wrong with
shifting our attempts to better understand each other's perspectives
to private email, IRC, video chat, etc. A non-trivial number of
previously vociferous arguments have been resolved amicably once the
main parties involved have had a chance to discuss them in person at a
conference or sprint. It can even make sense to reach out to other
core devs for help, since it's almost always easier for someone not
caught in the midst of an argument to see both sides of it, and
potentially spot a core of agreement amidst various surface level
disagreements :)

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Python startup time

2017-07-19 Thread Oleg Broytman
On Wed, Jul 19, 2017 at 02:59:52PM +0200, Victor Stinner 
 wrote:
> "Python is very slow to start on Windows 7"
> https://stackoverflow.com/questions/29997274/python-is-very-slow-to-start-on-windows-7

   However hard you are going to optimize Python you cannot fix those
"defenders", "guards" and "protectors". :-) This particular link can be
excluded from consideration.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


[Python-Dev] Python startup time

2017-07-19 Thread Victor Stinner
 Hi,

On Twitter, Raymond Hettinger wrote:

   "The decision making process on Python-dev is an anti-pattern,
governed by anecdotal data and ambiguity over what problem is solved."

https://twitter.com/raymondh/status/887069454693158912

About "anecdotal data", I would like to discuss the Python startup time.


== Python 3.7 compared to 2.7 ==

First of all, on speed.python.org, we have:

* Python 2.7: 6.4 ms with site, 3.0 ms without site (-S)
* master (3.7): 14.5 ms with site, 8.4 ms without site (-S)

Python 3.7 startup time is 2.3x slower with site (default mode), or
2.8x slower without site (-S command line option).

(I will skip Python 3.4, 3.5 and 3.6 which are much worse than Python 3.7...)

So if a user complained about Python 2.7 startup time: be prepared
for a 2x - 3x more angry user when "forced" to upgrade to Python 3!


== Mercurial vs Git, Python vs C, startup time ==

Startup time matters a lot for Mercurial since Mercurial is compared
to Git. Git and Mercurial have similar features, but Git is written in
C whereas Mercurial is written in Python. Quick benchmark on the
speed.python.org server:

* hg version: 44.6 ms +- 0.2 ms
* git --version: 974 us +- 7 us

Mercurial startup time is already 45.8x slower than Git, even though the
tested Mercurial runs on Python 2.7.12. Now try to sell Python 3 to
Mercurial developers, with a startup time 2x - 3x slower...

I tested Mercurial 3.7.3 and Git 2.7.4 on Ubuntu 16.04.1 using "python3
-m perf command -- ...".
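
The ratios quoted in the two sections above can be rechecked from the
raw timings. This is only arithmetic on the numbers given in this
message; the variable names are made up.

```python
# Startup timings in milliseconds, copied from the figures quoted above.
python_ms = {
    "with site": (6.4, 14.5),            # (Python 2.7, master/3.7)
    "without site (-S)": (3.0, 8.4),
}
for mode, (py27, py37) in python_ms.items():
    print(f"{mode}: {py37 / py27:.1f}x slower")

hg_ms = 44.6
git_ms = 0.974                           # 974 us expressed in ms
hg_vs_git = hg_ms / git_ms
print(f"hg vs git: {hg_vs_git:.1f}x slower")
```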


== CPython core developers don't care? no, they do care ==

Christian Heimes, Naoki INADA, Serhiy Storchaka, Yury Selivanov, me
(Victor Stinner) and other core developers made multiple changes in
recent years to reduce the number of imports at startup, optimize
importlib, etc.

IMHO all these core developers are well aware of the competition between
programming languages, and honestly Python startup time isn't "good".
So let's compare it to other programming languages similar to Python.


== PHP, Ruby, Perl ==

I measured the startup time of other programming languages which are
similar to Python, still on the speed.python.org server using "python3
-m perf command -- ...":

* perl -e ' ': 1.18 ms +- 0.01 ms
* php -r ' ': 8.57 ms +- 0.05 ms
* ruby -e ' ': 32.8 ms +- 0.1 ms

Wow, Perl is quite good! PHP seems as good as Python 2 (but Python 3
is worse). Ruby startup time seems less optimized than other
languages.

Tested versions:

* perl 5, version 22, subversion 1 (v5.22.1)
* PHP 7.0.18-0ubuntu0.16.04.1 (cli) ( NTS )
* ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]


== Quick Google search ==

I also searched for "python startup time" and "python slow startup
time" on Google and found many articles. Some examples:

"Reducing the Python startup time"
http://www.draketo.de/book/export/html/498
=>   "The python startup time always nagged me (17-30ms) and I just
searched again for a way to reduce it, when I found this: The
Python-Launcher caches GTK imports and forks new processes to reduce
the startup time of python GUI programs."


https://nelsonslog.wordpress.com/2013/04/08/python-startup-time/
=> "Wow, Python startup time is worse than I thought."


"How to speed up python starting up and/or reduce file search while
loading libraries?"
https://stackoverflow.com/questions/15474160/how-to-speed-up-python-starting-up-and-or-reduce-file-search-while-loading-libra
=> "The first time I log to the system and start one command it takes
6 seconds just to show a few line of help. If I immediately issue the
same command again it takes 0.1s. After a couple of minutes it gets
back to 6s. (proof of short-lived cache)"


"How does one optimise the startup of a Python script/program?"
https://www.quora.com/How-does-one-optimise-the-startup-of-a-Python-script-program
=> "I wrote a Python program that would be used very often (imagine
'cd' or 'ls') for very short runtimes, how would I make it start up as
fast as possible?"


"Python Interpreter Startup time"
https://bytes.com/topic/python/answers/34469-pyhton-interpreter-startup-time


"Python is very slow to start on Windows 7"
https://stackoverflow.com/questions/29997274/python-is-very-slow-to-start-on-windows-7
=> "Python takes 17 times longer to load on my Windows 7 machine than
Ubuntu 14.04 running on a VM"
=> "returns in 0.614s on Windows and 0.036s on Linux"


"How to make a fast command line tool in Python" (old article Python 2.5.2)
https://files.bemusement.org/talks/OSDC2008-FastPython/
=> "(...) some techniques Bazaar uses to start quickly, such as lazy imports."

--

So please continue efforts to make Python startup even faster to beat
all other programming languages, and finally convince Mercurial to
upgrade ;-)

Victor


Re: [Python-Dev] Python startup time

2013-10-10 Thread Christian Heimes
On 10.10.2013 02:18, Eric Snow wrote:
> On Wed, Oct 9, 2013 at 8:30 AM, Christian Heimes
> <christ...@python.org> wrote:
>> The os module imports MutableMapping from collections.abc. That
>> import adds collections, collections.abc and eight more modules.
>> I'm not sure if we can do anything about it, though.
>
> Well, that depends on how much we want to eliminate those 10
> imports. :)  Both environ and environb could be turned into lazy
> wrappers around an _Environ-created-when-needed.  If we used a
> custom module type for os [1], then adding descriptors for the two
> attributes is a piece of cake.  As it is, with a little metaclass
> magic (or even with explicit wrapping of the various dunder
> methods), we could drop those 10 imports from startup.
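
The "explicit wrapping of the various dunder methods" variant quoted
above might look roughly like this. It is a generic sketch, not how
os.environ is or was implemented: LazyMapping is a hypothetical class,
and the factory stands in for whatever would build the real _Environ
object.

```python
class LazyMapping:
    """Stand-in that builds the real mapping on first use (hypothetical)."""

    def __init__(self, factory):
        self._factory = factory
        self._target = None

    def _resolve(self):
        if self._target is None:
            self._target = self._factory()  # the expensive work happens here
        return self._target

    # Dunder methods are looked up on the type, so each one is forwarded
    # explicitly instead of relying on __getattr__.
    def __getitem__(self, key):
        return self._resolve()[key]

    def __setitem__(self, key, value):
        self._resolve()[key] = value

    def __delitem__(self, key):
        del self._resolve()[key]

    def __iter__(self):
        return iter(self._resolve())

    def __len__(self):
        return len(self._resolve())

    def __contains__(self, key):
        return key in self._resolve()

    def __getattr__(self, name):
        # Only reached for names not found normally, e.g. get() or items().
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)


if __name__ == "__main__":
    environ = LazyMapping(lambda: {"HOME": "/tmp"})
    print(environ["HOME"], environ.get("MISSING"))
```

Nothing is built, and nothing extra needs to be imported, until the
first lookup.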

We don't have to use a custom module type to get rid of these imports
(but I like to get my hands on a piece of chocolate cake *g*). We can
either implement yet another mutable mapping class for the os module.
That would remove the dependency on collections.abc.

Or we have to juggle the modules a bit so we can get to MutableMapping
without the extra stuff from collections.__init__. The abc and
_weakset modules are already loaded by the io module. Only
collections.__init__ imports _collections, operator, keyword, heapq,
itertools and reprlib.

I implemented both as an experiment. A lean and mean MutableMapping
works but it involves some code duplication. Next I moved
collections.abc to its former place _abcoll and installed a new
collections.abc module as facade.

$ hg mv Lib/collections/abc.py Lib/_abcoll.py
$ echo "from _abcoll import *" >> Lib/collections/abc.py
$ echo "from _abcoll import __all__" >> Lib/collections/abc.py
$ sed -i "s/collections\.abc/_abcoll/" Lib/os.py


With three additional patches I'm down 19 modules:

$ ./python -c "import sys; print(len(sys.modules))"
34
$ hg revert --all .
$ ./python -c "import sys; print(len(sys.modules))"
53
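
The same module count can be collected from a script, which also makes
it easy to see which modules the site module pulls in. This is a sketch
with a made-up helper name (startup_modules); it uses the running
interpreter rather than a freshly built ./python, and the numbers will
differ between versions and platforms.

```python
import subprocess
import sys


def startup_modules(extra_args=()):
    """Return the set of module names loaded by a bare interpreter."""
    code = "import sys; print('\\n'.join(sorted(sys.modules)))"
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    return set(result.stdout.split())


if __name__ == "__main__":
    with_site = startup_modules()
    without_site = startup_modules(("-S",))
    print(len(with_site), "modules by default")
    print(len(without_site), "modules with -S")
    print("only loaded with site:", sorted(with_site - without_site))
```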

Christian


Re: [Python-Dev] Python startup time

2013-10-10 Thread R. David Murray
On Thu, 10 Oct 2013 07:41:19 +0300, yoav glazner <yoavglaz...@gmail.com> wrote:
> I'm not sure dropping imports is the best way to go, since every python
> script/app will import common modules right at the start and it will still
> seem like the interpreter boot is slow.
>
> making modules load faster seems like a better approach

Making any of the infrastructure faster is good.  But I certainly have
plenty of CLI scripts that import only os and sys, so reducing the
number of modules imported will be a win for me.

(Now, granted, a lot of those scripts *ought* to import argparse,
which imports a bunch of stuff, but they don't ;)

--David


Re: [Python-Dev] Python startup time

2013-10-10 Thread Christian Heimes
On 10.10.2013 06:41, yoav glazner wrote:
> I'm not sure dropping imports is the best way to go, since every python
> script/app will import common modules right at the start and it will
> still seem like the interpreter boot is slow.
>
> making modules load faster seems like a better approach

Not every script uses the re or collections module. Especially short
running and simple Python programs suffer from import overuse. Every
imported module adds extra syscalls and IO, too. These are costly
operations, especially on slow or embedded devices.
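
For short-running scripts, one way to avoid paying for modules that are
not always needed is to defer the import to the code path that uses it.
The following is a generic illustration with a made-up main() function,
not something taken from the patches discussed here.

```python
import sys


def main(argv=None):
    """Tiny command-line entry point with a deferred argparse import."""
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        # Fast path: nothing to parse, so argparse is never imported.
        return "no arguments"
    import argparse  # deferred: only paid for when it is actually needed

    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("name")
    args = parser.parse_args(argv)
    return f"hello {args.name}"


if __name__ == "__main__":
    print(main())
```

The trade-off is that the import cost moves to the first call that needs
the module instead of disappearing.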


Benchmark of 1000 times "python -c ''"

Python 3.4dev with all my experimental patches:

  Avg: 0.705161 -> 0.443613: 1.59x faster

2.7 -> 3.4dev:

  Avg: 0.316177 -> 0.669330: 2.12x slower

2.7 -> 3.4dev with all my patches:

  Avg: 0.314879 -> 0.449556: 1.43x slower

http://pastebin.com/NFrpa7Jh

Ain't bad! The benchmarks were conducted on a fast 8 core machine with SSD.

Christian




Re: [Python-Dev] Python startup time

2013-10-10 Thread Dirkjan Ochtman
On Thu, Oct 10, 2013 at 2:25 PM, Christian Heimes <christ...@python.org> wrote:
> Benchmark of 1000 times "python -c ''"
>
> Python 3.4dev with all my experimental patches:
>
>   Avg: 0.705161 -> 0.443613: 1.59x faster
>
> 2.7 -> 3.4dev:
>
>   Avg: 0.316177 -> 0.669330: 2.12x slower
>
> 2.7 -> 3.4dev with all my patches:
>
>   Avg: 0.314879 -> 0.449556: 1.43x slower
>
> http://pastebin.com/NFrpa7Jh
>
> Ain't bad! The benchmarks were conducted on a fast 8 core machine with SSD.

This seems promising. What OS are you using? On an older Linux server
with old-style HD's, the difference between 2.7 and 3.2 is much larger
for me:

Avg: 0.0312 -> 0.1422: 4.56x slower

(In this case, I think it might be more useful to report as 0.11s
faster, though.)
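
Both ways of reporting can be computed side by side. The sketch below
uses the figures from this thread and assumes the "Avg" values are in
seconds, which the messages do not state; compare is a made-up helper.

```python
def compare(old_seconds, new_seconds):
    """Return (ratio, absolute difference in seconds) for two timings."""
    return new_seconds / old_seconds, new_seconds - old_seconds


measurements = [
    ("2.7 -> 3.4dev (fast machine, SSD)", 0.316177, 0.669330),
    ("2.7 -> 3.2 (older server, hard disks)", 0.0312, 0.1422),
]
for label, old, new in measurements:
    ratio, delta = compare(old, new)
    print(f"{label}: {ratio:.2f}x slower, {delta:+.3f} s")
```

A large ratio on a small base can still be a small absolute cost, which
is why the absolute figure is worth reporting as well.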

Cheers,

Dirkjan


Re: [Python-Dev] Python startup time

2013-10-10 Thread Antoine Pitrou
On Thu, 10 Oct 2013 14:36:26 +0200,
Dirkjan Ochtman <dirk...@ochtman.nl> wrote:

> On Thu, Oct 10, 2013 at 2:25 PM, Christian Heimes
> <christ...@python.org> wrote:
> > Benchmark of 1000 times "python -c ''"
> >
> > Python 3.4dev with all my experimental patches:
> >
> >   Avg: 0.705161 -> 0.443613: 1.59x faster
> >
> > 2.7 -> 3.4dev:
> >
> >   Avg: 0.316177 -> 0.669330: 2.12x slower
> >
> > 2.7 -> 3.4dev with all my patches:
> >
> >   Avg: 0.314879 -> 0.449556: 1.43x slower
> >
> > http://pastebin.com/NFrpa7Jh
> >
> > Ain't bad! The benchmarks were conducted on a fast 8 core machine
> > with SSD.
>
> This seems promising. What OS are you using? On an older Linux server
> with old-style HD's, the difference between 2.7 and 3.2 is much larger
> for me:

3.2 isn't the same as 3.4.

Thanks Christian for doing this, this is promising!

Regards

Antoine.




Re: [Python-Dev] Python startup time

2013-10-10 Thread Victor Stinner
Hi,

In an old issue, I proposed a change to not load the sysconfig module
when it's not needed. Nobody reviewed the patch, the issue was closed.

http://bugs.python.org/issue14057

When the -s or -I option is used, we may completely skip the sysconfig
module. (That is already the case when -S is used.)

By the way, we should probably remove the site module. It's a pain for
the startup time :-)

Victor


2013/10/10 Christian Heimes christ...@python.org:
 Am 10.10.2013 02:18, schrieb Eric Snow:
 On Wed, Oct 9, 2013 at 8:30 AM, Christian Heimes
 christ...@python.org wrote:
 The os module imports MutableMapping from collections.abc. That
 import adds collections, collections.abc and eight more modules.
 I'm not sure if we can do anything about it, though.

 Well, that depends on how much we want to eliminate those 10
 imports. :)  Both environ and environb could be turned into lazy
 wrappers around an _Environ-created-when-needed.  If we used a
 custom module type for os [1], then adding descriptors for the two
 attributes is a piece of cake.  As it is, with a little metaclass
 magic (or even with explicit wrapping of the various dunder
 methods), we could drop those 10 imports from startup.

 We don't have to use a custom module type to get rid of these imports
 (but I like to get my hands on a piece of chocolate cake *g*). We can
 either implement yet another mutable mapping class for the os module.
 That would remove the dependency on collections.abc.

 Or we have to juggle the modules a bit so we can get to MutableMapping
 without the extra stuff from collections.__init__. The abc and
 _weakset modules are already loaded by the io module. Only
 collections.__init__ imports _collections, operator, keyword, heapq,
 itertools and reprlib.

 I implemented both as an experiment. A lean and mean MutableMapping
 works but it involves some code duplication. Next I moved
 collections.abc to its former place _abcoll and installed a new
 collections.abc module as facade.

 $ hg mv Lib/collections/abc.py Lib/_abcoll.py
 $ echo "from _abcoll import *" > Lib/collections/abc.py
 $ echo "from _abcoll import __all__" >> Lib/collections/abc.py
 $ sed -i "s/collections\.abc/_abcoll/" Lib/os.py


 With three additional patches I'm down by 19 modules:

 $ ./python -c "import sys; print(len(sys.modules))"
 34
 $ hg revert --all .
 $ ./python -c "import sys; print(len(sys.modules))"
 53

 Christian


Re: [Python-Dev] Python startup time

2013-10-10 Thread Benjamin Peterson
2013/10/9 Antoine Pitrou solip...@pitrou.net:
 On Wed, 9 Oct 2013 10:29:30 +0200,
 Antoine Pitrou solip...@pitrou.net wrote:
 On Tue, 8 Oct 2013 15:43:40 -0400,
 Benjamin Peterson benja...@python.org wrote:

  2013/10/8 R. David Murray rdmur...@bitdance.com:
   In this context, if we'd been *really* smart-lazy in CPython
   development, we'd have kept the memory and startup-time
   and...well, we probably do pretty well on CPU actually...smaller,
   so that when smartphones came along Python would have been the
   first high level language used on them, because it fit.  Then
   we'd all be able to be *much* lazier now :)
 
  Even on desktop, startup time leaves a lot to be desired.

 That's true. Anyone have any ideas to improve it?

 It's difficult to identify significant contributors but some possible
 factors:
 - marshal.loads() has become twice as slow in 3.x (compared to 2.7)
 - instantiating a class is slow (type('foo', (), {}) takes around 25ms
   here)

Do you mean microsecond?

 $ ./python -m timeit "type('foo', (), {})"
1 loops, best of 3: 25.9 usec per loop
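
The same measurement can be sketched from inside Python with the timeit
module (an illustration, not part of the original message). The loop
count of 10,000 is an arbitrary assumption, and the figure printed will
vary by machine and Python version.

```python
import timeit

# Time the dynamic creation of an empty class, as in the command above.
number = 10_000
total = timeit.timeit("type('foo', (), {})", number=number)

# Convert the total for all iterations into microseconds per call.
per_call_usec = total / number * 1e6
print(f"type('foo', (), {{}}): {per_call_usec:.1f} usec per call")
```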


-- 
Regards,
Benjamin


Re: [Python-Dev] Python startup time

2013-10-10 Thread Antoine Pitrou
On Thu, 10 Oct 2013 10:26:25 -0400,
Benjamin Peterson benja...@python.org wrote:
 2013/10/9 Antoine Pitrou solip...@pitrou.net:
  On Wed, 9 Oct 2013 10:29:30 +0200,
  Antoine Pitrou solip...@pitrou.net wrote:
  On Tue, 8 Oct 2013 15:43:40 -0400,
  Benjamin Peterson benja...@python.org wrote:
 
   2013/10/8 R. David Murray rdmur...@bitdance.com:
In this context, if we'd been *really* smart-lazy in CPython
development, we'd have kept the memory and startup-time
and...well, we probably do pretty well on CPU
actually...smaller, so that when smartphones came along Python
would have been the first high level language used on them,
because it fit.  Then we'd all be able to be *much* lazier
now :)
  
   Even on desktop, startup time leaves a lot to be desired.
 
  That's true. Anyone have any ideas to improve it?
 
  It's difficult to identify significant contributors but some
  possible factors:
  - marshal.loads() has become twice as slow in 3.x (compared to 2.7)
  - instantiating a class is slow (type('foo', (), {}) takes around
  25ms here)
 
 Do you mean microsecond?
 
  $ ./python -m timeit "type('foo', (), {})"
 1 loops, best of 3: 25.9 usec per loop

Yes, I meant that.

cheers

Antoine.




Re: [Python-Dev] Python startup time

2013-10-10 Thread Serhiy Storchaka

10.10.13 15:25, Christian Heimes wrote:

Benchmark of 1000 times python -c ''


What about python -S -c ''?
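
A rough sketch of how the two variants could be compared (not taken from
the thread; the helper name average_startup and the default of 20 runs
are assumptions). The timing includes the cost of spawning the process,
so it is only useful for comparing the two command lines against each
other on the same machine.

```python
import subprocess
import sys
import time


def average_startup(flags=(), runs=20):
    """Average wall-clock seconds to start the interpreter and run an empty program."""
    cmd = [sys.executable, *flags, "-c", ""]
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    with_site = average_startup()
    without_site = average_startup(["-S"])
    print(f"python -c '':    {with_site * 1000:.1f} ms")
    print(f"python -S -c '': {without_site * 1000:.1f} ms")
```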




Re: [Python-Dev] Python startup time

2013-10-10 Thread Brett Cannon
On Thu, Oct 10, 2013 at 9:38 AM, Antoine Pitrou solip...@pitrou.net wrote:

 On Thu, 10 Oct 2013 14:36:26 +0200,
 Dirkjan Ochtman dirk...@ochtman.nl wrote:

  On Thu, Oct 10, 2013 at 2:25 PM, Christian Heimes
  christ...@python.org wrote:
   Benchmark of 1000 times python -c ''
  
   Python 3.4dev with all my experimental patches:
  
     Avg: 0.705161 -> 0.443613: 1.59x faster
  
   2.7 -> 3.4dev:
  
     Avg: 0.316177 -> 0.669330: 2.12x slower
  
   2.7 -> 3.4dev with all my patches:
  
     Avg: 0.314879 -> 0.449556: 1.43x slower
  
   http://pastebin.com/NFrpa7Jh
  
   Ain't bad! The benchmarks were conducted on a fast 8 core machine
   with SSD.
 
  This seems promising. What OS are you using? On an older Linux server
  with old-style HD's, the difference between 2.7 and 3.2 is much larger
  for me:

 3.2 isn't the same as 3.4.


And I think that is a key point, as imports sped up a good deal in Python
3.3 thanks to the stat caching. So if you want to compare 3.4 to 3.3, that
makes sense. And if you want to compare 2.7 to 3.4 as a selling point, since
startup is the worst benchmark performer in that comparison, then fine. But
otherwise leave 3.0 - 3.2 out of the discussion, as they are red herrings.
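
One visible consequence of that caching, sketched here as an illustration
rather than as anything from the thread: because the path-based finders
can hold a cached view of a directory, the importlib documentation
recommends calling importlib.invalidate_caches() after creating a module
while the program is running. The module name demo_cached_mod is made up
for the example.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sys.path.insert(0, tmp)
    try:
        # Hypothetical module name, used only for this illustration.
        pathlib.Path(tmp, "demo_cached_mod.py").write_text("VALUE = 42\n")
        # The finders may hold a cached directory listing; drop it so the
        # newly written file is guaranteed to be noticed.
        importlib.invalidate_caches()
        mod = importlib.import_module("demo_cached_mod")
        print(mod.VALUE)
    finally:
        # Leave the interpreter state as we found it.
        sys.path.remove(tmp)
        sys.modules.pop("demo_cached_mod", None)
```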

And as to the suggestion of speeding up import itself: good luck with that
without changing semantics. =)


Re: [Python-Dev] Python startup time

2013-10-10 Thread Eric Snow
On Thu, Oct 10, 2013 at 4:18 AM, Christian Heimes christ...@python.org wrote:
 We don't have to use a custom module type to get rid of these imports
 (but I like to get my hands on a piece of chocolate cake *g*). We can
 either implement yet another mutable mapping class for the os module.
 That would remove the dependency on collections.abc.

 Or we have to juggle the modules a bit so we can get to MutableMapping
 without the extra stuff from collections.__init__. The abc and
 _weakset modules are already loaded by the io module. Only
 collections.__init__ imports _collections, operator, keyword, heapq,
 itertools and reprlib.

I've created a ticket for the os/collections issue:
http://bugs.python.org/issue19218.  I also put a patch on there that
takes a metaclass approach.  It removes the need to fiddle with the
collections package or to implement the other MutableMapping methods
(along with the view classes) on _Environ.

-eric
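
The patch on the ticket is not reproduced here. As a loosely related
sketch of the lazy-attribute idea discussed earlier in the thread (a
custom module type with a descriptor), the following shows one way an
attribute could be built on first access. All names (lazy_attribute,
LazyOSModule, fake_os, _make_environ) are hypothetical, and the mapping
class is a stand-in, not the real os._Environ.

```python
import types


class lazy_attribute:
    """Non-data descriptor: builds the value on first access, then caches it."""

    def __init__(self, factory):
        self.factory = factory

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.factory()
        # Store in the module's __dict__ so later lookups bypass the descriptor.
        instance.__dict__[self.name] = value
        return value


def _make_environ():
    # The heavier import happens here, not when the module object is created.
    from collections.abc import MutableMapping

    class _Environ(MutableMapping):
        def __init__(self):
            self._data = {}

        def __getitem__(self, key):
            return self._data[key]

        def __setitem__(self, key, value):
            self._data[key] = value

        def __delitem__(self, key):
            del self._data[key]

        def __iter__(self):
            return iter(self._data)

        def __len__(self):
            return len(self._data)

    return _Environ()


class LazyOSModule(types.ModuleType):
    environ = lazy_attribute(_make_environ)


fake_os = LazyOSModule("fake_os")
created_eagerly = "environ" in vars(fake_os)   # nothing built yet
fake_os.environ["HOME"] = "/tmp"               # first access builds the mapping
created_lazily = "environ" in vars(fake_os)
print(created_eagerly, created_lazily, fake_os.environ["HOME"])
```

A non-data descriptor is used instead of property because an entry in
the instance dictionary takes precedence over it, so the factory runs
only once.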


[Python-Dev] Python startup time

2013-10-09 Thread Antoine Pitrou
On Wed, 9 Oct 2013 10:29:30 +0200,
Antoine Pitrou solip...@pitrou.net wrote:
 On Tue, 8 Oct 2013 15:43:40 -0400,
 Benjamin Peterson benja...@python.org wrote:
 
  2013/10/8 R. David Murray rdmur...@bitdance.com:
   In this context, if we'd been *really* smart-lazy in CPython
   development, we'd have kept the memory and startup-time
   and...well, we probably do pretty well on CPU actually...smaller,
   so that when smartphones came along Python would have been the
   first high level language used on them, because it fit.  Then
   we'd all be able to be *much* lazier now :)
  
  Even on desktop, startup time leaves a lot to be desired.
 
 That's true. Anyone have any ideas to improve it?

It's difficult to identify significant contributors but some possible
factors:
- marshal.loads() has become twice as slow in 3.x (compared to 2.7)
- instantiating a class is slow (type('foo', (), {}) takes around 25ms
  here)
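
The first factor could be measured with something like the sketch below
(an illustration, not from the original message). The sample source and
the loop count are arbitrary assumptions; running the same script under
two interpreters would give a comparison, though the payloads differ
between versions because the marshal format and bytecode change.

```python
import marshal
import timeit

# Compile some source and serialize the code object, roughly what a .pyc file holds.
source = "def f(x):\n    return [i * x for i in range(10)]\n" * 50
payload = marshal.dumps(compile(source, "<demo>", "exec"))

# Time deserialization only, which is the part that runs at import time.
number = 1_000
total = timeit.timeit(lambda: marshal.loads(payload), number=number)
print(f"marshal.loads of {len(payload)} bytes: {total / number * 1e6:.1f} usec per call")
```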

Regards

Antoine.



