Since we're all talking about making Python faster, I thought I'd drop some ideas I've had previously here in case (1) someone wants to actually do them, and (2) they really are new ideas that haven't failed in the past. Mostly I was thinking about startup time.

Here is the list of modules imported on a clean startup on my Windows, US-English machine (from -v output, cleaned up a bit):

import _frozen_importlib
import _imp
import sys
import '_warnings'
import '_thread'
import '_weakref'
import '_frozen_importlib_external'
import '_io'
import 'marshal'
import 'nt'
import '_thread'
import '_weakref'
import 'winreg'
import 'zipimport'
import '_codecs'
import 'codecs'
import 'encodings.aliases'
import 'encodings'
import 'encodings.mbcs'
import '_signal'
import 'encodings.utf_8'
import 'encodings.latin_1'
import '_weakrefset'
import 'abc'
import 'io'
import 'encodings.cp437'
import 'errno'
import '_stat'
import 'stat'
import 'genericpath'
import 'ntpath'
import '_collections_abc'
import 'os'
import '_sitebuiltins'
import 'sysconfig'
import '_locale'
import '_bootlocale'
import 'encodings.cp1252'
import 'site'
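For comparison, everything the interpreter pulled in before reaching your first line of code is already visible in sys.modules, which gives a quick, cross-platform approximation of the -v listing above:

```python
import sys

# Modules the interpreter loaded before executing our first statement
# (plus this script's own imports, so treat the count as an upper bound).
preloaded = sorted(sys.modules)
print(f"{len(preloaded)} modules in sys.modules")
print("\n".join(m for m in preloaded if m.startswith("encodings")))
```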

Obviously the easiest first step is to remove or delay unnecessary imports. But a while ago I traced through this with a native profiler, and the most impactful modules were the encodings:

import 'encodings.mbcs'
import 'encodings.utf_8'
import 'encodings.latin_1'
import 'encodings.cp437'
import 'encodings.cp1252'

While I don't doubt that we need all of these for *some* reason, aliases, cp437 and cp1252 are relatively expensive modules to import, mostly because they have large static dictionaries or data structures that are generated on startup.
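If you want a rough feel for the relative cost without a native profiler, one hedged in-process approach is to evict a module from sys.modules and time a re-import. This is only a crude proxy for true cold-start cost (the OS file cache and the interpreter are already warm), but it makes the expensive modules stand out:

```python
import importlib
import sys
import time

def reimport_seconds(name):
    # Evict any cached copy so the import machinery does real work again.
    # (An in-process re-import is only a rough proxy for cold startup cost.)
    for key in [k for k in sys.modules if k == name or k.startswith(name + ".")]:
        del sys.modules[key]
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

for mod in ("encodings.aliases", "encodings.cp437", "encodings.cp1252"):
    print(f"{mod}: {reimport_seconds(mod) * 1e3:.2f} ms")
```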

Given that this is static and mostly read-only information[1], I see no reason why we couldn't either generate completely static versions of these modules, or better yet compile the resulting data structures into the core binary.

([1]: If being able to write to some of the encoding data is used by some people, I vote for breaking that for 3.6 and making it read-only.)

This is probably the code snippet that bothered me the most:

    ### Encoding table
    encoding_table=codecs.charmap_build(decoding_table)

It shows up in many of the encodings modules, and while it is not a bad function in itself, we are obviously generating a known data structure on every startup. Storing these in static data is a tradeoff between disk space and startup performance, and one I think is likely to be worthwhile.
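To make the point concrete, here's a sketch of what that line does on every import, and why its output is a candidate for precomputation (cp437 and the sample string are just for illustration):

```python
import codecs
from encodings import cp437

# charmap_build() runs at import time in each charmap codec module,
# turning a 256-character decoding table (a string constant) into an
# EncodingMap. The result depends only on that constant, so it could
# in principle be precomputed and frozen into static data.
table = codecs.charmap_build(cp437.decoding_table)

# Encoding through the rebuilt table matches the installed codec:
assert codecs.charmap_encode("Héllo", "strict", table)[0] == "Héllo".encode("cp437")
```

The point being that both the import-time call and a frozen copy would yield identical behaviour; the only difference is when the work happens.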

Anyway, just an idea if someone wants to try it and see what improvements we can get. I'd love to do it myself, but when it actually comes to finding time I keep coming up short.

Cheers,
Steve


P.S. If you just want to discuss optimisation techniques or benchmarking in general, without specific application to CPython 3.6, there's a whole internet out there. Please don't make me the cause of a pointless centithread. :)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev