Since we're all talking about making Python faster, I thought I'd drop some ideas I've had previously here in case (1) someone wants to actually do them, and (2) they really are new ideas that haven't failed in the past. Mostly I was thinking about startup time.

Here is the list of modules imported on a clean startup on my Windows, US-English machine (from -v output, cleaned up a bit):

import _frozen_importlib
import _imp
import sys
import '_warnings'
import '_thread'
import '_weakref'
import '_frozen_importlib_external'
import '_io'
import 'marshal'
import 'nt'
import '_thread'
import '_weakref'
import 'winreg'
import 'zipimport'
import '_codecs'
import 'codecs'
import 'encodings.aliases'
import 'encodings'
import 'encodings.mbcs'
import '_signal'
import 'encodings.utf_8'
import 'encodings.latin_1'
import '_weakrefset'
import 'abc'
import 'io'
import 'encodings.cp437'
import 'errno'
import '_stat'
import 'stat'
import 'genericpath'
import 'ntpath'
import '_collections_abc'
import 'os'
import '_sitebuiltins'
import 'sysconfig'
import '_locale'
import '_bootlocale'
import 'encodings.cp1252'
import 'site'
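For comparison, everything the interpreter pulled in before reaching your first line of code is already visible in sys.modules, which gives a quick, cross-platform approximation of the -v listing above:

```python
import sys

# Modules the interpreter loaded before executing our first statement
# (plus this script's own imports, so treat the count as an upper bound).
preloaded = sorted(sys.modules)
print(f"{len(preloaded)} modules in sys.modules")
print("\n".join(m for m in preloaded if m.startswith("encodings")))
```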

Obviously the easiest first step is to remove or delay unnecessary imports. But a while ago I traced through this with a native profiler, and the most impactful modules were the encodings:

import 'encodings.mbcs'
import 'encodings.utf_8'
import 'encodings.latin_1'
import 'encodings.cp437'
import 'encodings.cp1252'

While I don't doubt that we need all of these for *some* reason, aliases, cp437 and cp1252 are relatively expensive modules to import, mostly because they have large static dictionaries or data structures that are generated on startup.
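If you want a rough feel for the relative cost without a native profiler, one hedged in-process approach is to evict a module from sys.modules and time a re-import. This is only a crude proxy for true cold-start cost (the OS file cache and the interpreter are already warm), but it makes the expensive modules stand out:

```python
import importlib
import sys
import time

def reimport_seconds(name):
    # Evict any cached copy so the import machinery does real work again.
    # (An in-process re-import is only a rough proxy for cold startup cost.)
    for key in [k for k in sys.modules if k == name or k.startswith(name + ".")]:
        del sys.modules[key]
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

for mod in ("encodings.aliases", "encodings.cp437", "encodings.cp1252"):
    print(f"{mod}: {reimport_seconds(mod) * 1e3:.2f} ms")
```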

Given that this is static and mostly read-only information[1], I see no reason why we couldn't either generate completely static versions of these modules, or better yet compile the resulting data structures into the core binary.

([1]: If being able to write to some of the encoding data is used by some people, I vote for breaking that for 3.6 and making it read-only.)

This is probably the code snippet that bothered me the most:

    ### Encoding table
    encoding_table=codecs.charmap_build(decoding_table)

It shows up in many of the encodings modules, and while it is not a bad function in itself, we are obviously generating a known data structure on every startup. Storing these in static data is a tradeoff between disk space and startup performance, and one I think is likely to be worthwhile.
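To make the point concrete, here's a sketch of what that line does on every import, and why its output is a candidate for precomputation (cp437 and the sample string are just for illustration):

```python
import codecs
from encodings import cp437

# charmap_build() runs at import time in each charmap codec module,
# turning a 256-character decoding table (a string constant) into an
# EncodingMap. The result depends only on that constant, so it could
# in principle be precomputed and frozen into static data.
table = codecs.charmap_build(cp437.decoding_table)

# Encoding through the rebuilt table matches the installed codec:
assert codecs.charmap_encode("Héllo", "strict", table)[0] == "Héllo".encode("cp437")
```

The point being that both the import-time call and a frozen copy would yield identical behaviour; the only difference is when the work happens.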

Anyway, just an idea if someone wants to try it and see what improvements we can get. I'd love to do it myself, but when it actually comes to finding time I keep coming up short.

Cheers,
Steve


P.S. If you just want to discuss optimisation techniques or benchmarking in general, without specific application to CPython 3.6, there's a whole internet out there. Please don't make me the cause of a pointless centithread. :)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev