[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-15 Thread Irit Katriel


Change by Irit Katriel :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-15 Thread Irit Katriel


Irit Katriel  added the comment:


New changeset 40d2ac92f9a28a486156dafdbb613016bb1f6b98 by Irit Katriel in 
branch 'main':
bpo-45152: refactor the dis module to make handling of hasconst opcodes more 
generic (GH-28258)
https://github.com/python/cpython/commit/40d2ac92f9a28a486156dafdbb613016bb1f6b98


--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-14 Thread Mark Shannon

Mark Shannon  added the comment:


New changeset c2f1e953371c25f6c42b599ba3d8797effbb503e by Irit Katriel in 
branch 'main':
bpo-45152: Add HAS_CONST macro and get_const_value() function and use… (#28262)
https://github.com/python/cpython/commit/c2f1e953371c25f6c42b599ba3d8797effbb503e


--
nosy: +Mark.Shannon

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-10 Thread Brandt Bucher


Change by Brandt Bucher :


--
nosy: +brandtbucher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-10 Thread Guido van Rossum


Guido van Rossum  added the comment:

We're not (yet) trying to optimize the code being executed (we should perhaps 
add an issue about class creation to https://github.com/faster-cpython/ideas).

I agree that marshal is very fast already; I doubt the extra cost of wordcode 
would pay off since it would increase the size. This could be tested but it 
would be a big experiment.

I'm not sure that compression will really help, since a typical decompressor 
delivers its output in some memory buffer. But this needs to be tested, and it 
might be a simpler experiment.

I'm pretty sure zipimport will improve things -- the main problem with it (as 
always) is that not all tooling understands it, since resource loading is 
different.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-10 Thread Irit Katriel


Irit Katriel  added the comment:

> (b) the allocation and initialization of a large number of objects

Yes, when I profiled import GC showed up as 20-30%.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-10 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

>From my experience, the largest cost in importing module (after I/O) is for 
>creating classes, especially classes with complex creation code: enums and 
>dataclasses (and namedtuples in past, but they were significanly optimized 
>since).

For I/O, would using zipimport or any other container help? It should reduce 
the number of stats and opens and allow to use compression.

As for increasing performance of demarshallization, it is already very fast 
(all is read from memory buffers, all repeated objects are cached). Switching 
to "wordcode" (32- or 64- bit instructions) and aligning all objects at the 
boundary of 4-8 bytes can marginally increase decoding speed, but it needs 
testing. Also more optimal using of references (like pickletools.optimize()) 
can reduce unmarshalling time at the cost of increasing marshalling time and 
memory consumption. It can also help with deterministic marshalling.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Guido van Rossum


Guido van Rossum  added the comment:

Raymond, I am super grateful that you are showing us the assembly language. 
That's the kind of thing we need to do more of if we want to be successful. (In 
this case I'm not surprised by the outcome, but I have never studied x86 
assembly and this was a nice explanation.)

Regarding speeding up marshal, I am not at all convinced that this is still I/O 
bound. Raymond, do you have data about this? In my own informal experiments I 
believe the main cost I found was (a) stat() operations, and (b) the allocation 
and initialization of a large number of objects. In my (limited) understanding, 
"freezing" modules (like in bpo-45020) is mostly a win because it avoids stat() 
operations (and listdir(), too). But it would be nice to do a detailed 
experiment to get real facts here.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

Idea for improving unmarshalling speed:  Since reading PYCs is I/O bound, it 
may be worthwhile to compress them.  When the idea came up previously, Antoine 
suggested choosing an algorithm with the fastest possible decompression speed 
(iirc, it was lzma4 at the time).  This should be an easy optimization that 
doesn't require changing the rest of the runtime.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel

Irit Katriel  added the comment:

I’m not sure we’re hoping this to be faster, just reduce the code object size 
and not be slower. (Smaller code object could be faster to unmarshal).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Raymond Hettinger

Raymond Hettinger  added the comment:

Thanks for the link.  This is a worthwhile experiment.  However, the potential 
gains will be hard to come by.

The workload of LOAD_CONST is very small.  After paying for the usual dispatch 
logic overhead, all it does is an index into a struct member and an incref.  
Both the co_consts table and the popular constant objects are already likely to 
be in the L1 data cache.  


##DEBUG_LABEL: TARGET_LOAD_CONST
movslq  %r15d, %rax ## OpArg fetch, typically a zero code 
register rename   
movq-368(%rbp), %rcx## 8-byte Reload to access co_consts
movq24(%rcx,%rax,8), %rax   ## The actual indexing operation  (3 
cycles)
incq(%rax)  ## The incref  


A specialized opcode for a specific constant like None can 1) eliminate the 
oparg fetch (likely saving nothing), and 2) eliminate the two sequentially 
dependent memory access (this is a win):

##DEBUG_LABEL: TARGET_LOAD_NONE
  ​  movq   __Py_NoneStruct@GOTPCREL(%rip), rax
incq(%rax)  ## The incref


Any more general opcode for loading small ints would still need the oparg fetch 
and the incref.  To win, it would need to convert the oparg into an int more 
efficiently than the two movq steps.  If the small int table is in a fixed 
location (not per-subinterpreter), then you can save 2 cycles with the simpler 
address computation:

##DEBUG_LABEL: TARGET_SMALLINT
movslq  %r15d, %rax ## OpArg fetch, typically a zero code 
register rename 
movq__Py_SmallInt@GOTPCREL(%rip), %rcx## Find an array of 
ints
movq(%rcx,%rax,8), %rax ## Cheaper address computation takes 1 
cycle
incq(%rax)  ## The incref 

The 2 cycle win (intel-only) will be partially offset by the additional 
pressure on the L1 data cache.  Right now, the co_consts is almost certainly in 
cache, holding only the constants that actually get used (at a rate of 8 per 
cache line).  Accesses into a small_int array will push other data out of L1.

IIRC, Serhiy already experimented with a LOAD_NONE opcode and couldn't get a 
measurable win.

--
nosy: +serhiy.storchaka

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel


Irit Katriel  added the comment:

It's related to this work: https://github.com/faster-cpython/ideas/issues/65

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

I didn't know this was in the works.  Can you point me to the relevant 
discussion about why LOAD_CONST would be split?

--
nosy: +rhettinger

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel


Change by Irit Katriel :


--
pull_requests: +26682
pull_request: https://github.com/python/cpython/pull/28262

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel


Change by Irit Katriel :


--
keywords: +patch
pull_requests: +26678
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/28258

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel


Change by Irit Katriel :


--
nosy: +gvanrossum

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue45152] Prepare for splitting LOAD_CONST into several opcodes

2021-09-09 Thread Irit Katriel


New submission from Irit Katriel :

This issue is to prepare the code for splitting LOAD_CONST to several opcodes.

There are a number of places in the code where an opcode is compared to 
LOAD_CONST (such as dis, the compiler, and the peephole optimizer). These need 
to be refactored to make the query "is this a hasconst opcode", and the value 
calculation needs to be refactored into a single place, which can later be 
updated to get the value from places other than co_consts.

--
assignee: iritkatriel
components: Interpreter Core, Library (Lib)
messages: 401485
nosy: iritkatriel
priority: normal
severity: normal
status: open
title: Prepare for splitting LOAD_CONST into several opcodes
type: enhancement
versions: Python 3.11

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com