Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-15 Thread Cesare Di Mauro
2016-04-13 23:23 GMT+02:00 Victor Stinner :

> I don't expect 32-bit parameters in the wild, only 24-bit
> parameters for functions with annotations.
>

I have never found 32-bit parameters, and not even 24-bit ones. I think that
their usage is as rare as a planetary alignment. ;-)

That's why in WPython I supported only 8-, 16-, and 32-bit parameters
(the latter being 6 bytes long).

Regards,
Cesare
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-15 Thread Cesare Di Mauro
2016-04-13 18:24 GMT+02:00 Victor Stinner :

> Demur Rumed proposes a different change, a regular bytecode
> using 16-bit units: an instruction always has one 8-bit argument, which
> is zero if the instruction doesn't have an argument:
>
>    http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>    http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -    if (HAS_ARG(opcode))
> -        oparg = NEXTARG();
> +    oparg = NEXTARG();
>
> This code is in the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce CPU branch
> mispredictions.
>

Correct. The old bytecode format wasn't very predictable for the CPU.
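
To make the difference concrete, here is a minimal Python sketch of the two decoding loops. It is an illustration only, not the ceval.c code: the function names are hypothetical, and HAVE_ARGUMENT = 90 is assumed to be the threshold that HAS_ARG() tested against at the time.

```python
# Assumed threshold: opcodes at or above this value carry an argument
# in the old format (this is what HAS_ARG(opcode) checks).
HAVE_ARGUMENT = 90


def decode_bytecode(code):
    """Old format: 1-byte opcode, optionally followed by a 2-byte
    little-endian argument. The branch decides the instruction length."""
    result = []
    i = 0
    while i < len(code):
        opcode = code[i]
        i += 1
        if opcode >= HAVE_ARGUMENT:      # the conditional branch being removed
            oparg = code[i] | (code[i + 1] << 8)
            i += 2
        else:
            oparg = None
        result.append((opcode, oparg))
    return result


def decode_wordcode(code):
    """Wordcode: every instruction is exactly 2 bytes (opcode, oparg),
    so no test is needed to find the next instruction."""
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]


if __name__ == "__main__":
    # 0x7c, 0x0c and 0x7d are LOAD_FAST, UNARY_NOT and STORE_FAST in the
    # hex dump posted elsewhere in this thread.
    print(decode_bytecode(bytes([0x7C, 0x00, 0x00, 0x0C, 0x7D, 0x00, 0x00])))
    print(decode_wordcode(bytes([0x7C, 0x00, 0x0C, 0x00, 0x7D, 0x00])))
```

In the old format the decoder cannot know where the next instruction starts until it has tested the opcode; with wordcode the stride is always two bytes.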

>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions. Later, we can discuss whether it would be possible to ensure
> that the bytecode is always aligned to 16 bits in memory, to fetch the
> two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and manually align
> the memory block if needed. Or maybe ceval.c should copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>    http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some architectures (MIPS and
> SPARC?).
>
> Victor
>

It should not be a problem, since every PyObject is allocated with PyAlloc
(though I don't remember whether that's the correct name), which AFAIK
guarantees a base alignment of 8 bytes.

So it's safe to use an unsigned int to hold or reference one word at a
time.

The only problem with this approach is processor endianness,
but it can be solved with proper macros (as I did in WPython).
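
A hedged sketch of what such endianness handling could look like, written in Python rather than as C macros. The helper name fetch_unit and its byteorder parameter are made up for illustration; WPython's actual macros are not shown in this thread.

```python
import struct


def fetch_unit(code, index, byteorder):
    """Read the 16-bit unit at position `index` with a single 16-bit load
    and split it into (opcode, oparg).

    The opcode is always the first byte in memory, so which half of the
    loaded word holds it depends on the byte order of the load.
    """
    fmt = "<H" if byteorder == "little" else ">H"
    (word,) = struct.unpack_from(fmt, code, 2 * index)
    if byteorder == "little":
        # First byte in memory ends up in the low 8 bits.
        return word & 0xFF, word >> 8
    # First byte in memory ends up in the high 8 bits.
    return word >> 8, word & 0xFF


if __name__ == "__main__":
    code = bytes([0x7C, 0x03, 0x0C, 0x00])
    for order in ("little", "big"):
        print(order, fetch_unit(code, 0, order), fetch_unit(code, 1, order))
```

Both byte orders yield the same (opcode, oparg) pairs; only the mask and shift differ, which is the part a pair of C macros would hide.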

Regards,
Cesare


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-06 Thread Victor Stinner
Oh nice. Did you see my recent "bytecode" project?
http://bytecode.readthedocs.io/

Victor
On 5 May 2016, 8:30 PM,  wrote:

> Here is something I wrote because I was also unsatisfied with byteplay's
> API: https://github.com/zachariahreed/byteasm. Maybe it's useful in a
> discussion of "minimum viable" api for bytecode manipulation.
>
>
>


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-05-05 Thread zreed
Here is something I wrote because I was also unsatisfied with byteplay's
API: https://github.com/zachariahreed/byteasm. Maybe it's useful in a
discussion of "minimum viable" api for bytecode manipulation.





Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-25 Thread Raymond Hettinger

> On Apr 24, 2016, at 2:31 PM, Victor Stinner  wrote:
> 
> 2016-04-24 23:16 GMT+02:00 Raymond Hettinger :
>>> On Apr 24, 2016, at 1:16 PM, Victor Stinner  
>>> wrote:
>>> I proposed to not try to optimize ceval.c to fetch (opcode, oparg) in a
>>> single 16-bit operation. It should be easy to implement it later, but
>>> I prefer to focus on changing the format of the bytecode.
>> 
>> Improving instruction decoding was the whole point and it was what 
>> kicked-off the work on the patch.  It is also where most of the performance 
>> improvement comes from and isn't the difficult part of the patch. The 
>> persnickety parts of the patch lay elsewhere, so there is really nothing to 
>> be gained gutting out our actual objective.
>> 
>> The OPs original patch had already gotten this part done and it ran fine for 
>> me.
> 
> Oh wait, my phrasing was unclear. I do want to optimize the (opcode,
> oparg) fetch. I just suggested splitting the patch into two parts and
> carefully reviewing the first part first.

Unless it is presenting a tough review challenge, we should do whatever we can 
to make it easier on the OP who seems to be working with very limited 
computational resources (I had to run the benchmarks for him because his setup 
lacked the requisite resources).  He's already put a lot of work into the patch,
which was in pretty good shape when it arrived.

The opcode/oparg fetch logic is mostly already isolated to the part of the 
patch that touches ceval.c.  I found that part to be relatively clean and 
clear.  The part that took the most time to go through was for peephole.c.

How about we let Yury and Serhiy take a pass at it as is?  And if they would
benefit from splitting the patch into parts, then perhaps one of us with better
tooling can pitch in to help the OP.


Raymond






Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-24 Thread Victor Stinner
2016-04-24 23:16 GMT+02:00 Raymond Hettinger :
>> On Apr 24, 2016, at 1:16 PM, Victor Stinner  wrote:
>> I proposed to not try to optimize ceval.c to fetch (opcode, oparg) in a
>> single 16-bit operation. It should be easy to implement it later, but
>> I prefer to focus on changing the format of the bytecode.
>
> Improving instruction decoding was the whole point and it was what kicked-off 
> the work on the patch.  It is also where most of the performance improvement 
> comes from and isn't the difficult part of the patch. The persnickety parts 
> of the patch lay elsewhere, so there is really nothing to be gained gutting 
> out our actual objective.
>
> The OPs original patch had already gotten this part done and it ran fine for 
> me.

Oh wait, my phrasing was unclear. I do want to optimize the (opcode,
oparg) fetch. I just suggested splitting the patch into two parts and
carefully reviewing the first part first.

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-24 Thread Raymond Hettinger

> On Apr 24, 2016, at 1:16 PM, Victor Stinner  wrote:
> 
> I proposed to not try to optimize ceval.c to fetch (opcode, oparg) in a
> single 16-bit operation. It should be easy to implement it later, but
> I prefer to focus on changing the format of the bytecode.

Improving instruction decoding was the whole point, and it was what kicked off
the work on the patch.  It is also where most of the performance improvement
comes from, and it isn't the difficult part of the patch. The persnickety parts
of the patch lie elsewhere, so there is really nothing to be gained by gutting
our actual objective.

The OP's original patch had already gotten this part done and it ran fine for me.


Raymond





Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-24 Thread Victor Stinner
Hi Raymond,

2016-04-24 21:45 GMT+02:00 Raymond Hettinger :
> I think the word code patch should go in sooner rather than later.  Several 
> of us have been through the patch and it is in pretty good shape (some parts 
> still need work though).  The earlier this goes in, the more time we'll have 
> to shake out any unexpected secondary effects.

Yury Selivanov and Serhiy Storchaka told me that they will review
the patch shortly. I will give them one more week and then push the
patch.

I agree that the patch is in good shape. I reviewed the first versions
of the change. I pushed some minor and obvious changes. I also asked
to revert unrelated changes.

I proposed not to try to optimize ceval.c to fetch (opcode, oparg) in a
single 16-bit operation. It should be easy to implement later, but
I prefer to focus on changing the format of the bytecode.

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-24 Thread Raymond Hettinger

> On Apr 23, 2016, at 8:59 AM, Serhiy Storchaka  wrote:
> 
> I collected statistics for use opcodes with different arguments during 
> running CPython tests. Estimated size with using wordcode is 1.33 times less 
> than with using current bytecode.
> 
> [1] http://comments.gmane.org/gmane.comp.python.ideas/38293

I think the word code patch should go in sooner rather than later.  Several of 
us have been through the patch and it is in pretty good shape (some parts still 
need work though).  The earlier this goes in, the more time we'll have to shake 
out any unexpected secondary effects.

perfect-is-the-enemy-of-good-ly yours,


Raymond


P.S. The patch is smaller, more tractable, and in better shape than the C 
version of OrderedDict was when it went in.


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-23 Thread Serhiy Storchaka

On 13.04.16 19:33, Guido van Rossum wrote:

Nice work. I think that for CPython, speed is much more important than
memory use for the code. Disk space is practically free for anything
smaller than a video. :-)


I collected statistics on the use of opcodes with different arguments while
running the CPython tests. The estimated size with wordcode is 1.33 times
smaller than with the current bytecode.


[1] http://comments.gmane.org/gmane.comp.python.ideas/38293



Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-17 Thread Eric Fahlgren
Just on the off chance that it’s related, could it have something to do with 
the bug in findlabels?

 

http://bugs.python.org/issue26448

 

(I have high confidence that my patch fixes the problem, just haven’t gotten 
around to completing the tests.)

 

From: Demur Rumed [mailto:gunkm...@gmail.com] 
Sent: Saturday, April 16, 2016 17:05
To: python-dev@python.org
Subject: Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

 

 The outstanding bug with this patch right now is a regression in line numbers 
causing the test for http://bugs.python.org/issue9936 to fail. I've tried to 
debug it without success



Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-16 Thread Demur Rumed
The outstanding bug with this patch right now is a regression in line
numbers causing the test for http://bugs.python.org/issue9936 to fail. I've
tried to debug it without success.


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-14 Thread Victor Stinner
On Thursday, 14 April 2016, Nick Coghlan  wrote:
>
> > IMHO it's not a big deal to update these projects for the future
> > Python 3.6. I can even help them to support the new bytecode format.
>
> We've also had previous discussions on adding a "minimum viable
> bytecode editing" API to the standard library, and updating these
> third party modules to support wordcode instead of bytecode could
> provide a good use-case-driven opportunity for defining that (i.e. it
> wouldn't be about providing an end user facing API directly, but
> rather about letting CPython take care of the bookkeeping details for
> things like lnotab and sorting out jump targets).

Yeah, I know this discussion well, since it started with my PEP 511. I
wrote the bytecode project as a tool for the discussion, to try to
understand the use case better. The main task was to design the API.

I first looked at the byteplay and codetransformer projects, but I found
some issues in their design. IMHO their API is not the best for modifying
bytecode.

My goal is to support Bytecode.from_code(code).to_code()==code: store
enough information to be able to emit exactly the same bytecode again
(line numbers, exact argument value, etc.).

I started writing a long email, but decided instead to document the
differences in the bytecode documentation:
https://bytecode.readthedocs.org/en/latest/byteplay_codetransformer.html

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Nick Coghlan
On 14 April 2016 at 08:26, Victor Stinner  wrote:
> 2016-04-14 0:11 GMT+02:00 Ryan Gonzalez :
>> So code that depends on iterating through bytecode via HAS_ARG is going to
>> break...
>
> Sure. This change is backward incompatible for applications parsing
> bytecode in C or Python. That's why the patch also has to update the
> dis module.
>
> I don't see how you could keep backward compatibility, since the
> argument size changed from 2 bytes to 1 byte. You must update your
> code (written in C or Python or whatever).
>
> Fortunately, the dis module was enhanced in Python 3.4: get_instructions() now
> gives nice Instruction objects rather than only plain text output.
>
> FYI I wrote my own library to encode and decode bytecode. It provides
> abstract bytecode objects to easily modify bytecode:
> https://bytecode.readthedocs.org/
>
> I suggest using such a library (or simply the dis module for simple
> needs) if you have to handle bytecode, rather than writing your own
> code.
>
> I know a few other projects which handle bytecode directly:
>
> * https://pypi.python.org/pypi/codetransformer
> * https://github.com/serprex/byteplay
> * https://pypi.python.org/pypi/coverage
>
> IMHO it's not a big deal to update these projects for the upcoming
> Python 3.6. I can even help them to support the new bytecode format.

+1

We've also had previous discussions on adding a "minimum viable
bytecode editing" API to the standard library, and updating these
third party modules to support wordcode instead of bytecode could
provide a good use-case-driven opportunity for defining that (i.e. it
wouldn't be about providing an end user facing API directly, but
rather about letting CPython take care of the bookkeeping details for
things like lnotab and sorting out jump targets).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Yury Selivanov



On 2016-04-13 12:24 PM, Victor Stinner wrote:

Can someone please review the change?


+1 for the change.  I can take a look at the patch in a few days.

Yury


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Victor Stinner
2016-04-14 0:11 GMT+02:00 Ryan Gonzalez :
> So code that depends on iterating through bytecode via HAS_ARG is going to
> break...

Sure. This change is backward incompatible for applications parsing
bytecode in C or Python. That's why the patch also has to update the
dis module.

I don't see how you could keep backward compatibility, since the
argument size changed from 2 bytes to 1 byte. You must update your
code (written in C or Python or whatever).

Fortunately, the dis module was enhanced in Python 3.4: get_instructions() now
gives nice Instruction objects rather than only plain text output.

FYI I wrote my own library to encode and decode bytecode. It provides
abstract bytecode objects to easily modify bytecode:
https://bytecode.readthedocs.org/

I suggest using such a library (or simply the dis module for simple
needs) if you have to handle bytecode, rather than writing your own
code.
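
A minimal sketch of the dis-based approach, assuming Python 3.4 or later. The sample function and the opnames helper are hypothetical; dis.get_instructions() and the Instruction fields used here (offset, opname, arg, argrepr) come from the standard library.

```python
import dis


def sample(x):
    # Hypothetical function whose bytecode we want to inspect.
    return x + 1


def opnames(func):
    """Return the opcode names of func, letting dis do the decoding
    instead of walking co_code by hand with HAS_ARG-style checks."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    # The exact instructions printed depend on the Python version.
    for instr in dis.get_instructions(sample):
        print(instr.offset, instr.opname, instr.arg, instr.argrepr)
```

Code written this way does not need to know the instruction width, so it should keep working across a bytecode format change.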

I know a few other projects which handle bytecode directly:

* https://pypi.python.org/pypi/codetransformer
* https://github.com/serprex/byteplay
* https://pypi.python.org/pypi/coverage

IMHO it's not a big deal to update these projects for the upcoming
Python 3.6. I can even help them to support the new bytecode format.

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Ryan Gonzalez
So code that depends on iterating through bytecode via HAS_ARG is going to
break...

Darn it. :/

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something’s wrong.
http://kirbyfan64.github.io/
On Apr 13, 2016 4:44 PM, "Victor Stinner"  wrote:

> On Wednesday, 13 April 2016, Ryan Gonzalez  wrote:
>
>> What is the value of HAS_ARG going to be now?
>>
>
> I asked Demur to keep HAS_ARG(). Not really for backward compatibility,
> but for the dis module: to keep a nice assembler. There are also debug
> traces in ceval.c which use it.
>
> For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0
> rather than reading the bytecode) for operators with no argument. Or maybe
> it's completely useless :-)
>
> Victor
>


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Victor Stinner
On Wednesday, 13 April 2016, Ryan Gonzalez  wrote:

> What is the value of HAS_ARG going to be now?
>

I asked Demur to keep HAS_ARG(). Not really for backward compatibility, but
for the dis module: to keep a nice assembler. There are also debug traces
in ceval.c which use it.

For ceval.c, we might use HAS_ARG() to micro-optimize oparg=0 (hardcode 0
rather than reading the bytecode) for operators with no argument. Or maybe
it's completely useless :-)

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Eric Fahlgren
EXTENDED_ARG is included in the multibyte ops; I treat it just like any
other operator.  Here's a snippet of my hacked dis.dis output, which made
it clear to me that I could just count them as an "operator with word
operand."

Line 3000: x = x if x or not x and x is None else x
0001dc83 7c 00 00 LOAD_FAST             x
0001dc86 91 01 00 EXTENDED_ARG          1
0001dc89 70 9f dc JUMP_IF_TRUE_OR_POP   L1dc9f
0001dc8c 7c 00 00 LOAD_FAST             x
0001dc8f 0c       UNARY_NOT
0001dc90 91 01 00 EXTENDED_ARG          1
0001dc93 6f 9f dc JUMP_IF_FALSE_OR_POP  L1dc9f
0001dc96 7c 00 00 LOAD_FAST             x
0001dc99 74 01 00 LOAD_GLOBAL           None
0001dc9c 6b 08 00 COMPARE_OP            'is'
  L1dc9f:
0001dc9f 91 01 00 EXTENDED_ARG          1
0001dca2 72 ab dc POP_JUMP_IF_FALSE     L1dcab
0001dca5 7c 00 00 LOAD_FAST             x
0001dca8 6e 03 00 JUMP_FORWARD          L1dcae (+3)
  L1dcab:
0001dcab 7c 00 00 LOAD_FAST             x
  L1dcae:
0001dcae 7d 00 00 STORE_FAST            x


On Wed, Apr 13, 2016 at 2:23 PM, Victor Stinner 
wrote:

> 2016-04-13 23:02 GMT+02:00 Eric Fahlgren :
> > Percentage of 1-byte args = 96.80%
>
> Yeah, I expected such a high ratio. Good news that you confirm it.
>
>
> > Non-argument ops =  53,719
> > One-byte args    = 368,787
> > Multi-byte args  =  12,191
>
> Again, only very few arguments take multiple bytes. Good, the
> bytecode will be smaller.
>
> IMHO it's more a nice side effect than a real goal. The runtime
> performance matters more than the size of the bytecode; it's not like
> a bytecode takes 4 MB. It's probably closer to 1 KB and so can probably
> benefit from the fastest CPU caches.
>
>
> > Just for the record, here's my arithmetic:
> > byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
> > wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs
>
> If multiByteArgs means any size > 1 byte, the wordCodeSize formula is
> wrong:
>
> - no parameter: 2 bytes
> - 8-bit parameter: 2 bytes
> - 16-bit parameter: 4 bytes
> - 24-bit parameter: 6 bytes
> - 32-bit parameter: 8 bytes
>
> But you wrote that you didn't see EXTENDED_ARG, so I guess that
> multibyte means 16-bit in your case, and so your formula is correct.
>
> I don't expect 32-bit parameters in the wild, only 24-bit
> parameters for functions with annotations.
>
>
> > (It is interesting to note that I have never encountered an EXTENDED_ARG
> operator in the wild, only in my own synthetic examples.)
>
> As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with
> annotations.
>
> Victor
>


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Ryan Gonzalez
What is the value of HAS_ARG going to be now?

--
Ryan
[ERROR]: Your autotools build scripts are 200 lines longer than your
program. Something’s wrong.
http://kirbyfan64.github.io/
On Apr 13, 2016 11:26 AM, "Victor Stinner"  wrote:

> Hi,
>
> In the middle of recent discussions about Python performance, changing
> the Python bytecode was discussed. Serhiy proposed to reuse the
> MicroPython short bytecode to reduce the disk space and the
> memory footprint.
>
> Demur Rumed proposes a different change, a regular bytecode
> using 16-bit units: an instruction always has one 8-bit argument, which
> is zero if the instruction doesn't have an argument:
>
>    http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>    http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -    if (HAS_ARG(opcode))
> -        oparg = NEXTARG();
> +    oparg = NEXTARG();
>
> This code is in the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce CPU branch
> mispredictions.
>
> I reviewed the first versions of the change, and IMHO it's almost ready to
> be merged. But I would prefer to have a review from at least a second
> core reviewer.
>
> Can someone please review the change?
>
> --
>
> The side effect of wordcode is that arguments in 0..255 now use 2
> bytes per instruction instead of 3, so it also reduces the size of
> bytecode for the most common case.
>
> A larger, 16-bit argument (0..65,535) now uses 4 bytes instead
> of 3. Arguments are supported up to 32 bits: 24-bit uses 3 units (6
> bytes), 32-bit uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit
> argument for keyword defaults and a 24-bit argument for annotations.
> Other common instructions known to use large arguments are jumps in
> bytecode longer than 256 bytes.
>
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions. Later, we can discuss whether it would be possible to ensure
> that the bytecode is always aligned to 16 bits in memory, to fetch the
> two bytes using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and manually align
> the memory block if needed. Or maybe ceval.c should copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>    http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but it can raise a SIGBUS on some architectures (MIPS and
> SPARC?).
>
> Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Victor Stinner
2016-04-13 23:02 GMT+02:00 Eric Fahlgren :
> Percentage of 1-byte args = 96.80%

Yeah, I expected such a high ratio. Good news that you confirm it.


> Non-argument ops =  53,719
> One-byte args    = 368,787
> Multi-byte args  =  12,191

Again, only very few arguments take multiple bytes. Good, the
bytecode will be smaller.

IMHO it's more a nice side effect than a real goal. The runtime
performance matters more than the size of the bytecode; it's not like
a bytecode takes 4 MB. It's probably closer to 1 KB and so can probably
benefit from the fastest CPU caches.


> Just for the record, here's my arithmetic:
> byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
> wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs

If multiByteArgs means any size > 1 byte, the wordCodeSize formula is wrong:

- no parameter: 2 bytes
- 8-bit parameter: 2 bytes
- 16-bit parameter: 4 bytes
- 24-bit parameter: 6 bytes
- 32-bit parameter: 8 bytes

But you wrote that you didn't see EXTENDED_ARG, so I guess that
multibyte means 16-bit in your case, and so your formula is correct.

I don't expect 32-bit parameters in the wild, only 24-bit
parameters for functions with annotations.
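
The size list above can be written as a small function. This is only a sketch: wordcode_size is a hypothetical helper, and it assumes that each extra 8 bits of argument costs one more 16-bit unit (an EXTENDED_ARG prefix), as the list describes.

```python
def wordcode_size(arg):
    """Size in bytes of one wordcode instruction whose argument is `arg`.

    An instruction with no argument is encoded with arg = 0, so it also
    takes a single 16-bit unit.
    """
    if not 0 <= arg < 2 ** 32:
        raise ValueError("argument must fit in 32 bits")
    units = 1                 # the instruction itself: opcode + low 8 bits
    while arg > 0xFF:         # every further 8 bits needs one prefix unit
        arg >>= 8
        units += 1
    return 2 * units


if __name__ == "__main__":
    for value in (0, 255, 256, 65535, 65536, 2 ** 24, 2 ** 32 - 1):
        print(value, wordcode_size(value))
```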


> (It is interesting to note that I have never encountered an EXTENDED_ARG 
> operator in the wild, only in my own synthetic examples.)

As I wrote, EXTENDED_ARG can be seen when MAKE_FUNCTION is used with
annotations.

Victor


Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Eric Fahlgren
On Wednesday, April 13, 2016 09:25, Victor Stinner wrote:
> The side effect of wordcode is that arguments in 0..255 now use 2 bytes per
> instruction instead of 3, so it also reduces the size of bytecode for the most
> common case.
> 
> A larger, 16-bit argument (0..65,535) now uses 4 bytes instead of 3.
> Arguments are supported up to 32 bits: 24-bit uses 3 units (6 bytes), 32-bit
> uses 4 units (8 bytes). MAKE_FUNCTION uses a 16-bit argument for keyword
> defaults and a 24-bit argument for annotations.
> Other common instructions known to use large arguments are jumps in bytecode
> longer than 256 bytes.

A couple of months ago, during an earlier discussion of wordcode, I got curious
enough to instrument dis.dis so that I could calculate the actual size changes
expected in practice.  I ran it on a large chunk of our product code; here are
the results (looks best with a fixed font).  I suspect the fairly significant
reduction in footprint will also give better cache hit characteristics, so we
might see some "magic" speedups from that, too.

Code-generating source lines =    70,792
Total bytes                  = 1,196,653
Argument-bearing operators   =   380,978
Operands over 1 byte long    =    12,191
Extended arguments           =         0
Percentage of 1-byte args    =    96.80%

Total operators              =   434,697
Non-argument ops             =    53,719
One-byte args                =   368,787
Multi-byte args              =    12,191
Byte code size               = 1,196,653
Word code size               =   893,776
Word:byte size               =    74.69%

Just for the record, here's my arithmetic:
byteCodeSize = 1*nonArgumentOps + 3*oneByteArgs + 3*multiByteArgs
wordCodeSize = 2*nonArgumentOps + 2*oneByteArgs + 4*multiByteArgs

(It is interesting to note that I have never encountered an EXTENDED_ARG 
operator in the wild, only in my own synthetic examples.)
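
The arithmetic above can be checked with a few lines of Python. The function names are hypothetical and simply mirror the two formulas; the counts are the ones reported in the table, and the 3-byte and 4-byte costs for multi-byte arguments assume 16-bit arguments with no EXTENDED_ARG.

```python
# Counts reported in the table above.
NON_ARGUMENT_OPS = 53719
ONE_BYTE_ARGS = 368787
MULTI_BYTE_ARGS = 12191


def byte_code_size(no_arg, one_byte, multi_byte):
    # Old format: 1 byte without an argument, 3 bytes with one.
    return 1 * no_arg + 3 * one_byte + 3 * multi_byte


def word_code_size(no_arg, one_byte, multi_byte):
    # Wordcode: 2 bytes per unit; a 16-bit argument needs two units.
    return 2 * no_arg + 2 * one_byte + 4 * multi_byte


if __name__ == "__main__":
    old = byte_code_size(NON_ARGUMENT_OPS, ONE_BYTE_ARGS, MULTI_BYTE_ARGS)
    new = word_code_size(NON_ARGUMENT_OPS, ONE_BYTE_ARGS, MULTI_BYTE_ARGS)
    print("Byte code size =", old)                        # 1196653
    print("Word code size =", new)                        # 893776
    print("Word:byte size = {:.2%}".format(new / old))    # 74.69%
```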



Re: [Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Guido van Rossum
Nice work. I think that for CPython, speed is much more important than
memory use for the code. Disk space is practically free for anything
smaller than a video. :-)

On Wed, Apr 13, 2016 at 9:24 AM, Victor Stinner
 wrote:
> Hi,
>
> In the middle of recent discussions about Python performance, it was
> discussed to change the Python bytecode. Serhiy proposed to reuse
> MicroPython short bytecode to reduce the disk space and reduce the
> memory footprint.
>
> Demur Rumed proposes a different change to use a regular bytecode
> using 16-bit units: an instruction has always one 8-bit argument, it's
> zero if the instruction doesn't have an argument:
>
>http://bugs.python.org/issue26647
>
> According to benchmarks, it looks faster:
>
>   http://bugs.python.org/issue26647#msg263339
>
> IMHO it's a nice enhancement: it makes the code simpler. The most
> interesting change is made in Python/ceval.c:
>
> -if (HAS_ARG(opcode))
> -oparg = NEXTARG();
> +oparg = NEXTARG();
>
> This code is the very hot loop evaluating Python bytecode. I expect
> that removing a conditional branch here can reduce the CPU branch
> misprediction.
>
> I reviewed first versions of the change, and IMHO it's almost ready to
> be merged. But I would prefer to have a review from a least a second
> core reviewer.
>
> Can someone please review the change?
>
> --
>
> The side effect of wordcode is that arguments in 0..255 now uses 2
> bytes per instruction instead of 3, so it also reduce the size of
> bytecode for the most common case.
>
> A larger, 16-bit argument (0..65,535) now uses 4 bytes instead of 3.
> Arguments are supported up to 32 bits: a 24-bit argument uses 3 units
> (6 bytes), a 32-bit argument uses 4 units (8 bytes). MAKE_FUNCTION
> uses a 16-bit argument for keyword defaults and a 24-bit argument for
> annotations. Other common instructions known to use large arguments
> are jumps in bytecode longer than 256 bytes.
>
> --
>
> Right now, ceval.c still fetches opcode and then oparg with two 8-bit
> instructions. Later, we can discuss whether it would be possible to
> ensure that the bytecode is always 16-bit aligned in memory, so that
> the two bytes can be fetched using a uint16_t* pointer.
>
> Maybe we can overallocate 1 byte in codeobject.c and manually align
> the memory block if needed. Or maybe ceval.c should copy the code if
> it's not aligned?
>
> Raymond Hettinger proposes something like that, but it looks like
> there are concerns about non-aligned memory accesses:
>
>   http://bugs.python.org/issue25823
>
> The cost of non-aligned memory accesses depends on the CPU
> architecture, but they can raise a SIGBUS on some architectures (MIPS
> and SPARC?).
>
> Victor



-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] Wordcode: new regular bytecode using 16-bit units

2016-04-13 Thread Victor Stinner
Hi,

In the middle of recent discussions about Python performance, changing
the Python bytecode came up. Serhiy proposed reusing MicroPython's
short bytecode to reduce disk space and the memory footprint.

Demur Rumed proposes a different change: a regular bytecode using
16-bit units. An instruction always has one 8-bit argument, which is
zero if the instruction doesn't take an argument:

   http://bugs.python.org/issue26647

According to benchmarks, it looks faster:

  http://bugs.python.org/issue26647#msg263339

IMHO it's a nice enhancement: it makes the code simpler. The most
interesting change is made in Python/ceval.c:

-    if (HAS_ARG(opcode))
-        oparg = NEXTARG();
+    oparg = NEXTARG();

This code is in the very hot loop that evaluates Python bytecode. I
expect that removing a conditional branch here can reduce CPU branch
mispredictions.
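
To make the difference concrete, here is a minimal Python sketch of the two
decoding loops. It is only an illustration, not the code from the patch: the
function names are hypothetical, EXTENDED_ARG handling is left out, and the
opcode numbers used in the example (124 and 83) are illustrative values.

```python
HAVE_ARGUMENT = 90  # assumption: opcodes >= this value carry an argument


def decode_old(code):
    """Old format: 1 byte without argument, 3 bytes with a 16-bit argument."""
    out = []
    i = 0
    while i < len(code):
        opcode = code[i]
        i += 1
        oparg = None
        if opcode >= HAVE_ARGUMENT:  # the branch that HAS_ARG() stands for
            oparg = code[i] | (code[i + 1] << 8)  # little-endian 16-bit value
            i += 2
        out.append((opcode, oparg))
    return out


def decode_wordcode(code):
    """Wordcode: every instruction is 2 bytes, argument is 0 when unused."""
    return [(code[i], code[i + 1]) for i in range(0, len(code), 2)]


old = bytes([124, 0, 0, 83])   # one instruction with argument, one without
new = bytes([124, 0, 83, 0])   # the same two instructions as 16-bit units
print(decode_old(old))
print(decode_wordcode(new))
```

The wordcode decoder has no data-dependent branch: the position of the next
instruction never depends on the opcode just read.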

I reviewed the first versions of the change, and IMHO it's almost ready
to be merged. But I would prefer to have a review from at least a
second core reviewer.

Can someone please review the change?

--

A side effect of wordcode is that an argument in 0..255 now uses 2
bytes per instruction instead of 3, so it also reduces the size of the
bytecode in the most common case.

A larger, 16-bit argument (0..65,535) now uses 4 bytes instead of 3.
Arguments are supported up to 32 bits: a 24-bit argument uses 3 units
(6 bytes), a 32-bit argument uses 4 units (8 bytes). MAKE_FUNCTION
uses a 16-bit argument for keyword defaults and a 24-bit argument for
annotations. Other common instructions known to use large arguments
are jumps in bytecode longer than 256 bytes.
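
The size arithmetic can be checked with a small Python sketch. The function
names are hypothetical, and the old-format figures assume that an argument
above 16 bits needs one 3-byte EXTENDED_ARG prefix in front of the 3-byte
instruction.

```python
def old_size(arg):
    """Bytes for one instruction in the old format (arg=None: no argument)."""
    if arg is None:
        return 1
    if arg <= 0xFFFF:
        return 3
    return 6  # assumed: EXTENDED_ARG (3 bytes) + the instruction (3 bytes)


def wordcode_size(arg):
    """Bytes in wordcode: one 16-bit unit per 8 bits of argument, minimum one."""
    arg = arg or 0  # an instruction without argument stores 0
    units = 1
    while arg > 0xFF:
        arg >>= 8
        units += 1
    return 2 * units


for arg in (None, 255, 256, 65535, 65536, 0xFFFFFF, 0x1000000):
    print(arg, old_size(arg), wordcode_size(arg))
```

Note that under this format an instruction without an argument grows from 1
byte to 2, since the zero argument is always stored.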

--

Right now, ceval.c still fetches opcode and then oparg with two 8-bit
instructions. Later, we can discuss whether it would be possible to
ensure that the bytecode is always 16-bit aligned in memory, so that
the two bytes can be fetched using a uint16_t* pointer.
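
As a rough model of the single 16-bit fetch, the following Python sketch reads
one unit with the struct module and splits it into opcode and oparg. The
function name is hypothetical, and the unit is read explicitly as
little-endian; a native uint16_t load on a big-endian machine would put the
opcode in the high byte instead, so the split would have to be swapped there.

```python
import struct


def fetch_unit(code, offset):
    """Read one 16-bit unit at offset and split it into (opcode, oparg)."""
    (unit,) = struct.unpack_from("<H", code, offset)  # one little-endian read
    opcode = unit & 0xFF   # first byte in memory is the low byte
    oparg = unit >> 8      # second byte in memory is the high byte
    return opcode, oparg


code = bytes([124, 5, 83, 0])
print(fetch_unit(code, 0))
print(fetch_unit(code, 2))
```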

Maybe we can overallocate 1 byte in codeobject.c and manually align
the memory block if needed. Or maybe ceval.c should copy the code if
it's not aligned?
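
The "overallocate and align" idea comes down to simple address arithmetic,
sketched here in Python with plain integers standing in for addresses. The
function names are hypothetical and this is not how codeobject.c does
anything today; it only shows why 1 extra byte is enough for 16-bit alignment.

```python
def align_up(address, alignment=2):
    """Round address up to the next multiple of alignment (a power of two)."""
    return (address + alignment - 1) & ~(alignment - 1)


def aligned_offset(base_address, alignment=2):
    """Bytes to skip at the start of an over-allocated block."""
    return align_up(base_address, alignment) - base_address


for base in (0x1000, 0x1001):
    print(hex(base), hex(align_up(base)), aligned_offset(base))
```

For 16-bit alignment the offset is always 0 or 1, so a block that is 1 byte
larger than the bytecode can always hold an aligned copy.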

Raymond Hettinger proposes something like that, but it looks like
there are concerns about non-aligned memory accesses:

   http://bugs.python.org/issue25823

The cost of non-aligned memory accesses depends on the CPU
architecture, but they can raise a SIGBUS on some architectures (MIPS
and SPARC?).

Victor