[issue36839] Support the buffer protocol in code objects

2020-07-01 Thread Stefan Krah


Stefan Krah  added the comment:

For clarification, the incident I referred to is entirely unrelated to *this* 
issue:  Of course Dino Viehland has been rational, friendly and competent.

I wanted to point out that people might have had a formative experience 
*elsewhere*.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36839] Support the buffer protocol in code objects

2020-07-01 Thread STINNER Victor


Change by STINNER Victor :


--
nosy: +vstinner

[issue36839] Support the buffer protocol in code objects

2020-07-01 Thread Inada Naoki


Inada Naoki  added the comment:

FWIW, I wrote up my idea on the python-ideas mailing list.

https://mail.python.org/archives/list/python-id...@python.org/message/VKBXY7KDI2OGESB7IPAMAIIHKR4TC7TQ/

--

[issue36839] Support the buffer protocol in code objects

2020-07-01 Thread Stefan Krah


Stefan Krah  added the comment:

Though code objects do not concern me directly, as the author of
memoryview I now concur with Inada-san that we should not support
hacks in the Python code base that benefit a single company.

--

[issue36839] Support the buffer protocol in code objects

2020-07-01 Thread Stefan Krah


Stefan Krah  added the comment:

After seeing people from a certain company defend a bizarre and 
broken stack with falsehoods, I apologize to Inada-san and now
assume that he had a similar experience.

--
nosy: +skrah

[issue36839] Support the buffer protocol in code objects

2019-06-05 Thread Brett Cannon


Brett Cannon  added the comment:

"I'm sorry, I thought "fantasy" was good metaphor."

No problem! Saying something is "fantasy" like "it's based on fantasy" -- in 
North America at least -- is like saying "you're fooling yourself" or "you're 
delusional".

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Inada Naoki


Inada Naoki  added the comment:

I'm sorry, I thought "fantasy" was good metaphor.
I just meant "the estimate seems too optimistic and rough; the discussion should
not be based on it".

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Brett Cannon


Brett Cannon  added the comment:

Let's please keep this respectful. Saying people are basing things "on fantasy" 
or that "people need to develop reading skills" is not helpful.

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Stefan Krah


Change by Stefan Krah :


--
nosy:  -skrah

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Stefan Krah


Stefan Krah  added the comment:

There's no point discussing unless people develop reading skills.

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Inada Naoki


Inada Naoki  added the comment:

> Stefan Krah  added the comment:
>
> 720MB <= "3-4 dozen" * 20 MB <= 960MB.  Per server.
>
> It has all been said. :-)

I don't understand which message you are replying to.
I'm not interested in that number.  Who asked about MB per server?

Absolute numbers are not important when optimizing.

"I can save 3 seconds" means nothing unless the total time is given:
3.5s -> 0.5s is great.  But 60 hours -> 59h59m57s is not.

He didn't write anything about how much memory each process uses.
Additionally, he measured serialized size, not memory usage.
That doesn't make any sense.

I don't even trust the 20MB/process saving.  It exists only in his mind.
This proposal is based on fantasy, not on a pragmatic
survey.

I think we should not spend more time discussing this until he proves the idea
by actually saving 20MB / process and provides a detailed report.

>
> I don't understand the objections about alignment. malloc() and obmalloc()
> are at least 8-byte aligned, mmap() with NULL as the first parameter
> is page-aligned.
>

I think Serhiy meant alignment in the pyc file.  Bytes objects in a pyc file
are not aligned, so it's a bad idea to mmap pyc files.

But Instagram may use its own serialization format instead of pyc files,
and it may align bytes objects, so it may not be a problem.
No one outside Instagram knows, because there is no
detailed information.

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Stefan Krah


Stefan Krah  added the comment:

720MB <= "3-4 dozen" * 20 MB <= 960MB.  Per server.

It has all been said. :-)


I don't understand the objections about alignment. malloc() and obmalloc()
are at least 8-byte aligned, mmap() with NULL as the first parameter
is page-aligned.
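A small sketch of the alignment question, under stated assumptions: it uses ctypes to read the address of a byte inside an anonymous mmap (which the operating system maps on a page boundary) and checks whether a given offset is suitable for 16-bit wordcode. The helper names `address_of` and `is_aligned` are hypothetical, not CPython APIs.

```python
import ctypes
import mmap


def address_of(buf, offset: int = 0) -> int:
    """Return the address of byte `offset` inside a writable buffer.

    ctypes' from_buffer() only accepts writable buffers; an anonymous
    mmap is writable by default.
    """
    return ctypes.addressof(ctypes.c_char.from_buffer(buf, offset))


def is_aligned(address: int, alignment: int) -> bool:
    """True if `address` is a multiple of `alignment` (a power of two)."""
    return address & (alignment - 1) == 0


# An anonymous mapping (fileno -1) starts on a page boundary.
region = mmap.mmap(-1, mmap.PAGESIZE)
base = address_of(region)
print("mapping is page-aligned:", is_aligned(base, mmap.PAGESIZE))

# Only the offset inside the mapping decides whether wordcode stored
# there starts on a 2-byte boundary.
for offset in (0, 1, 2, 3):
    print(offset, "2-byte aligned:", is_aligned(address_of(region, offset), 2))
```

In other words, mmap() itself is not the issue; what matters is the offset at which serialized bytecode lands inside the mapped file.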

--

[issue36839] Support the buffer protocol in code objects

2019-06-04 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

One more consideration. The bytecode is actually wordcode, a sequence of
16-bit words, and the start of the array must be properly aligned. There is no
problem if it is the data of a bytes object, but if you have a mapping of the
pyc file the alignment is not guaranteed. You would also need to change the
marshal protocol or save the bytecode in another file. This is a large change.
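To see where bytecode actually lands in a serialized file, the sketch below compiles some source, marshals it as importlib does for a pyc payload, and reports the offset of each code object's co_code bytes after adding the 16-byte pyc header used since Python 3.7 (PEP 552). It locates co_code with a plain substring search, which is an approximation: identical function bodies would report the same (first) match.

```python
import marshal
import types

PYC_HEADER_SIZE = 16  # magic number, flags, mtime/hash (Python 3.7+, PEP 552)


def iter_code_objects(code: types.CodeType):
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def co_code_file_offsets(source: str) -> dict:
    """Map each code object's name to the pyc file offset of its bytecode.

    The offset is found by searching the marshal payload for the raw
    co_code bytes (first match), so identical bodies share an offset.
    """
    top = compile(source, "<sketch>", "exec")
    payload = marshal.dumps(top)  # what importlib writes after the header
    return {
        code.co_name: PYC_HEADER_SIZE + payload.find(code.co_code)
        for code in iter_code_objects(top)
    }


SOURCE = (
    "def f(a):\n"
    "    return a + 1\n"
    "\n"
    "def g(b, c=2):\n"
    "    return b * c\n"
)

offsets = co_code_file_offsets(SOURCE)
for name, offset in offsets.items():
    print(f"{name}: offset {offset}, 2-byte aligned: {offset % 2 == 0}")
```

Whether a given offset comes out even depends on everything serialized before it; nothing in the format guarantees 2-byte alignment.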

So you need to make several large changes (changes in the import system, changes
in the pyc file) to use this feature. Supporting the buffer protocol in code
objects is a minor change compared to those. If you need to patch other parts of
Python anyway, it is not hard to also patch the code object in your custom build.
Other Python users will not benefit from this.

--

[issue36839] Support the buffer protocol in code objects

2019-06-03 Thread Inada Naoki


Inada Naoki  added the comment:

On Tue, Jun 4, 2019 at 8:45 AM Dino Viehland  wrote:
>
> The 20MB of savings is actually the amount of byte code that exists in the IG 
> code base.

Wait, do you expect to save 100% of the co_code objects?
Does co_code take 0 bytes?
You said later that "the overhead it just takes a byte code w/ 16 opcodes before
it breaks even".
So I believe you must understand there is overhead.

>  I was just measuring the web site code, and not the other various Python 
> code in the process (e.g. no std lib code, no 3rd party libraries, etc...).  
> The IG code base is pretty monolithic and starting up the site requires about 
> half of the code to get imported.  So I think the 20MB per process is a 
> pretty realistic number.

How much memory does your application take?  20MB of 200MB is attractive.
20MB of 20GB is not attractive.
Absolute numbers are not important here.  You should use the ratio.

>
> It's certainly true that the byte code isn't the #1 source of memory here 
> (the code objects themselves are pretty big), but in the serialized state it 
> ends up representing 25% of the serialized data.

"25% of the serialized data" is misleading information.
You are trying to reduce memory usage, not serialized data size.
You should use the ratio to total RAM usage, not to serialized data size.

> I can't make any promises about open sourcing the import system, but I can 
> certainly look into that as well.

Open sourcing is not necessary.  Try it yourself and share real numbers
instead of estimates.
How much RSS is used by Python, and how much of it can you actually reduce
with this system?
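A rough way to get the ratio asked for here, sketched under assumptions: it sums len(co_code) over code objects reachable from live function objects (code objects themselves are not GC-tracked, so they are found through functions) and compares that with peak RSS from the Unix-only resource module. Module-level code that has already run and been discarded is not counted, so the result underestimates; treat it as a sanity check, not a measurement methodology.

```python
import gc
import resource  # Unix only
import sys
import types


def iter_code_objects(code: types.CodeType):
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def live_co_code_bytes() -> int:
    """Sum len(co_code) over code objects reachable from live functions.

    Code objects are not GC-tracked, so they are reached through function
    objects, which are. Each code object is counted once.
    """
    seen = set()
    total = 0
    for obj in gc.get_objects():
        if isinstance(obj, types.FunctionType):
            for code in iter_code_objects(obj.__code__):
                if id(code) not in seen:
                    seen.add(id(code))
                    total += len(code.co_code)
    return total


def peak_rss_bytes() -> int:
    """Peak resident set size of this process, in bytes.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return peak if sys.platform == "darwin" else peak * 1024


code_bytes = live_co_code_bytes()
rss = peak_rss_bytes()
print(f"co_code: {code_bytes} bytes of {rss} peak RSS ({code_bytes / rss:.2%})")
```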

Regards,
-- 
Inada Naoki  

--

[issue36839] Support the buffer protocol in code objects

2019-06-03 Thread Dino Viehland


Dino Viehland  added the comment:

The 20MB of savings is actually the amount of byte code that exists in the IG 
code base.  I was just measuring the web site code, and not the other various 
Python code in the process (e.g. no std lib code, no 3rd party libraries, 
etc...).  The IG code base is pretty monolithic and starting up the site 
requires about half of the code to get imported.  So I think the 20MB per 
process is a pretty realistic number.

I've also created a C extension and the object implementing the buffer protocol 
looks like:

typedef struct {
    PyObject_HEAD
    const char* data;
    size_t size;
    Py_ssize_t hash;
    CIceBreaker *breaker;
    size_t exports;
    PyObject* code_obj; /* borrowed reference, the code object keeps us alive */
} CIceBreakerCode;

All of the modules are currently getting compiled into a single memory mapped 
file and then these objects get created which implement the buffer protocol for 
each function.  So the overhead it just takes a byte code w/ 16 opcodes before 
it breaks even, so it is significantly lighter weight than using a memoryview 
object.
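The break-even figure can be checked with a little arithmetic. The sketch below mirrors the CIceBreakerCode layout with ctypes (assuming PyObject_HEAD is a refcount plus a type pointer, as in a default CPython build, and treating the pointer fields as opaque) and finds the smallest bytes object whose sys.getsizeof() reaches the struct's size. The class and function names are illustrative only.

```python
import ctypes
import sys


class CIceBreakerCodeLayout(ctypes.Structure):
    """ctypes mirror of the CIceBreakerCode layout (illustration only)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # PyObject_HEAD, default build
        ("ob_type", ctypes.c_void_p),     # PyObject_HEAD
        ("data", ctypes.c_char_p),
        ("size", ctypes.c_size_t),
        ("hash", ctypes.c_ssize_t),
        ("breaker", ctypes.c_void_p),     # CIceBreaker *, treated as opaque
        ("exports", ctypes.c_size_t),
        ("code_obj", ctypes.c_void_p),    # borrowed PyObject *
    ]


def break_even_payload(struct_size: int) -> int:
    """Smallest payload length whose bytes object costs >= struct_size."""
    n = 0
    while sys.getsizeof(bytes(n)) < struct_size:
        n += 1
    return n


struct_size = ctypes.sizeof(CIceBreakerCodeLayout)
n = break_even_payload(struct_size)
print(f"struct: {struct_size} bytes; a bytes object costs as much at "
      f"{n} bytes of payload, about {(n + 1) // 2} two-byte instructions")
```

On a typical 64-bit build the mirrored struct is 64 bytes and sys.getsizeof(b"") is 33, so the crossover is a 31-byte payload, roughly 16 two-byte instructions. The estimate ignores allocator rounding and the mapped bytecode itself, which is shared.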

It's certainly true that the byte code isn't the #1 source of memory here (the 
code objects themselves are pretty big), but in the serialized state it ends up 
representing 25% of the serialized data.  I would expect when you add in ref 
counts and typing information it's not quite as good, but reducing the overhead 
of code by 20% is still a pretty nice win.

I can't make any promises about open sourcing the import system, but I can 
certainly look into that as well.

--

[issue36839] Support the buffer protocol in code objects

2019-06-02 Thread Stefan Krah


Stefan Krah  added the comment:

On Sun, Jun 02, 2019 at 02:38:21AM +, Inada Naoki wrote:
> What instance means?  code object? co_code?
> Anyway, Dino didn't propose such thing.  He only proposed accepting buffer 
> object for code constructor.
> He didn't describe how to use buffer object.  Python doesn't provide share 
> buffer object inter processes.

He did, at a high level, in the original mail:

"If code objects supported the buffer protocol it would be possible to load
 code from a memory mapped file which is shared across all of the processes."


It is not very detailed, but gives the rationale.  I assumed that the
new shared memory support would be used, but it would be nice to hear
more details.

--

[issue36839] Support the buffer protocol in code objects

2019-06-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

The managed buffer can be shared if it refers to the same bytes. But different 
code objects in the same process refer to different bytes, therefore they 
should have distinct managed buffers.
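Both points about managed buffers can be checked directly. This sketch follows the thread's own approach of treating gc.get_referents(view)[0] as a memoryview's internal managed buffer, which is a CPython implementation detail rather than a public API. Sharing happens when one view is created from another; a fresh memoryview() call gets its own managed buffer even for the same bytes.

```python
import gc


def managed_buffer(view: memoryview):
    """Return the internal managed buffer behind a memoryview (CPython detail)."""
    return gc.get_referents(view)[0]


data = b"12345"
m1 = memoryview(data)
m2 = memoryview(m1)    # a view created from another view
m3 = memoryview(data)  # a second, independent view of the same bytes

print(managed_buffer(m1) is managed_buffer(m2))  # True: shared
print(managed_buffer(m1) is managed_buffer(m3))  # False: a new managed buffer
```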

--

[issue36839] Support the buffer protocol in code objects

2019-06-01 Thread Inada Naoki


Inada Naoki  added the comment:

> And I understood that Dino proposed to share one code instance as a memory 
> mapped file for *all* processes.

What instance means?  code object? co_code?
Anyway, Dino didn't propose such thing.  He only proposed accepting buffer 
object for code constructor.
He didn't describe how to use buffer object.  Python doesn't provide share 
buffer object inter processes.

There isn't enough concrete information to estimate the benefits.
We're discussing imagination, not a concrete idea.

Let's stop discussing this and wait for Dino to provide a concrete example that
can be used to estimate the benefits.

--

[issue36839] Support the buffer protocol in code objects

2019-06-01 Thread Stefan Krah


Stefan Krah  added the comment:

The managed buffer can be shared:

>>> import gc
>>> b = b'12345'
>>> m1 = memoryview(b)
>>> m2 = memoryview(m1)
>>> gc.get_referents(m1)[0]

>>> gc.get_referents(m2)[0]



And I understood that Dino proposed to share one code instance as a memory 
mapped file for *all* processes.

--
nosy: +skrah

[issue36839] Support the buffer protocol in code objects

2019-06-01 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

The size of the code object is at least 144 bytes.

>>> import sys
>>> def f(): pass
... 
>>> sys.getsizeof(f.__code__)
144

If the function uses parameters, variables or constants (and you cannot do much
that is useful without them), their sizes should be added too. Together they
overwhelm the size of the raw bytecode.
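The same point can be extended into a rough per-part breakdown. The sketch below uses shallow sizes only (shared or interned objects such as small ints and names are counted anyway, so it overstates the memory unique to one code object) and compares sys.getsizeof() of a code object, its co_code, and the tuples it references. On Python 3.11 and later the bytecode is stored inline in the code object and co_code is materialised as a separate bytes object on access, so the numbers are not directly comparable across versions.

```python
import sys
import types


def shallow_footprint(code: types.CodeType) -> dict:
    """Rough sizes in bytes for one code object and what it references.

    Shallow only: shared or interned objects (small ints, names) are counted
    anyway, so this overstates the memory unique to this code object.
    """
    parts = {
        "code object": sys.getsizeof(code),
        "co_code": sys.getsizeof(code.co_code),
    }
    for attr in ("co_consts", "co_names", "co_varnames"):
        container = getattr(code, attr)
        parts[attr] = sys.getsizeof(container) + sum(
            sys.getsizeof(item) for item in container
        )
    return parts


def example(a, b=2):
    total = a * b
    return f"{total} items"


sizes = shallow_footprint(example.__code__)
for name, size in sizes.items():
    print(f"{name:12} {size}")
```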

--

[issue36839] Support the buffer protocol in code objects

2019-06-01 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> * They need to use lightweight object for buffer.  At least,
> memoryview object is large (192byte
>  on Python 3.7.3 amd64).

Actually it is larger, because you should add the size of internal objects. In 
3.8:

>>> import gc, sys
>>> sys.getsizeof(memoryview(b''))
184
>>> sys.getsizeof(gc.get_referents(memoryview(b''))[0])
128

312 bytes total, not counting padding. The average size of co_code in Lib/*.py 
is 85 bytes. Unless Instagram uses gigantic functions and methods (and no 
comprehensions or lambdas), the net benefit will be negative.
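The arithmetic can be re-run on any interpreter. The sketch below measures a memoryview plus its managed buffer (again reaching the internal object through gc.get_referents(), a CPython detail) and computes the average len(co_code) over every code object compiled from some source text. The sample source is arbitrary; feeding in the files under Lib/ would approximate the 85-byte figure, and results differ across versions because 3.11 added inline cache entries to the bytecode.

```python
import gc
import sys
import types


def iter_code_objects(code: types.CodeType):
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def memoryview_overhead() -> int:
    """Size of one memoryview plus its internal managed buffer."""
    view = memoryview(b"")
    return sys.getsizeof(view) + sys.getsizeof(gc.get_referents(view)[0])


def average_co_code_size(sources: dict) -> float:
    """Mean len(co_code) over all code objects compiled from `sources`.

    `sources` maps a filename to the source text of that file.
    """
    sizes = [
        len(code.co_code)
        for filename, text in sources.items()
        for code in iter_code_objects(compile(text, filename, "exec"))
    ]
    return sum(sizes) / len(sizes)


sample = {
    "<sample>": (
        "def add(a, b):\n"
        "    return a + b\n"
        "\n"
        "squares = [n * n for n in range(10)]\n"
    ),
}
print("memoryview + managed buffer:", memoryview_overhead(), "bytes")
print("average co_code:", average_co_code_size(sample), "bytes")
```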

--

[issue36839] Support the buffer protocol in code objects

2019-06-01 Thread Inada Naoki


Inada Naoki  added the comment:

On Sat, Jun 1, 2019 at 2:47 AM Brett Cannon  wrote:
>
> Brett Cannon  added the comment:
>
> RE: "I think it needs significant benefits for typical users, not only for 
> Instagram. If only Instagram get benefit from this, keep it as Instagram's 
> internal patch."
>
> But who's typical in this case? You? Me? We're talking code objects, 
> something that the typical Python user doesn't even know exists if you take 
> "typical" to mean "common" or "majority". I suspect if we asked Python 
> developers if they knew what lnotab was an abbreviation of they wouldn't 
> know, let alone how it worked.
>
> And just because Instagram did this work and is making it public doesn't mean 
> (a) others wouldn't use it or (b) others are already doing it privately. 
> There are plenty of other companies with massive server installs of Python 
> beyond Instagram.
>
> My point is we have to be very careful when we start to claim we know what is 
> typical or useful to the wider community considering how huge and diverse the 
> Python community is.
>

I'm sorry for the bad word choice.  I didn't mean "majority" or "common".
For example, I was +1 for gc.freeze(), even though the majority of users don't use it.

My point is that (a) it's very hard for others to utilize it, and (b) it's very
hard for others to confirm the benefit.

And I suspect that Dino's estimate is wrong, but no one can verify it
because he didn't show enough concrete information.

More specifically:

* They need to use a custom importer and a serializer other than marshal
  to import code objects.
* They need to use lightweight object for buffer.  At least,
memoryview object is large (192byte
  on Python 3.7.3 amd64).

Even if Instagram cannot open source their import system, they can show
more concrete data.  For example, their report [1] was very helpful when
considering gc.freeze().

[1] 
https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172

While I agree that mmapping pyc files is an interesting idea, I feel that
allowing a buffer object for co_code is the wrong way to achieve it (*).
But there isn't enough data to discuss the benefit.

The current status is that Dino claims this idea has a benefit at Instagram,
but no one can confirm it.
I think we shouldn't discuss this type of proposal without reliable or
reproducible data.

(*) One possible idea is that the code object holds just a pointer and length
    for co_code, plus a reference to the object that owns the data (the mmapped
    file).  The code object could load co_lnotab and the docstring lazily, too.
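The footnote's idea can be modelled in pure Python to show its shape. This is only a sketch under stated assumptions: `Span` and `LazyCodeRecord` are hypothetical names, the "mapped file" is a bytes object standing in for something like a read-only mmap, and real code objects cannot be built this way today. The record keeps an owner reference plus (offset, length) pairs, hands out zero-copy read-only views of the bytecode, and decodes the docstring only on first access.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class Span:
    """Hypothetical (offset, length) pair locating data inside the owner."""
    offset: int
    length: int


class LazyCodeRecord:
    """Hypothetical model of a code object that does not own its bytecode."""

    def __init__(self, owner, code: Span, doc: Span):
        self._owner = owner  # keeps the mapped data alive
        self._code = code
        self._doc = doc

    def _slice(self, span: Span) -> memoryview:
        return memoryview(self._owner)[span.offset:span.offset + span.length]

    @property
    def co_code(self) -> memoryview:
        # Zero-copy view into the owner; never a private bytes copy.
        return self._slice(self._code).toreadonly()

    @cached_property
    def docstring(self) -> str:
        # Decoded only on first access, then cached on the instance.
        return bytes(self._slice(self._doc)).decode("utf-8")


# Placeholder bytes stand in for a mapped file: 6 "bytecode" bytes + a docstring.
blob = b"\x01\x02\x03\x04\x05\x06" + "Return one.".encode("utf-8")
record = LazyCodeRecord(blob, code=Span(0, 6), doc=Span(6, 11))
print(bytes(record.co_code), record.co_code.readonly, record.docstring)
```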

--

[issue36839] Support the buffer protocol in code objects

2019-05-31 Thread Brett Cannon


Brett Cannon  added the comment:

RE: "I think it needs significant benefits for typical users, not only for 
Instagram. If only Instagram get benefit from this, keep it as Instagram's 
internal patch."

But who's typical in this case? You? Me? We're talking code objects, something 
that the typical Python user doesn't even know exists if you take "typical" to 
mean "common" or "majority". I suspect if we asked Python developers if they 
knew what lnotab was an abbreviation of they wouldn't know, let alone how it 
worked.

And just because Instagram did this work and is making it public doesn't mean 
(a) others wouldn't use it or (b) others are already doing it privately. There 
are plenty of other companies with massive server installs of Python beyond 
Instagram.

My point is we have to be very careful when we start to claim we know what is 
typical or useful to the wider community considering how huge and diverse the 
Python community is.

--

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Inada Naoki


Inada Naoki  added the comment:

> In the Instagram case there's about 20mb of byte code total and there are 3-4 
> dozen worker processes running per server.  The byte code also represents the 
> second largest section of memory as far as serialized code objects are 
> concerned, the only larger one is strings (which are about 25mb).  

How did you measure it?  What percentage of the process's RAM usage is it?
What does "as far as serialized code objects are concerned" mean?  Aren't you
concerned about RAM usage in the process?


> FWIW, I don't see the problem with supporting any read-only "buffer" object, 
> rather than just bytes objects, for the string of bytes in a code object.  
> That's all that Dino is proposing.

It means the bytecode can be changed at any time by anyone other than the code
object, so the code object is not immutable anymore.  "Read-only" only means
that the code object doesn't change co_code.  Anyone else can modify it without
breaking Python (and Python/C API) semantics.

For example, the per-opcode cache can't assume that opcodes are not changed.
I think there are other possible optimizations that rely on co_code being constant.

Another complication is GC.  We allow only bytes here (no subclasses), and that
allows the code object to be a non-GC object.

If we allow any object implementing the buffer protocol, we must make the code
object a GC object.  We can untrack the code object if co_code is bytes (as
tuples do), but every code object would need the 16-byte GC header (as tuples do).
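The GC-header cost can be observed from Python. sys.getsizeof() adds the garbage collector's per-object header to obj.__sizeof__() for GC-enabled types, so the difference printed below is that header for lists and tuples and 0 for bytes; per the comment above, the code-object row is expected to show 0 as well. A tuple can report "tracked now: False" once the collector untracks it, yet it still pays for the header. The numbers assume a default (non-free-threaded) CPython build, and `gc_header_overhead` is a hypothetical helper.

```python
import gc
import sys


def gc_header_overhead(obj) -> int:
    """Bytes sys.getsizeof() reports on top of obj.__sizeof__().

    For GC-enabled types CPython adds the collector's per-object header;
    for other types the difference is 0.
    """
    return sys.getsizeof(obj) - obj.__sizeof__()


def sample(x):
    return x + 1


for label, obj in [("bytes", b"abc"), ("code", sample.__code__),
                   ("tuple", (1, 2)), ("list", [1, 2])]:
    print(f"{label:6} tracked now: {gc.is_tracked(obj)!s:5} "
          f"GC header bytes: {gc_header_overhead(obj)}")
```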

Of course, we could break that rule and allow Python to create circular
references that are never collected.

But please don't think that changing from bytes to arbitrary objects is a simple
change.

I think it needs significant benefits for typical users, not only for Instagram.
If only Instagram get benefit from this, keep it as Instagram's internal patch.

--

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Dino Viehland


Dino Viehland  added the comment:

Well given that we're one day away from 3.8 Beta 1 I'm not going to rush this 
in :)  In particular the discussion has made me wonder if I can also do more 
with sharing strings using the legacy unicode strings (which I don't believe 
will require any runtime changes).  So I'll keep pushing on this and hopefully 
be able to demonstrate an even larger win from the concept.

--

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Brett Cannon


Brett Cannon  added the comment:

I agree with Eric. While I understand what Serhiy is saying about code objects
being more than a bytearray of bytecode, the buffer protocol is still a view on
an object, and that view happens to cover a subset, which I think is acceptable
since I can't think of what a buffer of a code object would expose other than the bytecode.

I also think read-only is a good enough guarantee. We're talking low-level 
stuff here and if anyone were to muck with this stuff they are already playing 
with fire.

--
nosy: +brett.cannon

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Eric Snow


Eric Snow  added the comment:

FWIW, I don't see the problem with supporting any read-only "buffer" object, 
rather than just bytes objects, for the string of bytes in a code object.  
That's all that Dino is proposing.  The change is not invasive and solves a 
real need.  The additional maintenance burden is negligible.  Furthermore, the 
accommodation makes sense conceptually.

--

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I concur with Inada. This would add complexity in the code object.

Note also that the code object is not just a sequence of bytes. It is a complex 
object which contains references to a dozen objects: bytes objects, strings,
integers, tuples, dicts. I suppose that the bytecode itself is only a tiny
part of the memory consumed by the code object in a large program.

--
nosy: +serhiy.storchaka

[issue36839] Support the buffer protocol in code objects

2019-05-30 Thread Dino Viehland


Dino Viehland  added the comment:

In the Instagram case there's about 20mb of byte code total and there are 3-4 
dozen worker processes running per server.  The byte code also represents the 
second largest section of memory as far as serialized code objects are 
concerned, the only larger one is strings (which are about 25mb).  
 
Anyway, that leads to around 1gb of memory savings per server, which is the 
reason why this was an interesting optimization to pursue when it's applied to 
many servers.

--

[issue36839] Support the buffer protocol in code objects

2019-05-28 Thread Inada Naoki


Inada Naoki  added the comment:

> Could someone create a buffer object which still allows the underlying memory 
> to be written?  Sure.  But I can use ctypes to modify byte code today as well 
> with something like "ctypes.cast(id(f.__code__.co_code) + 32, 
> ctypes.POINTER(ctypes.c_char)) [0] = 101" 

You are comparing apples and oranges.
Breaking memory inside an immutable object with ctypes is very different from
mutating memory that is mutable by design.

It introduces more weakness and complexity into the code object.

At the very least, you need to demonstrate the benefit.

When a module is imported, many objects are created.  Why would avoiding decref
only for co_code make much difference?

--

[issue36839] Support the buffer protocol in code objects

2019-05-28 Thread Dino Viehland


Dino Viehland  added the comment:

Sure, but immutable/const is almost always a language level guarantee.  The 
only case where that's not true is when you have OS/hardware level memory 
protection and that doesn't apply to any of Python's existing byte codes.

So from a Python perspective, code objects remain immutable - they can
only be created from objects which expose the read-only buffer protocol.  So, for
example, passing in a memoryview(b'abc') will work here while a
memoryview(bytearray(b'abc')) will fail.  And because, when asked for a non
read-write view, the buffer implementer needs to be consistent for all callers
(https://docs.python.org/3/c-api/buffer.html#c.PyBUF_WRITABLE), it seems that
this invariant should hold for all objects being passed in.
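A Python-level approximation of that acceptance rule is sketched below. The PR's actual C check is not shown in this thread, so `is_read_only_buffer` is a hypothetical helper that only mirrors the described behaviour: accept objects exporting a read-only buffer, reject writable buffers and non-buffer objects.

```python
def is_read_only_buffer(obj) -> bool:
    """Accept only objects that export a read-only buffer.

    memoryview() raises TypeError for objects without the buffer protocol.
    """
    try:
        with memoryview(obj) as view:
            return view.readonly
    except TypeError:
        return False


print(is_read_only_buffer(memoryview(b"abc")))             # True
print(is_read_only_buffer(memoryview(bytearray(b"abc"))))  # False
print(is_read_only_buffer("abc"))                          # False: no buffer
```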

Could someone create a buffer object which still allows the underlying memory 
to be written?  Sure.  But I can use ctypes to modify byte code today as well 
with something like "ctypes.cast(id(f.__code__.co_code) + 32, 
ctypes.POINTER(ctypes.c_char)) [0] = 101" 

So people will still be able to do nasty things, but there are certainly 
guards in place to strongly discourage them from doing so.

--

[issue36839] Support the buffer protocol in code objects

2019-05-28 Thread Inada Naoki


Inada Naoki  added the comment:

Read-only is slightly different from const / immutable.

Const / immutable means no one can modify the memory.

Read-only only means the memory must not be modified through
the buffer.  The underlying memory block can still be modified by
the owner who created the buffer.

For example, you can create a read-only buffer from a bytearray,
or even from a raw C char array.  That doesn't violate read-only semantics.
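The distinction can be shown in a few lines (memoryview.toreadonly() needs Python 3.8+; the byte values are placeholders). The view refuses writes, yet the bytearray that owns the memory can still change it, and the "read-only" view sees the new value.

```python
owner = bytearray(b"\x01\x02\x03\x04")   # placeholder bytes in a mutable owner
view = memoryview(owner).toreadonly()    # read-only view (Python 3.8+)

try:
    view[0] = 0xFF                        # writing through the view is refused
except TypeError as exc:
    print("write through view rejected:", exc)

owner[0] = 0xFF                           # the owner can still write in place
print(view[0] == 0xFF)                    # True: the view sees the change
```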

--

[issue36839] Support the buffer protocol in code objects

2019-05-28 Thread Dino Viehland


Dino Viehland  added the comment:

The PR actually checks that the buffer is read-only (this was also a concern
that Mark Shannon had).  And the Python buffer protocol says that you need to
consistently hand out read-only buffers.  So while someone could create a
buffer and mutate it outside of the buffer protocol, it should really be
read-only.

As far as the ref counts go, it's not just the ref counts for the code byte
strings that are potentially problematic.  It's also the ref counts on all of
the random other objects that sit on the same pages as the code objects, as
well as other random read-write data that could be on those pages.  There's
also an additional benefit in that code objects not loaded before forking can
continue to share their memory as well.
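The copy-on-write problem comes from the fact that taking a reference writes to the referenced object's header. A tiny illustration: storing one more reference to a bytes object changes its reference count, and after a fork such a write forces the kernel to copy the whole page holding that object. Objects made immortal under PEP 683 (Python 3.12+), such as None or small ints, are exempt; an ordinary bytes object is not.

```python
import sys

payload = bytes(range(16))        # stands in for one co_code bytes object
before = sys.getrefcount(payload)

holders = [payload]               # just storing one more reference...
after = sys.getrefcount(payload)

print(after - before)             # 1: ...wrote to payload's own header
```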

--

[issue36839] Support the buffer protocol in code objects

2019-05-27 Thread Inada Naoki


Inada Naoki  added the comment:

I don't like this.
It removes the guarantee that code objects are constant / immutable.

> Because the code objects are on random pages and are ref counted the ref 
> counts can cause all of the code to not be shared across processes.

These ref counts are typically not incremented or decremented until
shutdown.
If you want to avoid CoW when shutting down, you can use os._exit.
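A sketch of that pattern, POSIX only and under stated assumptions: `run_forked_worker` is a hypothetical helper. It calls gc.freeze() (Python 3.7+) before forking so the child's collector ignores pre-fork objects, and ends the child with os._exit(), which skips interpreter finalization and therefore the shutdown decrefs that would touch shared pages. os._exit() also skips atexit handlers and does not flush stdio buffers.

```python
import gc
import os
from typing import Callable, Optional


def run_forked_worker(work: Callable[[], int]) -> Optional[int]:
    """Run `work` in a forked child and return its exit code (POSIX only).

    Returns None where os.fork() is unavailable or the child did not exit
    normally.
    """
    if not hasattr(os, "fork"):
        return None
    gc.freeze()  # pre-fork objects go to a permanent generation GC ignores
    try:
        pid = os.fork()
        if pid == 0:  # child process
            code = 1
            try:
                code = work()
            finally:
                # Skip finalization: no module teardown decrefs, no atexit
                # handlers, and no flushing of stdio buffers.
                os._exit(code)
        _, status = os.waitpid(pid, 0)
    finally:
        gc.unfreeze()
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


result = run_forked_worker(lambda: 3)
print("worker exit code:", result)
```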

--
nosy: +inada.naoki

[issue36839] Support the buffer protocol in code objects

2019-05-07 Thread Dino Viehland


Change by Dino Viehland :


--
nosy: +eric.snow

[issue36839] Support the buffer protocol in code objects

2019-05-07 Thread Dino Viehland


Change by Dino Viehland :


--
keywords: +patch
pull_requests: +13093
stage: needs patch -> patch review

[issue36839] Support the buffer protocol in code objects

2019-05-07 Thread Dino Viehland


New submission from Dino Viehland :

Many Python deployments involve large code bases that are used in scenarios
where processes are fork-and-exec'd.  When running in these environments code 
objects can end up occupying a lot of memory.  Because the code objects are on 
random pages and are ref counted the ref counts can cause all of the code to 
not be shared across processes.

If code objects supported the buffer protocol it would be possible to load code 
from a memory mapped file which is shared across all of the processes.
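The request can be illustrated with the standard library as it stands. The sketch below writes a function's co_code to a temporary file, maps it read-only, and tries to hand the zero-copy memoryview to CodeType.replace() (Python 3.8+). CPython accepts only bytes there, so the fallback is bytes(view), a private copy in every process, which is what the proposal wants to avoid. `load_bytecode_via_mmap` is a hypothetical helper, and writing raw co_code to a file stands in for whatever serialization format a real loader would use.

```python
import mmap
import tempfile
import types


def add_one(x):
    return x + 1


def load_bytecode_via_mmap(code: types.CodeType):
    """Round-trip code.co_code through a read-only memory-mapped file.

    Returns (accepted, rebuilt): whether replace() took the zero-copy view,
    and a code object rebuilt from a private bytes copy.
    """
    with tempfile.TemporaryFile() as fh:
        fh.write(code.co_code)
        fh.flush()
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            with memoryview(mapping) as view:  # zero-copy and read-only
                try:
                    code.replace(co_code=view)  # what the proposal asks for
                    accepted = True
                except TypeError:               # CPython requires bytes here
                    accepted = False
                copied = bytes(view)            # today: a per-process copy
    return accepted, code.replace(co_code=copied)


accepted, rebuilt = load_bytecode_via_mmap(add_one.__code__)
print("memoryview accepted as co_code:", accepted)
print(types.FunctionType(rebuilt, globals())(41))  # 42
```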

--
assignee: dino.viehland
components: Interpreter Core
messages: 341808
nosy: dino.viehland
priority: normal
severity: normal
stage: needs patch
status: open
title: Support the buffer protocol in code objects
type: enhancement
versions: Python 3.8
