[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-12 Thread Guido van Rossum


Guido van Rossum  added the comment:

Okay, thanks. We may do one of the other ideas (maybe co_flags & CO_DOCSTRING).

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-12 Thread Inada Naoki


Inada Naoki  added the comment:

Although I still feel that reducing the number of tuples by 16% is attractive, 
no one supports the idea.

I will leave this as-is for now, and will look into lazy-loading docstrings 
(and maybe co_linetable too) later.

--
resolution:  -> rejected
stage: patch review -> resolved
status: open -> closed




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-03 Thread Inada Naoki


Inada Naoki  added the comment:

Lazily filling func.__doc__ gives only a 3~5% performance gain, and it 
introduces a small backward incompatibility.

```
>>> def foo(): "foo"
...
>>> def bar(): "bar"
...
>>> bar.__code__ = foo.__code__
>>> bar.__doc__
'foo'  # was 'bar'
```
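
For comparison, a minimal sketch of the behaviour on released CPython versions, assuming (as the discussion implies) that `__doc__` is copied into the function object when the function is created. The names `foo` and `bar` are taken from the session above; nothing here depends on the lazy-func-doc branch.

```python
def foo():
    "foo"


def bar():
    "bar"


# Replace bar's code object after the function object already exists.
bar.__code__ = foo.__code__

# __doc__ was copied into the function object when `def bar` ran,
# so the swap does not affect it.
print(bar.__doc__)
```

With lazy filling, the same lookup would instead consult the code object that is attached at access time, which is the incompatibility shown above.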


Note that non-constant annotations have a larger overhead (and PEP 649 will 
too). Some people don't write docstrings for private/local functions, but do 
write annotations for code completion and/or type checking.

```
$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(x: int, y: str) -> float: pass"
.
Mean +- std dev: 111 ns +- 2 ns

$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(x, y): 'doc'"
.
Mean +- std dev: 63.9 ns +- 2.1 ns
```

So I think 2~3ns is a "tiny fraction" here.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-03 Thread Inada Naoki


Change by Inada Naoki :


--
pull_requests: +27065
pull_request: https://github.com/python/cpython/pull/28704




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-03 Thread Inada Naoki


Inada Naoki  added the comment:

> Do you have any explanation of this?

I think it's because the current PyFunction_New always tries to get the docstring.
See this pull request (lazy-func-doc). 
https://github.com/python/cpython/pull/28704

lazy-func-doc is faster than co-docstring and remove-docstring, both with and 
without a docstring.

```
# co-docstring vs lazy-func-doc.
$ load-none-co-docstring/release/bin/pyperf timeit --compare-to ./cpython/release/bin/python3 --python-names lazy-func-doc:co-docstring --duplicate=100 "def f(): pass"
lazy-func-doc: . 58.6 ns +- 1.6 ns
co-docstring: . 60.3 ns +- 2.0 ns

Mean +- std dev: [lazy-func-doc] 58.6 ns +- 1.6 ns -> [co-docstring] 60.3 ns +- 2.0 ns: 1.03x slower

$ load-none-co-docstring/release/bin/pyperf timeit --compare-to ./cpython/release/bin/python3 --python-names lazy-func-doc:co-docstring --duplicate=100 "def f(): 'doc'"
lazy-func-doc: . 59.6 ns +- 1.1 ns
co-docstring: . 62.3 ns +- 1.7 ns

Mean +- std dev: [lazy-func-doc] 59.6 ns +- 1.1 ns -> [co-docstring] 62.3 ns +- 1.7 ns: 1.05x slower

# remove docstring vs lazy-func-doc

$ load-none-remove-docstring/release/bin/pyperf timeit --compare-to ./cpython/release/bin/python3 --python-names lazy-func-doc:remove-docstring --duplicate=100 "def f(): pass"
lazy-func-doc: . 58.0 ns +- 1.1 ns
remove-docstring: . 60.5 ns +- 1.5 ns

Mean +- std dev: [lazy-func-doc] 58.0 ns +- 1.1 ns -> [remove-docstring] 60.5 ns +- 1.5 ns: 1.04x slower

$ load-none-remove-docstring/release/bin/pyperf timeit --compare-to ./cpython/release/bin/python3 --python-names lazy-func-doc:remove-docstring --duplicate=100 "def f(): 'doc'"
lazy-func-doc: . 59.9 ns +- 2.3 ns
remove-docstring: . 63.5 ns +- 1.5 ns

Mean +- std dev: [lazy-func-doc] 59.9 ns +- 2.3 ns -> [remove-docstring] 63.5 ns +- 1.5 ns: 1.06x slower
```

Note that this benchmark was run on my MacBook. Results may be a bit unstable, 
although I didn't touch anything (especially the browser) during the run.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-03 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Hint: you can specify several arguments for multiline code. E.g. timeit -s 
"setup1" -s "setup2" "test1" "test2".

> And as a bonus, creating a function without a docstring is a little faster.

Do you have any explanation of this?

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Inada Naoki


Inada Naoki  added the comment:

> So overhead is around 2%. And this 2% is problem only for "creating function 
> with annotation, without docstring, never called, in loop" situation.

My bad: I meant the "creating function with docstring, without annotation, 
never called, in loop" situation.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Inada Naoki


Inada Naoki  added the comment:

And as a bonus, creating a function without a docstring is a little faster.

```
$ cpython/release/bin/pyperf timeit --duplicate=100 "def f(): pass"
.
Mean +- std dev: 62.5 ns +- 1.2 ns

$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(): pass"
.
Mean +- std dev: 60.5 ns +- 1.3 ns
```

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Inada Naoki


Inada Naoki  added the comment:

> The difference 5.1 ns is the cost of additional LOAD_CONST. It is around 8% 
> (but can be 12% or 2%). The cost of setting docstring externally will be the 
> same.


I don't have a bare-metal machine for now, so I don't know why annotations are 
so slow. But the cost of setting the docstring is lighter.

```
# main branch
$ cpython/release/bin/pyperf timeit --duplicate=100 "def f():
>   'docstring'"
.
Mean +- std dev: 61.5 ns +- 1.3 ns

# https://github.com/methane/cpython/pull/37
$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f():
>   'docstring'"
.
Mean +- std dev: 62.9 ns +- 1.5 ns

$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(x: 'int', y: 'str') -> 'float': pass"
.
Mean +- std dev: 65.1 ns +- 4.3 ns

$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(x: 'int', y: 'str') -> 'float': 'docstring'"
.
Mean +- std dev: 66.3 ns +- 2.6 ns

$ load-none-remove-docstring/release/bin/pyperf timeit --duplicate=100 "def f(x: 'int', y: 'str') -> 'float': 'docstring'
> f(None,None)"
.
Mean +- std dev: 131 ns +- 6 ns
```

So the overhead is around 2%. And this 2% is a problem only in the "creating 
function with annotation, without docstring, never called, in loop" situation.
In regular situations, this overhead will be negligible.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> I'm confused by your phrase "take data" -- do you mean remove these? Or what 
> do you propose we do with them?

I thought that a function's name and qualname were set in the code that creates 
the function instead of being copied from the code object, similar to what 
Inada-san proposes for the docstring. Perhaps that was the case in the past. 
Also, the documentation says that annotations is a tuple of strings, so it 
should be known at compile time. I propose to make it an attribute of the code 
object and copy it to the function object when creating a function. It saves a 
LOAD_CONST.

> Smaller, maybe. Measurably faster? Can you demonstrate that with a patch?

$ ./python -m pyperf timeit --duplicate=100  "def f(x: 'int', y: 'str') -> 'float': pass"
Mean +- std dev: 64.6 ns +- 4.2 ns
$ ./python -m pyperf timeit --duplicate=100  "def f(x, y): pass"
Mean +- std dev: 59.5 ns +- 2.4 ns

The difference 5.1 ns is the cost of additional LOAD_CONST. It is around 8% 
(but can be 12% or 2%). The cost of setting docstring externally will be the 
same.
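
The extra work for annotations can be made visible without a benchmark. The sketch below is only an illustration: it counts the module-level bytecode instructions that execute the `def` statement, with and without annotations. The helper name `count_instructions` is made up for this example, and the exact counts differ between CPython versions, so only the comparison is meaningful.

```python
import dis


def count_instructions(source):
    """Count instructions in the module-level code object for `source`."""
    module_code = compile(source, "<example>", "exec")
    return sum(1 for _ in dis.get_instructions(module_code))


plain = count_instructions("def f(x, y): pass")
annotated = count_instructions("def f(x: 'int', y: 'str') -> 'float': pass")

# The annotated definition needs additional instructions to attach
# the annotations to the new function object.
print(plain, annotated)
```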

> Oh, what do you mean exactly by "local functions" -- is that any function, or 
> only a function nested inside another function?

Global functions and methods of global classes are created at import time and 
only once. But functions nested inside another function are created every time 
the outer function is called, and they can even be created in a loop. It is 
a tiny cost, but why make it larger?
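
A small sketch of the point about nested functions, using illustrative names `outer` and `inner`: each call to the outer function builds a new function object, while the code object and the docstring it carries are shared.

```python
def outer():
    def inner():
        "inner docstring"
    return inner


first = outer()
second = outer()

print(first is second)                    # are the function objects the same?
print(first.__code__ is second.__code__)  # is the code object shared?
print(first.__doc__ is second.__doc__)    # is the docstring object shared?
```

So the per-call cost is the function object itself plus whatever the function-creation path does to fill in attributes such as `__doc__`.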

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Guido van Rossum


Guido van Rossum  added the comment:

Serhiy:
> I propose an opposite change -- take data known at compile time (name, 
> qualname and annotations).

I'm confused by your phrase "take data" -- do you mean remove these? Or what do 
you propose we do with them?

> It will make the code for creating new function smaller and faster.

Smaller, maybe. Measurably faster? Can you demonstrate that with a patch?

> It is what we want to achieve -- reducing import time, but additionally it 
> will reduce time of creating local functions.

I don't think the creation time of local functions is burdensome. (Creating a 
class is much more so.)

Oh, what do you mean exactly by "local functions" -- is that any function, or 
only a function nested inside another function?

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I propose an opposite change -- take data known at compile time (name, qualname 
and annotations). It will make the code for creating new function smaller and 
faster. It is what we want to achieve -- reducing import time, but additionally 
it will reduce time of creating local functions.

Arguments for saving a few bytes do not look convincing to me. That is why we 
use caches -- memory is cheaper than CPU time. And in most cases there is no 
saving at all.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Irit Katriel

Irit Katriel  added the comment:

> This saves not only memory but also import time.

Do you see a measurable impact on import time?

With LOAD_NONE I saw speedup of 8% on micro benchmarks but it didn’t make any 
difference overall.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-10-02 Thread Inada Naoki

Inada Naoki  added the comment:

> For the sqlalchemy example: the saving in co_consts is about 1.6k (200 
> pointers), but an increase in bytecode size of 2.4k.

Please see the number of co_consts tuples. (d) saved 1307 tuples compared to (b).
`sys.getsizeof(())` is 40 on a 64-bit machine, so 1307 tuples is about 50k 
bytes. This saves not only memory but also import time.
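
The estimate can be reproduced as a back-of-the-envelope calculation. This is a rough sketch: it uses the size of an empty tuple as a per-tuple lower bound, and the result depends on the build (40 bytes per tuple is what a typical 64-bit CPython reports).

```python
import sys

tuples_saved = 1307                   # (b) -> (d) for SQLAlchemy, as reported
empty_tuple_size = sys.getsizeof(())  # header-only size of a tuple object
estimated_bytes = tuples_saved * empty_tuple_size

print(empty_tuple_size, estimated_bytes)
```

Each removed tuple also frees one pointer slot per element, so this figure underestimates the saving slightly.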

Although the bytecode size increases, that bytecode is released right after the 
module is imported, because the added `LOAD_CONST` is in module or class code.

So there is a significant gain overall.

> It’s not clear that LOAD_NONE/LOAD_COMMON_CONST are worth doing. Anyway, the 
> docstring question is not necessarily related to that.

I combined it with LOAD_NONE because this issue and LOAD_NONE/LOAD_COMMON_CONST 
have synergy.

But I won't combine them next time.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-30 Thread Irit Katriel

Irit Katriel  added the comment:

It’s not clear that LOAD_NONE/LOAD_COMMON_CONST are worth doing. Anyway, the 
docstring question is not necessarily related to that.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-30 Thread Mark Shannon


Mark Shannon  added the comment:

Since the docstring itself will always be present (attached to the function 
object), removing a docstring from a co_consts tuple will only save one pointer 
(8 bytes).

Given that, it would appear that (d) uses *more* memory than (b).

For the sqlalchemy example: the saving in co_consts is about 1.6k (200 
pointers), but an increase in bytecode size of 2.4k.

Either way, the difference is a tiny fraction of the total memory used for code 
objects.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-30 Thread Inada Naoki


Inada Naoki  added the comment:

My machine at the office (used for benchmarking) has hung and I need to go to 
the office to reboot it. So I don't have a benchmark machine for now.

Please prioritize LOAD_NONE/LOAD_COMMON_CONST over this. It is hard to maintain 
merged branches. Merging LOAD_NONE/LOAD_COMMON_CONST into the main branch would 
make this issue easier.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-30 Thread Inada Naoki


Inada Naoki  added the comment:

I used this tool to count the size and number of co_consts tuples.
https://github.com/faster-cpython/tools/pull/6
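
For readers without the tool at hand, the kind of counting involved can be approximated in a few lines. This is a simplified sketch, not the linked script: the function names and the two metrics (number of code objects, summed length of all co_consts tuples) are assumptions chosen for illustration, and the real tool may define "size" and "number" differently.

```python
import types


def iter_code_objects(code):
    """Yield `code` and every code object nested in its constants."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


def co_consts_stats(source, filename="<example>"):
    """Compile `source` and summarize the co_consts of all its code objects."""
    codes = list(iter_code_objects(compile(source, filename, "exec")))
    return {
        "code_objects": len(codes),
        "total_consts": sum(len(c.co_consts) for c in codes),
    }


SAMPLE = '''
def f():
    "doc"
    return 1

class C:
    def m(self):
        return None
'''

stats = co_consts_stats(SAMPLE)
print(stats)
```

The sample compiles to four code objects: the module, `f`, the body of `C`, and `m`.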

Target is asyncio in the main branch.

main (b34dd58f):
Total: 31 files; 1,068 code objects; 12,741 lines; 39,208 opcodes; 3,880 total 
size of co_consts; 738 number of co_consts

LOAD_NONE (https://github.com/python/cpython/pull/28376):
Total: 31 files; 1,068 code objects; 12,741 lines; 39,208 opcodes; 3,617 total 
size of co_consts; 743 number of co_consts

(b) LOAD_NONE + CO_DOCSTRING (b: https://github.com/methane/cpython/pull/36):
Total: 31 files; 1,068 code objects; 12,741 lines; 39,208 opcodes; 3,272 total 
size of co_consts; 732 number of co_consts

(d) LOAD_NONE + remove docstring from code (d: 
https://github.com/methane/cpython/pull/37):
Total: 31 files; 1,068 code objects; 12,741 lines; 39,469 opcodes;  3,255 total 
size of co_consts; 574 number of co_consts

number of co_consts:
main -> (b) = 738 -> 732 (-6, -0.8%)
(b) -> (d) = 732 -> 574   (-158, -21.6%)

total size of co_consts:
main -> (b) = 3880 -> 3272 (-608, -15.7%)
(b) -> (d) = 3272 -> 3255  (-17, -0.5%)  (*)

(*) It seems a tiny difference. But note that code objects for modules and 
classes will be released after execution. So (d) will have a smaller total size 
of remaining co_consts after execution.

---

Another target is SQLAlchemy-1.4.25/lib

main (b34dd58f):
Total: 236 files; 11,802 code objects; 179,284 lines; 372,983 opcodes; 46,091 
total size of co_consts; 7,979 number of co_consts

LOAD_NONE (https://github.com/python/cpython/pull/28376):
Total: 236 files; 11,802 code objects; 179,284 lines; 372,983 opcodes; 43,272 
total size of co_consts; 7,980 number of co_consts

(b) LOAD_NONE + CO_DOCSTRING (b: https://github.com/methane/cpython/pull/36):
Total: 236 files; 11,802 code objects; 179,284 lines; 372,983 opcodes; 39,599 
total size of co_consts; 7,833 number of co_consts

(d) LOAD_NONE + remove docstring from code (d: 
https://github.com/methane/cpython/pull/37):
Total: 236 files; 11,802 code objects; 179,284 lines; 375,396 opcodes; 39,418 
total size of co_consts; 6,526 number of co_consts

number of co_consts:
main -> (b) = 7979 -> 7833 (-146, -1.83%)
(b) -> (d) = 7833 -> 6526   (-1307, -16.7%)

total size of co_consts:
main -> (b) = 46091 -> 39599 (-6492, -14.1%)
(b) -> (d) = 39599 -> 39418  (-181, -0.46%)

---

Conclusion: (b) reduces total size of co_consts significantly, and (d) reduces 
both of total size and number of co_consts significantly.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-30 Thread Mark Shannon


Mark Shannon  added the comment:

I strongly favor (b) over (d).

(d) adds more complexity to MAKE_FUNCTION.

MAKE_FUNCTION represents a measurable fraction of execution time for many 
programs. The more flags and branches it has, the harder it is to optimize.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-29 Thread Inada Naoki


Inada Naoki  added the comment:

> I'm not in favor of (c) co_doc either any more (for the reasons you state). I 
> would go for (b), a CO_DOCSTRING flag plus co_consts[0]. I expect that 
> co_consts sharing to be a very minor benefit, but you could easily count this 
> with another small change to the count script.

OK. Let's reject (c).
I expect the benefit of CO_DOCSTRING to be much smaller than that of co_consts 
sharing. I will compare (b) with (d).

> Nested function creation could perhaps become a fraction faster if we didn't 
> copy the docstring into the function object, leaving it func_doc NULL, making 
> func.__doc__ a property that falls back on co_consts[0] if the flag is set.

Copying the docstring is way faster than creating annotations. So I don't think 
nested function creation time is the main issue.

> I expect lazy docstrings to be in the distant future (I experimented quite a 
> bit with different marshal formats to support this and it wasn't easy at all) 
> but I don't want to exclude it.

Since code objects are immutable/hashable, removing the docstring from the code 
object makes this idea easy.

For example, we can store long docstrings in some db (e.g. sqlite, dbm) in the 
__pycache__ directory and store its id in func.__doc__. When func.__doc__ is 
accessed, it can load the docstring from the db.
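
To make the idea concrete, here is a purely hypothetical sketch of such a side store using `sqlite3` with an in-memory database. The class names `DocStore` and `LazyDoc`, the table layout, and the `loads` counter are all invented for this illustration; nothing like this exists in CPython.

```python
import sqlite3


class DocStore:
    """Hypothetical side store mapping integer ids to docstrings."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, text TEXT)"
        )

    def put(self, text):
        cur = self._db.execute("INSERT INTO docs (text) VALUES (?)", (text,))
        self._db.commit()
        return cur.lastrowid

    def get(self, doc_id):
        row = self._db.execute(
            "SELECT text FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None


class LazyDoc:
    """Holds only an id until the docstring is first requested."""

    def __init__(self, store, doc_id):
        self._store = store
        self._doc_id = doc_id
        self._text = None
        self.loads = 0  # number of database reads, for demonstration

    def resolve(self):
        if self._text is None:
            self._text = self._store.get(self._doc_id)
            self.loads += 1
        return self._text


store = DocStore()
lazy = LazyDoc(store, store.put("Return my name"))
print(lazy.loads, lazy.resolve(), lazy.loads)
```

The docstring is read from the database only on the first `resolve()` call and cached afterwards.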

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-29 Thread Guido van Rossum


Guido van Rossum  added the comment:

Not so fast.

I'm not in favor of (c) co_doc either any more (for the reasons you state). I 
would go for (b), a CO_DOCSTRING flag plus co_consts[0]. I expect that 
co_consts sharing to be a very minor benefit, but you could easily count this 
with another small change to the count script.

Moving the docstring to the surrounding object would not make much of a 
difference time- or speed-wise but I think it's the wrong thing to do since it 
dissociates the docstring from the function.

Nested function creation could perhaps become a fraction faster if we didn't 
copy the docstring into the function object, leaving it func_doc NULL, making 
func.__doc__ a property that falls back on co_consts[0] if the flag is set.
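
The fallback described here can be modelled in pure Python. The sketch below is a toy model, not CPython code: `FunctionModel`, the `has_docstring` attribute (standing in for the proposed CO_DOCSTRING flag), and the `_UNSET` sentinel (standing in for a NULL `func_doc`) are all assumptions made for illustration.

```python
from types import SimpleNamespace

_UNSET = object()  # stands in for a NULL func_doc slot


class FunctionModel:
    """Toy model of a function whose docstring is resolved on access."""

    def __init__(self, code):
        self.code = code
        self._doc = _UNSET

    @property
    def doc(self):
        if self._doc is not _UNSET:
            return self._doc           # an explicitly assigned docstring wins
        if self.code.has_docstring:    # hypothetical CO_DOCSTRING flag
            return self.code.co_consts[0]
        return None

    @doc.setter
    def doc(self, value):
        self._doc = value


foo_code = SimpleNamespace(co_consts=("foo",), has_docstring=True)
bar_code = SimpleNamespace(co_consts=("bar",), has_docstring=True)

bar = FunctionModel(bar_code)
before = bar.doc
bar.code = foo_code  # analogous to bar.__code__ = foo.__code__
after = bar.doc      # in this model the docstring follows the new code object
print(before, after)
```

Nothing is copied at creation time in this model, which is where the potential saving comes from; the price is that the docstring tracks later changes to the code object until it is assigned explicitly.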

I expect lazy docstrings to be in the distant future (I experimented quite a 
bit with different marshal formats to support this and it wasn't easy at all) 
but I don't want to exclude it.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-29 Thread Inada Naoki


Inada Naoki  added the comment:

> There is a clear disadvantage in moving the docstring from the function's 
> code object to the enclosing code object:
>
> Docstrings are rarely looked at (relative to other operations on functions). 
> Inner functions and comprehensions are created many times for the same code 
> object, and their docstring are (almost) never inspected.
>
> Given that, the obvious enhancement is to create docstrings lazily to reduce 
> the overhead of creating a function.

Docstrings are created during unmarshaling, not when creating a function. And 
since comprehensions don't have docstrings, GH-28138 has zero cost for 
comprehensions.

Summary of the disadvantages of each approach:

a) status quo (e.g. co_consts[0])

* We cannot share co_consts tuples between functions that use the same 
constants but have different docstrings.
* Even when a function doesn't have a docstring, it needs to have 
co_consts[0]=None.

b) CO_DOCSTRING + co_consts[0] (ref: 
https://github.com/iritkatriel/cpython/pull/30 )

* Additional flag for code object
* We cannot share co_consts tuples between functions that use the same 
constants but have different docstrings.

c) co_doc

* Increases the size of all code objects, including non-functions (e.g. 
comprehensions, classes, modules)
  * One pointer per code object is cheap enough.
* Needs an additional `r_object()` call during unmarshal.
  * Need to measure this overhead.

d) MAKE_FUNCTION (GH-28138)

* Additional flag for MAKE_FUNCTION
* Push docstring to stack (e.g. one LOAD_CONST) only when the function has 
docstring.
  * LOAD_CONST is much cheaper than MAKE_FUNCTION
  * It is cheaper than annotations too.

---

I think (d) is the best and (c) is the second best.
Since no one supports (d) for now, I will create a pull request for (c).

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-29 Thread Mark Shannon


Mark Shannon  added the comment:

There is a clear disadvantage in moving the docstring from the function's code 
object to the enclosing code object:

Docstrings are rarely looked at (relative to other operations on functions). 
Inner functions and comprehensions are created many times for the same code 
object, and their docstrings are (almost) never inspected.

Given that, the obvious enhancement is to create docstrings lazily to reduce 
the overhead of creating a function.

This change would prevent that enhancement.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-27 Thread Inada Naoki


Inada Naoki  added the comment:

Mark, would you take a look, please?

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-03 Thread Guido van Rossum


Guido van Rossum  added the comment:

Let's wait until Mark Shannon is back from vacation (another week).

Note that class docstrings *are* contained in the class body code object -- 
there's executable code equivalent to

__doc__ = "this is the docstring"

But I agree it's not easily found without analyzing the bytecode.
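
This can be checked by compiling a small class and looking at the code object of its body. The sketch below is illustrative only; the variable names are arbitrary, and it relies on the class body being stored as a code-object constant of the module code.

```python
import types

SOURCE = '''
class C:
    "this is the docstring"
'''

module_code = compile(SOURCE, "<example>", "exec")

# The class body is compiled to its own code object, stored as a module constant.
body_code = next(
    const for const in module_code.co_consts if isinstance(const, types.CodeType)
)

print(body_code.co_name)
print("__doc__" in body_code.co_names)                 # body assigns to __doc__
print("this is the docstring" in body_code.co_consts)  # string is a body constant
```

The docstring is there, but only as an ordinary constant that the body stores under the name `__doc__`, so finding it still means reading the bytecode rather than looking at a fixed slot.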

Maybe the status quo is best after all? I would like to be able to identify 
code objects for functions, we could add a bit to co_flags for that.

--
nosy: +Mark.Shannon




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-03 Thread Inada Naoki


Inada Naoki  added the comment:

> I am still not convinced that it's a good idea to put the docstring in the 
> surrounding code object. I'd like to be able to see it when I introspect a 
> code object, not just when introspecting a function object (I may be 
> analyzing code only, and it's hard to connect the code object with the 
> MAKE_FUNCTION opcode in the parent code object -- you have to scan the 
> bytecode, which is fragile.)

I think that reasoning is not strong enough to add a new member to the code 
object.

* Modules and classes don't get their docstrings from their code objects. Why 
should only functions need to store the docstring there?
* Lambdas, comprehensions, and PEP 649 (if accepted) use code objects but have 
no docstring. Why should they pay the cost of a `co_doc` member? (cost = memory 
+ unmarshal time).

Code objects have filename and firstlineno. And there are many functions 
without a docstring. So removing the docstring from the code object won't make 
inspection much harder.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-03 Thread Guido van Rossum


Guido van Rossum  added the comment:

I am still not convinced that it's a good idea to put the docstring in the 
surrounding code object. I'd like to be able to see it when I introspect a code 
object, not just when introspecting a function object (I may be analyzing code 
only, and it's hard to connect the code object with the MAKE_FUNCTION opcode in 
the parent code object -- you have to scan the bytecode, which is fragile.)

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-09-03 Thread Inada Naoki


Change by Inada Naoki :


--
keywords: +patch
pull_requests: +26577
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/28138




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-30 Thread Inada Naoki


Inada Naoki  added the comment:

This is WIP pull request. https://github.com/methane/cpython/pull/35
Some tests are failing because of bpo-36521.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-30 Thread Inada Naoki


Inada Naoki  added the comment:

I grepped the top 5000 downloaded packages and could not find any real use of 
PyFunction_New(WithQualName).
So I don't know what the current workflow around PyFunction_New is.

My current WIP implementation adds a new API (e.g. PyFunction_NewWithDoc()).
The old API keeps using co_consts[0] for the docstring for backward compatibility.

Adding code.co_doc is not free.

* All code objects get one additional pointer, so it uses more memory.
* Unmarshal needs to call `r_object()` for all code objects, so it increases 
startup time.

Note that code objects are not only for functions. Classes, modules, lambdas, 
and comprehensions use code objects without a docstring.
And if PEP 649 is accepted, even function annotations will use code objects. It 
will double the number of code objects in highly annotated source files.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-30 Thread Guido van Rossum


Guido van Rossum  added the comment:

I think we shouldn't change *which* code object contains the docstring 
(changing anything about that is likely to disturb someone's workflow in a way 
that's hard to fix) -- only how PyFunction_New finds that docstring in the code 
object (if that breaks someone's workflow, the fix will be obvious).
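
From Python, the closest handle on this behaviour is `types.FunctionType`, which builds a function object from a code object and a globals dict. A minimal sketch, with `template` and `clone` as illustrative names, showing that a function built this way still ends up with the docstring stored alongside the code:

```python
import types


def template():
    "docstring stored with the code object"


# Build a second function object from the same code object.
clone = types.FunctionType(template.__code__, {})

print(clone is template)
print(clone.__doc__)
```

Whichever way the docstring ends up being located inside the code object, this observable result is what existing callers depend on.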

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-30 Thread Irit Katriel

Irit Katriel  added the comment:

Are you suggesting that anyone who calls PyFunction_New needs to add a doc 
string assignment following the call? This is public api, so that would break 
people’s working code.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-29 Thread Inada Naoki


Inada Naoki  added the comment:

> I think that would require a change in the signature of PyFunction_New.

I don't think so. For example, func_annotations doesn't require changing 
PyFunction_New().

```
case TARGET(MAKE_FUNCTION): {
    PyObject *codeobj = POP();
    PyFunctionObject *func = (PyFunctionObject *)
        PyFunction_New(codeobj, GLOBALS());
    (snip)
    if (oparg & 0x04) {
        assert(PyTuple_CheckExact(TOP()));
        func->func_annotations = POP();
    }
```

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

If we are going to move the docstring out of co_consts, I would make it a code 
object attribute rather than an argument of MAKE_FUNCTION. It saves time on 
function creation.

Most functions do not change their docstring after creation. It is the same 
object as the code object's docstring, so it consumes no extra memory.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Inada Naoki


Inada Naoki  added the comment:

> If the docstring were not in co_consts, all of these co_consts would be empty 
> tuples. The empty tuple is nearly zero-cost because it's a singleton.

My mistake. The two setters would still have a `(None,)` tuple. But such tuples 
can already be merged at compile time. And the "common const" [1] approach will 
make them empty tuples.

[1] https://github.com/iritkatriel/cpython/pull/27

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Inada Naoki


Inada Naoki  added the comment:

> Why all the hating on docstrings? What have docstrings done wrong? 

Oh, I don't hate docstrings. I just want to move them from the code object to 
the function object.
Removing docstrings during unmarshalling is your idea, not mine.

My main motivation is reducing code size. See this example.

```
class Foo:
    def name(self):
        """Return my name"""
        return self._name

    def set_name(self, name):
        """Set my name"""
        self._name = name

    def age(self):
        """Return my age"""
        return self._age

    def set_age(self, age):
        """Set my age"""
        self._age = age

>>> Foo.name.__code__.co_consts
('Return my name',)
>>> Foo.set_name.__code__.co_consts
('Set my name', None)
>>> Foo.age.__code__.co_consts
('Return my age',)
>>> Foo.set_age.__code__.co_consts
('Set my age', None)
```

If the docstring is not in co_consts, all co_consts are empty tuples. The 
empty tuple is nearly zero-cost because it's a singleton.
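
To see what would remain in each tuple, here is a minimal sketch. It assumes 
the docstring, when present, is stored as co_consts[0], as in the CPython 
versions discussed in this thread; `consts_without_docstring` is a 
hypothetical helper, not an existing API.

```python
class Foo:
    def name(self):
        """Return my name"""
        return self._name

    def set_name(self, name):
        """Set my name"""
        self._name = name


def consts_without_docstring(func):
    """Return func's co_consts with a leading docstring entry dropped.

    Assumption: a docstring, if any, sits at co_consts[0].
    """
    consts = func.__code__.co_consts
    if consts and func.__doc__ is not None and consts[0] == func.__doc__:
        return consts[1:]
    return consts


for method in (Foo.name, Foo.set_name):
    print(method.__name__,
          method.__code__.co_consts,
          "->",
          consts_without_docstring(method))
```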

When comparing adding code.co_doc with setting func.__doc__, "we can release 
the old docstring" is a (small) advantage of the latter. But it is not my main 
motivation.

Classes and modules don't use co_consts[0] anyway. So setting `func.__doc__` is 
better for consistency too.


> I know there's the -OO flag that strips docstrings, but it doesn't work well 
> and I think it was a mistake.

Some libraries (e.g. SQLAlchemy) have very large docstrings. `-OO` can save 
10% RAM.

I like the idea of adding a per-file flag for "don't remove docstrings in -OO 
mode", because docstrings can be used at runtime in some cases (e.g. docopt).
But it is out of scope of this issue.
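
For reference, the effect of `-OO` on docstrings can be reproduced inside a 
running interpreter with the `optimize` argument of the built-in `compile()`. 
This is only a sketch; `SRC` and `load` are names made up for the example.

```python
SRC = 'def f():\n    "doc"\n    return 1\n'


def load(optimize):
    """Compile SRC at the given optimization level and return the function f."""
    namespace = {}
    exec(compile(SRC, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["f"]


kept = load(0)      # like running without -O: docstring kept
stripped = load(2)  # like -OO: docstring removed at compile time

print(repr(kept.__doc__), repr(stripped.__doc__))
```

Code that reads `__doc__` at runtime, as docopt does, sees `None` in the 
second case.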

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Guido van Rossum


Guido van Rossum  added the comment:

Why all the hating on docstrings? What have docstrings done wrong? I know 
there's the -OO flag that strips docstrings, but it doesn't work well and I 
think it was a mistake.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Inada Naoki


Inada Naoki  added the comment:

> This would actually make it harder to strip docstrings e.g. during 
> unmarshalling, since you don't know which constants refer to docstrings.

We cannot strip class docstrings anyway.

One idea for stripping docstrings during startup: add a new opcode used only 
for storing __doc__.
We can use it for both functions and classes. The opcode will store None if 
the "remove docstrings during startup" option is enabled. And the surrounding 
code objects will be released after the global/class body has been executed.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Inada Naoki


Inada Naoki  added the comment:

> You'd just be moving the problem though -- the docstring would have to be 
> included in the co_consts array of the surrounding code object instead of the 
> function object.

As far as I know, surrounding code objects (e.g. global, class body) are 
released right after they are executed.
So removing a docstring with `func.__doc__ = None` can release the memory for 
the docstring, although it does not reduce startup time.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Guido van Rossum


Guido van Rossum  added the comment:

> I'd like to remove the docstring from the code object entirely.
> func.__doc__ can be set by MAKE_FUNCTION or STORE_ATTR.

You'd just be moving the problem though -- the docstring would have to be 
included in the co_consts array of the surrounding code object instead of the 
function object.

This would actually make it harder to strip docstrings e.g. during 
unmarshalling, since you don't know which constants refer to docstrings.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Irit Katriel


Irit Katriel  added the comment:

> I'd like to remove the docstring from the code object entirely.
> func.__doc__ can be set by MAKE_FUNCTION or STORE_ATTR.


I think that would require a change in the signature of PyFunction_New.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-08-28 Thread Inada Naoki


Inada Naoki  added the comment:

I'd like to remove the docstring from the code object entirely.
func.__doc__ can be set by MAKE_FUNCTION or STORE_ATTR.

Pros are:

* Code objects can be a bit smaller than with an added co_doc.
  * Many code objects don't have docstrings (e.g. lambdas, comprehensions, and 
PEP 649).
* We can strip docstrings at runtime and free some memory.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2021-07-04 Thread Guido van Rossum


Guido van Rossum  added the comment:

In 3.11 the code object will definitely change. We may well put the docstring 
in a dedicated attribute.

--
nosy: +Guido.van.Rossum
versions: +Python 3.11 -Python 3.8




[issue36521] Consider removing docstrings from co_consts in code objects

2021-07-04 Thread Irit Katriel


Change by Irit Katriel :


--
nosy: +gvanrossum, iritkatriel




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-05 Thread Inada Naoki


Inada Naoki  added the comment:

There is an idea of reading the docstring lazily, when func.__doc__ is accessed.

I don't think the idea can be implemented in time for 3.8.  But if we change 
the code object now, I want the new API to be usable for implementing this idea.

One breaking change is better than two.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-05 Thread Terry J. Reedy


Terry J. Reedy  added the comment:

So we have the same issue with f.__name__ and f.__code__.co_name becoming 
unsynchronized.

FWIW, I would prefer that the code docstring be co_doc, rather than hidden in 
co_consts, so that 'name' and 'doc' follow the same pattern.
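
A minimal illustration of the name case; `f` is just a throwaway function 
for the example:

```python
def f():
    pass


# Renaming the function object does not touch the code object.
f.__name__ = "g"
name_pair = (f.__name__, f.__code__.co_name)

print(name_pair)
```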

--
nosy: +terry.reedy




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-04 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

I think it is for historical reasons. Currently, statements consisting of a 
constant expression are not compiled to bytecode and do not add a value to 
co_consts. But before this optimization was added, the first element of 
co_consts of a function with a docstring was always the docstring. So why add 
co_doc if the docstring is already available?

This can be changed, but it is a breaking change, and what will we get 
instead?

Function's __name__ is set from code object's co_name.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-03 Thread Inada Naoki


Change by Inada Naoki :


--
nosy: +inada.naoki




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-03 Thread Raymond Hettinger


Raymond Hettinger  added the comment:

> co_consts[0] is used for setting the initial value of __doc__.

Why is __doc__ set this way, but __name__ is set directly on the function 
object?  Setting __doc__ from the code object seems like an odd implementation 
hack that puts the responsibility in the wrong place and that leaves a dangling 
reference when __doc__ is updated.

--




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-03 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

co_consts[0] is used for setting the initial value of __doc__. See 
PyFunction_NewWithQualName().

    consts = ((PyCodeObject *)code)->co_consts;
    if (PyTuple_Size(consts) >= 1) {
        doc = PyTuple_GetItem(consts, 0);
        if (!PyUnicode_Check(doc))
            doc = Py_None;
    }
    else
        doc = Py_None;
    Py_INCREF(doc);
    op->func_doc = doc;
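
The same rule written out in Python, as a sketch that mirrors only the C 
snippet above (`initial_doc` is a hypothetical helper, and other CPython 
versions may pick the docstring differently):

```python
def initial_doc(code):
    """Mimic the C logic above: co_consts[0] if it is a str, else None."""
    consts = code.co_consts
    if len(consts) >= 1 and isinstance(consts[0], str):
        return consts[0]
    return None


def documented():
    """I have a docstring."""


def undocumented():
    return 1


print(repr(initial_doc(documented.__code__)))
print(repr(initial_doc(undocumented.__code__)))
```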

--
nosy: +serhiy.storchaka




[issue36521] Consider removing docstrings from co_consts in code objects

2019-04-03 Thread Raymond Hettinger


New submission from Raymond Hettinger :

Function objects provide __doc__ as a documented writeable attribute.  However, 
code objects also have the same information in co_consts[0].  When __doc__ is 
changed, the latter keeps a reference to the old string.  Also, the disassembly 
shows that co_consts[0] is never used.  Can we remove the entry in co_consts?  
It looks like a compilation artifact rather than something that we need or want.


>>> def f(x):
        'y'

>>> f.__doc__
'y'
>>> f.__code__.co_consts[0]
'y'
>>> f.__doc__ = 'z'
>>> f.__code__.co_consts[0]
'y'

>>> from dis import dis
>>> dis(f)
  2           0 LOAD_CONST               1 (None)
              2 RETURN_VALUE

--
components: Interpreter Core
messages: 339422
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Consider removing docstrings from co_consts in code objects
type: resource usage
versions: Python 3.8
