[Python-Dev] Usefulness of site-python?

2013-10-23 Thread Antoine Pitrou

Hello,

I've just discovered there is a little-known feature in site.py: if a
$PREFIX/lib/site-python exists (e.g. /usr/lib/site-python), it is added
to sys.path in addition to the versioned site-packages. But only under
Unix ("if os.sep == '/'").

Has anyone seen that feature in the real world? Debian doesn't use
site-python, but its own /usr/share/pyshared.
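The relevant logic in Lib/site.py boils down to something like the following sketch (simplified and not the actual function; the real code also handles sys.exec_prefix and skips non-existing directories):

```python
import os
import sys

def candidate_site_dirs(prefix):
    """Simplified sketch of the site directory selection in Lib/site.py."""
    if os.sep == '/':   # Unix: versioned site-packages plus site-python
        yield os.path.join(prefix, "lib",
                           "python%d.%d" % sys.version_info[:2],
                           "site-packages")
        yield os.path.join(prefix, "lib", "site-python")
    else:               # Windows and others
        yield prefix
        yield os.path.join(prefix, "lib", "site-packages")

print(list(candidate_site_dirs(sys.prefix)))
```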

For the record, it was added in b53347c8260e with the following commit
message:

user:Guido van Rossum 
date:Wed Sep 03 21:41:30 1997 +
files:   Lib/site.py
description:
Give in to Mike Meyer -- add *both* lib/python1.5/packages and
lib/site-python to the path (if they exist).  This is a reasonable
compromise.


Regards

Antoine.


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] [Python-checkins] cpython: Switch subprocess stdin to a socketpair, attempting to fix issue #19293 (AIX

2013-10-23 Thread Antoine Pitrou
On Tue, 22 Oct 2013 10:54:03 +0200, Victor Stinner wrote:

> Hi,
> 
> Would it be possible to use os.pipe() on all OSes except AIX?
> 
> Pipes and socket pairs may have minor differences, but some
> applications may rely on these minor differences. For example, is the
> buffer size the same? For example, in test.support, we have two
> constants: PIPE_MAX_SIZE (4 MB) and SOCK_MAX_SIZE (16 MB).

For the record, pipe I/O seems a little faster than socket I/O under
Linux:

$ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); r=a.fileno(); w=b.fileno(); x=b'x'*1000" "os.write(w, x); os.read(r, 1000)"
100 loops, best of 3: 1.1 usec per loop

$ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); x=b'x'*1000" "a.sendall(x); b.recv(1000)"
100 loops, best of 3: 1.02 usec per loop

$ ./python -m timeit -s "import os; r, w = os.pipe(); x=b'x'*1000" "os.write(w, x); os.read(r, 1000)"
100 loops, best of 3: 0.82 usec per loop
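The same comparison can be reproduced programmatically with the timeit module (a sketch, not the exact shell commands above; numbers will vary by machine and kernel):

```python
import os
import socket
import timeit

# One pipe and one socket pair; each iteration writes then immediately
# reads 1000 bytes, so writes never block on a full buffer.
r_p, w_p = os.pipe()
a, b = socket.socketpair()
r_s, w_s = a.fileno(), b.fileno()
x = b"x" * 1000

pipe_t = timeit.timeit(lambda: (os.write(w_p, x), os.read(r_p, 1000)),
                       number=10000)
sock_t = timeit.timeit(lambda: (os.write(w_s, x), os.read(r_s, 1000)),
                       number=10000)
print("pipe: %.4fs  socketpair: %.4fs" % (pipe_t, sock_t))
```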


Regards

Antoine.


> 
> Victor
> 
> 2013/10/22 guido.van.rossum :
> > http://hg.python.org/cpython/rev/2a0bda8d283d
> > changeset:   86557:2a0bda8d283d
> > user:Guido van Rossum 
> > date:Mon Oct 21 20:37:14 2013 -0700
> > summary:
> >   Switch subprocess stdin to a socketpair, attempting to fix issue
> > #19293 (AIX hang).
> >
> > files:
> >   Lib/asyncio/unix_events.py|  29 +-
> >   Lib/test/test_asyncio/test_unix_events.py |   7 ++
> >   2 files changed, 32 insertions(+), 4 deletions(-)
> >
> >
> > diff --git a/Lib/asyncio/unix_events.py b/Lib/asyncio/unix_events.py
> > --- a/Lib/asyncio/unix_events.py
> > +++ b/Lib/asyncio/unix_events.py
> >  if stdin == subprocess.PIPE:
> >  self._pipes[STDIN] = None
> > +# Use a socket pair for stdin, since not all platforms
> > +# support selecting read events on the write end of a
> > +# socket (which we use in order to detect closing of
> > the
> > +# other end).  Notably this is needed on AIX, and works
> > +# just fine on other platforms.
> > +stdin, stdin_w = self._loop._socketpair()
> >  if stdout == subprocess.PIPE:
> >  self._pipes[STDOUT] = None
> >  if stderr == subprocess.PIPE:





Re: [Python-Dev] [Python-checkins] cpython: Switch subprocess stdin to a socketpair, attempting to fix issue #19293 (AIX

2013-10-23 Thread Victor Stinner
"For the record, pipe I/O seems a little faster than socket I/O under Linux"

In an old (2006) email on LKML (the Linux kernel mailing list), I read:
"as far as I know pipe() is now much faster than socketpair(), because pipe()
uses the zero-copy mechanism."
https://lkml.org/lkml/2006/9/24/121

On Linux, splice() can also be used with pipes for zero-copy
operations. I don't know if splice() works with socketpair(). Well, I
don't think that Python uses splice() now, but it may be interesting
to use it. Or maybe sendfile() uses it internally?
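For reference, ``os.sendfile()`` has been exposed since Python 3.3; it wraps the sendfile(2) syscall, which copies from a file to a socket in the kernel. A small sketch (the payload is made up), sending a regular file into one end of a socket pair:

```python
import os
import socket
import tempfile

# sendfile(2): the kernel copies from the file to the socket without a
# round-trip through user-space buffers.
a, b = socket.socketpair()
with tempfile.TemporaryFile() as f:
    f.write(b"hello zero-copy")
    f.flush()
    sent = os.sendfile(a.fileno(), f.fileno(), 0, 15)

data = b.recv(100)
print(sent, data)
```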

Victor


Re: [Python-Dev] [Python-checkins] cpython: Switch subprocess stdin to a socketpair, attempting to fix issue #19293 (AIX

2013-10-23 Thread Antoine Pitrou
On Wed, 23 Oct 2013 13:53:40 +0200, Victor Stinner wrote:

> "For the record, pipe I/O seems a little faster than socket I/O under
> Linux"
> 
> In and old (2006) email on LKML (Linux kernel), I read:
> "as far as I know pipe() is now much faster than socketpair(),
> because pipe() uses the zero-copy mechanism."
> https://lkml.org/lkml/2006/9/24/121
> 
> On Linux, splice() can also be used with pipes for zero-copy
> operations. I don't know if splice() works with socketpair().

splice() only works with pipes. socketpair() returns sockets, which are
not pipes :-)

> Well, I
> don't think that Python uses splice() now, but it may be interesting
> to use it.

Where do you want to use it?

Regards

Antoine.




Re: [Python-Dev] Usefulness of site-python?

2013-10-23 Thread Guido van Rossum
On Wed, Oct 23, 2013 at 2:46 AM, Antoine Pitrou  wrote:

>
> Hello,
>
> I've just discovered there is a little-known feature in site.py: if a
> $PREFIX/lib/site-python exists (e.g. /usr/lib/site-python), it is added
> to sys.path in addition to the versioned site-packages. But only under
> Unix ("if os.sep == '/'").
>
> Has anyone seen that feature in the real world? Debian doesn't use
> site-python, but its own /usr/share/pyshared.
>
> For the record, it was added in b53347c8260e with the following commit
> message:
>
> user:Guido van Rossum 
> date:Wed Sep 03 21:41:30 1997 +
> files:   Lib/site.py
> description:
> Give in to Mike Meyer -- add *both* lib/python1.5/packages and
> lib/site-python to the path (if they exist).  This is a reasonable
> compromise.
>

Wow, a blast from the past. I don't see python-dev archives from those
times but I'm sure there must have been discussion about whether
site-packages were shareable between Python versions. I think history has
shown that it's better to install them per version, so I suggest that we
remove that feature. (People who want it can always patch up their own
$PYTHONPATH.)

-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] About 'superobject' and descriptors

2013-10-23 Thread hakril lse
Hi,

I have a question about a choice of implementation concerning
'superobject' with the descriptors.

When a 'superobject' looks for a given attribute, it runs through the
mro of the object.
If it finds a descriptor, the 'superobject' calls the __get__ method
with 'starttype = su->obj_type' as third argument (in typeobject.c:
super_getattro).

So, the 'type' argument of __get__ does not give more information
about the 'real calling type' in this case.
It seems to be just redundant information, duplicating inst.__class__.

For example:

# A.descr is a descriptor
# B inherits from A
# C inherits from B

c = C()
c.descr
super(C, c).descr
super(B, c).descr

In these 3 cases the __get__ method is called with the same
arguments: __get__(descr, c, C).

If this behavior is really expected, could you explain why? It means
that I am missing something obvious, because, at first sight, the
'type' argument seems to be the perfect place to get the type of the
'real calling class'.
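The behavior in question is easy to observe with a toy descriptor (a sketch of the example above, with the descriptor returning its arguments):

```python
class Descr:
    def __get__(self, obj, objtype=None):
        # Return the arguments so we can see what was passed in.
        return (obj, objtype)

class A:
    descr = Descr()

class B(A):
    pass

class C(B):
    pass

c = C()
print(c.descr)             # (c, C)
print(super(C, c).descr)   # also (c, C), not (c, B)
print(super(B, c).descr)   # also (c, C), not (c, A)
```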

Thank you,

-- 
hakril


[Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-23 Thread Victor Stinner
Hi,

I was at the restaurant with Charles-François and Antoine yesterday to
discuss PEP 454 (tracemalloc). They gave me a lot of advice to
improve the PEP. Most remarks were requests to remove code :-) I also
improved surprising/strange APIs (like the infamous
GroupedStats.compare_to(None)).

HTML version:
http://www.python.org/dev/peps/pep-0454/

See also the documentation of the implementation, especially examples:
http://www.haypocalc.com/tmp/tracemalloc/library/tracemalloc.html#examples


Major changes:

* GroupedStats.compare_to()/statistics() now return a list of
Statistic instances instead of a tuple with 5 items
* StatsDiff class has been removed
* Metrics have been removed
* Remove Filter.match*() methods
* Replace get_object_trace() function with get_object_traceback()
* More complete list of prior work. There are 11 Python projects to
debug memory leaks! I mentioned that PySizer implemented something
similar to tracemalloc 8 years ago. I also rewrote the Rationale
section
* Rename some classes, attributes and functions

Mercurial log of the PEP:
http://hg.python.org/peps/log/f851d4a1622a/pep-0454.txt



PEP: 454
Title: Add a new tracemalloc module to trace Python memory allocations
Version: $Revision$
Last-Modified: $Date$
Author: Victor Stinner 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 3-September-2013
Python-Version: 3.4


Abstract


This PEP proposes to add a new ``tracemalloc`` module to trace memory
blocks allocated by Python.


Rationale
=

Classic generic tools like Valgrind can get the C traceback where a
memory block was allocated. Using such tools to analyze Python memory
allocations does not help because most memory blocks are allocated in
the same C function, in ``PyMem_Malloc()`` for example. Moreover, Python
has an allocator for small objects called "pymalloc" which keeps free
blocks for efficiency. This is not well handled by these tools.

There are debug tools dedicated to the Python language, like ``Heapy``,
``Pympler`` and ``Meliae``, which list all live objects using the
garbage collector module (functions like ``gc.get_objects()``,
``gc.get_referrers()`` and ``gc.get_referents()``), compute their size
(e.g. using ``sys.getsizeof()``) and group objects by type. These tools
provide a better estimation of the memory usage of an application.  They
are useful when most memory leaks are instances of the same type and
this type is only instantiated in a few functions. Problems arise when
the object type is very common like ``str`` or ``tuple``, and it is hard
to identify where these objects are instantiated.
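The approach those tools take can be sketched in a few lines (a rough approximation of Heapy-style grouping, not the actual implementation of any of them):

```python
import gc
import sys
from collections import Counter

# Group all objects tracked by the garbage collector by type,
# summing their shallow sizes.
counts = Counter()
sizes = Counter()
for obj in gc.get_objects():
    name = type(obj).__name__
    counts[name] += 1
    sizes[name] += sys.getsizeof(obj)

for name, size in sizes.most_common(5):
    print("%s: %d objects, %d bytes" % (name, counts[name], size))
```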

Finding reference cycles is also a difficult problem.  There are
different tools to draw a diagram of all references.  These tools
cannot be used on large applications with thousands of objects because
the diagram is too huge to be analyzed manually.


Proposal


Using the customized allocation API from PEP 445, it becomes easy to
set up a hook on Python memory allocators. A hook can inspect Python
internals to retrieve Python tracebacks. The idea of getting the current
traceback comes from the faulthandler module. faulthandler dumps
the traceback of all Python threads on a crash; here the idea is to
get the traceback of the current Python thread when a memory block is
allocated by Python.

This PEP proposes to add a new ``tracemalloc`` module, as a debug tool
to trace memory blocks allocated by Python. The module provides the
following information:

* Statistics on allocated memory blocks per filename and per line
  number: total size, number and average size of allocated memory blocks
* Computed differences between two snapshots to detect memory leaks
* Traceback where a memory block was allocated
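As an illustration of that workflow, here is roughly what using the module looks like, sketched with the API names as they eventually shipped in Python 3.4 (``start()``/``take_snapshot()`` rather than the ``enable()`` names used in this draft):

```python
import tracemalloc

tracemalloc.start()

# Allocate something noticeable so it shows up in the statistics.
data = [bytes(1000) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)
```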

The API of the tracemalloc module is similar to the API of the
faulthandler module: ``enable()``, ``disable()`` and ``is_enabled()``
functions, an environment variable (``PYTHONFAULTHANDLER`` and
``PYTHONTRACEMALLOC``), and a ``-X`` command line option (``-X
faulthandler`` and ``-X tracemalloc``). See the
`documentation of the faulthandler module
`_.

The idea of tracing memory allocations is not new. It was first
implemented in the PySizer project in 2005. PySizer was implemented
differently: the traceback was stored in frame objects, and some Python
types linked the trace with the name of the object type. The PySizer
patch on CPython adds overhead on performance and memory footprint, even
when PySizer is not used. tracemalloc attaches a traceback at the
underlying layer, to memory blocks, and has no overhead when the module
is disabled.

The tracemalloc module has been written for CPython. Other
implementations of Python may not be able to provide it.


API
===

To trace most memory blocks allocated by Python, the module should be
enabled as early as possible by setting the ``PYTHONTRACEMALLOC``
environment variable to ``1``, or by using the ``-X tracemalloc``
command line option.

Re: [Python-Dev] [Python-checkins] cpython: Switch subprocess stdin to a socketpair, attempting to fix issue #19293 (AIX

2013-10-23 Thread Charles-François Natali
> For the record, pipe I/O seems a little faster than socket I/O under
> Linux:
>
> $ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); 
> r=a.fileno(); w=b.fileno(); x=b'x'*1000" "os.write(w, x); os.read(r, 1000)"
> 100 loops, best of 3: 1.1 usec per loop
>
> $ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); 
> x=b'x'*1000"
> "a.sendall(x); b.recv(1000)"
> 100 loops, best of 3: 1.02 usec per loop
>
> $ ./python -m timeit -s "import os; r, w = os.pipe(); x=b'x'*1000" 
> "os.write(w, x); os.read(r, 1000)"
> 100 loops, best of 3: 0.82 usec per loop

That's a raw write()/read() benchmark, but it's not taking something
important into account: pipes/socket are usually used to communicate
between concurrently running processes. And in this case, an important
factor is the pipe/socket buffer size: the smaller it is, the more
context switches (due to blocking writes/reads) you'll get, which
greatly decreases throughput.
And by default, Unix sockets have larger buffers than pipes (between 4K
and 64K for pipes, depending on the OS).

I wrote a quick benchmark forking a child process, with the parent
writing data through the pipe, and waiting for the child to read it
all. Here are the results (on Linux):

# time python /tmp/test.py pipe

real0m2.479s
user0m1.344s
sys 0m1.860s

# time python /tmp/test.py socketpair

real0m1.454s
user0m1.242s
sys 0m1.234s

So socketpair is actually faster.
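The /tmp/test.py script itself is not shown in the thread; a reconstruction along the same lines might look like this (Unix-only, since it uses fork(); the sizes are arbitrary):

```python
import os
import socket
import time

def bench(kind, total=1 << 22, chunk=1 << 16):
    """Time how long the parent takes to push `total` bytes to a child."""
    if kind == "pipe":
        r, w = os.pipe()
    else:
        a, b = socket.socketpair()
        r, w = a.fileno(), b.fileno()
    pid = os.fork()
    if pid == 0:
        # Child: drain everything the parent writes, then exit.
        got = 0
        while got < total:
            got += len(os.read(r, chunk))
        os._exit(0)
    payload = b"x" * chunk
    start = time.perf_counter()
    sent = 0
    while sent < total:
        sent += os.write(w, payload)
    os.waitpid(pid, 0)
    return time.perf_counter() - start

print("pipe:       %.3fs" % bench("pipe"))
print("socketpair: %.3fs" % bench("socketpair"))
```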

But as noted by Victor, there are slight differences between pipes and
sockets that I can think of:
- pipes guarantee write atomicity if less than PIPE_BUF is written,
which is not the case for sockets
- more annoying: in subprocess, the pipes are not set non-blocking:
after a select()/poll() returns a FD write-ready, we write less than
PIPE_BUF at a time to avoid blocking: this likely wouldn't work with a
socketpair

But this patch doesn't touch subprocess itself, and the FDs is only
used by asyncio, which sets them non-blocking: so this could only be
an issue for the spawned process, if it does rely on the two
pipe-specific behaviors above.

OTOH, having a unique implementation on all platforms makes sense, and
I don't know if it'll actually be a problem in practice, so we could
ship as-is and wait until someone complains ;-)

cf


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-23 Thread Kristján Valur Jónsson

This might be a good place to make some comments.
I have discussed some of this in private with Victor, but wanted to make them 
here, for the record.

Mainly, I agree with removing code.  I'd like to go further, since in my 
experience, the less code in C, the better.

1) really, all that is required in terms of data is the traceback.get_traces() 
function.  Further, it _need_ not return addresses since they are not required 
for analysis.  It is sufficient for it to return a list of (traceback, size, 
count) tuples.   I understand that the get_stats function is useful for quick 
information so it can be kept, although it provides no added information, only 
convenience.
2) get_object_address() and get_trace(address) functions seem redundant.  All 
that is required is get_object_traceback(), I think.
3) set_traceback_limit().  Truncating tracebacks is bad.  Particularly if it is 
truncated at the top end of the callstack, because then information loses 
cohesion, namely, the common connection point, the root.  If traceback limits 
are required, I suggest being able to specify that we truncate the leaf-end of 
the tracebacks.
4) add_filter().  This is unnecessary. Information can be filtered on the 
Python side.  Defining Filter as a C type is not necessary.  Similarly, module 
level filter functions can be dropped.
5) Filter, Snapshot, GroupedStats, Statistics:  These classes, if required, can 
be implemented in a .py module.
6) Snapshot dump/load():  It is unusual to see load and save functions taking 
filenames in a Python module, and a module implementing its own file I/O.  I 
have suggested simply adding pickle support.  Alternatively, support file-like 
objects or bytes (loads/dumps)

My experience is that performance and memory use hardly ever matter when you 
are doing diagnostic analysis of a program.  By definition, you are examining 
your program in a lab and you can afford 2 times, or 10 times, the memory use, 
and a 2x to 10x slowdown of the program.  I think it might be premature 
to move all of the statistics and analysis into the PEP and into C, because a) 
it assumes the need to optimize and b) it sets the specification in stone, 
before the module gets the chance to be honed by actual real-world use cases.

I'd also like to point out (just to say "I told you so" :) ) that this module 
is precisely the reason I suggested we include "const char *file, int lineno" 
in the API for PEP 445, because that would allow us, in debug builds, to get 
one extra stack level, namely the position of the actual C allocation in the 
python source.

If the above sounds negative, then that's not the intent.  I'm really happy 
Victor is putting in this effort here and I know this will be an essential tool 
for the future Python developer.  Those that brave the jump to version 3, that 
is :)

Cheers,

Kristján



[Python-Dev] pathlib (PEP 428) status

2013-10-23 Thread Charles-François Natali
Hi,

What's the current status of pathlib? Is it targeted for 3.4?

It would be a really nice addition, and AFAICT it has already been
maturing a while on pypi, and discussed several times here.
If I remember correctly, the only remaining issue was stat()'s result caching.

cf


Re: [Python-Dev] pathlib (PEP 428) status

2013-10-23 Thread Christian Heimes
On 23.10.2013 23:37, Charles-François Natali wrote:
> Hi,
> 
> What's the current status of pathlib? Is it targeted for 3.4?
> 
> It would be a really nice addition, and AFAICT it has already been
> maturing a while on pypi, and discussed several times here.
> If I remember correctly, the only remaining issue was stat()'s result caching.

Hi,

I'd like to see pathlib in 3.4 as well. Last week at PyCon.DE in Cologne
several people asked me about pathlib. We even had a BarCamp
session about path libraries for Python. A couple of German Python users
have promised to contribute doc improvements soonish.

AFAIK stat caching and an os.listdir() generator with stat `recycling`
(dirent->d_type) are open issues. I suggest Python 3.4 should ignore
these features for now but prepare the API and documentation for future
enhancements.
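For readers who have not tried the PyPI releases yet, a small taste of the API (pure path operations only, so no stat() caching is involved; the example path is made up):

```python
from pathlib import PurePosixPath

# Pure paths do string manipulation only; no filesystem access.
p = PurePosixPath("/usr/lib/python3.4/site-packages")
print(p.name)       # site-packages
print(p.parent)     # /usr/lib/python3.4
print(p.parts[:3])  # ('/', 'usr', 'lib')
print(p / "pkg" / "__init__.py")
```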

+1 for PEP 428

Christian


Re: [Python-Dev] About 'superobject' and descriptors

2013-10-23 Thread Nick Coghlan
On 24 Oct 2013 03:37, "hakril lse"  wrote:
>
> Hi,
>
> I have a question about a choice of implementation concerning
> 'superobject' with the descriptors.
>
> When a 'superobject' looks for a given attribute, it runs through the
> mro of the object.
> If it finds a descriptor, the 'superobject' calls the __get__ method
> with 'starttype = su->obj_type' as third argument (in typeobject.c:
> super_getattro).
>
> So, the 'type' argument of __get__ does not give more information
> about the 'real calling type' in this case.
> It seems that this is just a redundant information of inst.__class__.
>
> For example:
>
> # A.descr is a descriptor
> # B inherit from A
> # C inherit from B
>
> c = C()
> c.descr
> super(C, c).descr
> super(B, c).descr
>
> In these 3 cases the __get__ method is called with the same arguments
> that are : __get__(descr, c, C).
>
> If this behavior is really expected: Could you explain why ? because
> it means that I am missing something obvious.
> Because, at first sight, the 'type' argument seems to be the perfect
> place to get the type of the 'real calling class'.

The third argument is just there to handle the case where the instance is
None (i.e. lookup directly on the class rather than an instance).

Cheers,
Nick.

>
> Thank you,
>
> --
> hakril


Re: [Python-Dev] About 'superobject' and descriptors

2013-10-23 Thread Steven D'Aprano
Hi Hakril,

I think this question is probably off-topic for this list. This list is 
for development of the Python compiler, not for development with Python, 
and questions like this should probably go to [email protected] 
(also mirrored as comp.lang.python if you have Usenet access). But I 
have a brief observation below:

On Wed, Oct 23, 2013 at 07:29:03PM +0200, hakril lse wrote:

> For example:
> 
> # A.descr is a descriptor
> # B inherit from A
> # C inherit from B
> 
> c = C()
> c.descr
> super(C, c).descr
> super(B, c).descr
> 
> In these 3 cases the __get__ method is called with the same arguments
> that are : __get__(descr, c, C).
>
> If this behavior is really expected: Could you explain why ? because
> it means that I am missing something obvious.
> Because, at first sight, the 'type' argument seems to be the perfect
> place to get the type of the 'real calling class'.

I'm afraid I don't understand what you mean by "real calling class", if 
it is not the class of the instance doing the calling.

I have a feeling that perhaps you think that calls to super are 
equivalent to "the parent of the class", and so you expect:

c.desc => desc.__get__(c, C)
super(C, c).desc => desc.__get__(c, B)
super(B, c).desc => desc.__get__(c, A)

but that would not be correct. super is equivalent to "the next class in 
the MRO of the instance doing the calling", and in all cases, the 
"real-calling instance" is c, and the "real calling class" is the class 
of c, namely C.

Extended discussion of this should go to python-list, but I think I have 
the basics right.


-- 
Steven


Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!

2013-10-23 Thread Victor Stinner
Hi,

2013/10/23 Kristján Valur Jónsson :
> This might be a good place to make some comments.
> I have discussed some of this in private with Victor, but wanted to make them 
> here, for the record.

Yes, I prefer to discuss the PEP on python-dev. It's nice to get more
feedback; I expect to get a better API in the end!

Oh, you have a lot of remarks, I will try to reply to all of them.


> 1) really, all that is required in terms of data is the 
> traceback.get_traces() function.  Further, it _need_ not return addresses 
> since they are not required for analysis.  It is sufficient for it to return 
> a list of (traceback, size, count) tuples.   I understand that the get_stats 
> function is useful for quick information so it can be kept, although it 
> provides no added information, only convenience
> 2) get_object_address() and get_trace(address) functions seem redundant.  All 
> that is required is get_object_traceback(), I think.

The use case of get_traces() + get_object_trace() is to retrieve the
traceback of all live Python objects for tools like Meliae, Pympler or
Heapy. The only motivation is performance.

I wrote a benchmark using 10^6 objects and... get_traces() x 1 +
get_object_address() x N is 40% *slower* than calling
get_object_traceback() x N. So get_object_traceback() is faster for
this use case, especially if you don't want the traceback of all
objects, but only a few of them.

Charles-Francois already asked me to remove everything related to
address, so let's remove two more functions:

- remove get_object_address()
- remove get_trace()
- get_traces() returns a list
- remove 'address' key type of Snapshot.group_by()
- 'traceback' key type of Snapshot.group_by() groups traces by
traceback, instead of (address, traceback) => it is closer to what you
suggested to me privately (generate "top stats" but keep the whole
traceback)


> 1) really, all that is required in terms of data is the 
> traceback.get_traces() function.  Further, it _need_ not return addresses 
> since they are not required for analysis.  It is sufficient for it to return 
> a list of (traceback, size, count) tuples.   I understand that the get_stats 
> function is useful for quick information so it can be kept, although it 
> provides no added information, only convenience

For the get_stats() question, the motivation is also performance.
Let's try a benchmark on my laptop.

Test 1. With the Python test suite, 467,738 traces limited to 1 frame:

* take a snapshot with traces (call get_traces()): 293 ms
* write the snapshot on disk: 167 ms
* load the snapshot from disk: 184 ms
* group by filename *using stats*: 24 ms (754 different filenames)
* group by line *using stats*: 28 ms (31827 different lines)
* group by traceback using traces: 333 ms (31827 different tracebacks,
the traceback is limited to 1 frame)


Test 2. With the Python test suite, 495,571 traces limited to 25 frames:

* take a snapshot without traces (call get_stats()): 35 ms
* take a snapshot with traces (call get_stats() and get_traces()): 532 ms
* write the snapshot on disk: 565 ms
* load the snapshot from disk: 739 ms
* group by filename *using stats*:  25 ms (906 different filenames)
* group by line *using stats*: 22 ms (36940 different lines)
* group by traceback using traces: 786 ms (66314 different tracebacks)


Test 3. tracemalloc modified to not use get_stats() anymore, only use
traces. With the Python test suite, 884719 traces limited to 1 frame:

* take a snapshot with traces (call get_traces()): 531 ms
* write the snapshot on disk: 278 ms
* load the snapshot from disk: 298 ms
* group by filename *using traces*: 706 ms (1329 different filenames)
* group by line *using traces*: 724 ms (55,349 different lines)
* group by traceback using traces: 731 ms (55,349 different
tracebacks, the traceback is limited to 1 frame)

I'm surprised: it's faster than the benchmark I ran some weeks ago.
Maybe I optimized something? The most critical operation, taking a
snapshot, takes half a second, so it's efficient enough.

Let's remove even more code:

- remove get_stats()
- remove Snapshot.stats

Snapshot.group_by() can easily recompute statistics by filename and
line number from traces.

(To be honest, get_stats() and get_traces() used together have an
issue: they may be inconsistent if some objects are allocated in
between. Snapshot.apply_filters() has to apply filters on both traces
and stats, for example. It's simpler to only manipulate traces.)
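Recomputing statistics from raw traces is indeed straightforward. A toy sketch with made-up data, assuming traces are (traceback, size) pairs where a traceback is a tuple of (filename, lineno) frames:

```python
from collections import defaultdict

# Hypothetical raw traces, most recent frame first.
traces = [
    ((("a.py", 10),), 512),
    ((("a.py", 10),), 256),
    ((("b.py", 3),), 128),
]

# Group by filename: total size and number of memory blocks.
by_file = defaultdict(lambda: [0, 0])
for frames, size in traces:
    filename = frames[0][0]
    by_file[filename][0] += size
    by_file[filename][1] += 1

print(dict(by_file))  # {'a.py': [768, 2], 'b.py': [128, 1]}
```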



> 3) set_traceback_limit().  Truncating tracebacks is bad.  Particularly if it 
> is truncated at the top end of the callstack, because then information looses 
> cohesion, namely, the common connection point, the root.  If traceback limits 
> are required, I suggest being able to specifiy that we truncate the leaf-end 
> of the tracebacks.

If the traceback is truncated and 90% of all memory is allocated at
the same Python line: I prefer to get the most recent frame, rather
than the n-th function from main() which may indi

[Python-Dev] PEP 451 update

2013-10-23 Thread Eric Snow
I've had some offline discussion with Brett and Nick about PEP 451
which has led to some meaningful clarifications in the PEP.  In the
interest of pulling further discussions back onto this
(archived/public) list, here's an update of what we'd discussed and
where things are at. :)

* path entry finders indicate that they found part of a possible
namespace package by returning a spec with no loader set (but with
submodule_search_locations set).  Brett wanted some clarification on
this.
* The name/path signature and attributes of file-based finders in
importlib will no longer be changing.  Brett had some suggestions on
the proposed change and it became clear that the change was
actually pointless.
* I've asserted that there shouldn't be much difficulty in adjusting
pkgutil and other modules to work with ModuleSpec.
* Brett asked for clarification on whether the "load()" example from
the PEP would be realized implicitly by the import machinery or
explicitly as a method on ModuleSpec.  This has bearing on the ability
of finders to return instances of ModuleSpec subclasses or even
ModuleSpec-like objects (a la duck typing).  The answer is that it will
not be a method on ModuleSpec, so it is effectively just part of the
general import system implementation.  Finders may return any object
that provides the attributes of ModuleSpec.  I will be updating the
PEP to make these points clear.
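On the first point, the shape of such a spec can be shown with importlib.machinery.ModuleSpec as it later shipped in Python 3.4 (the package name and path here are made up):

```python
import importlib.machinery

# A namespace-package portion: no loader, but search locations set.
spec = importlib.machinery.ModuleSpec("nspkg", None, is_package=True)
spec.submodule_search_locations.append("/tmp/nspkg")

print(spec.loader)                      # None
print(spec.submodule_search_locations)  # ['/tmp/nspkg']
```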

* Nick suggested writing a draft patch for the language reference
changes (the import page).  Such a patch will be a pretty good
indicator of the impact of PEP 451 on the import system and should
highlight any design flaws in the API.  This is on my to-do list
(hopefully by tomorrow).
* Nick also suggested moving all ModuleSpec methods to a separate
class that will simply make use of a separate, existing ModuleSpec
instance.  This will help address several issues, particularly by
relaxing the constraints on what finders can return, but also by
avoiding the unnecessary exposure of the methods via every
module.__spec__.  I plan on going with this, but currently am trying
out the change to see if there are any problems I've missed.  Once I
feel good about it I'll update the PEP.

That about sums up our discussions.  I have a couple of outstanding
updates to the PEP to make when I get a chance, as well as putting up
a language reference patch for review.

-eric