Re: [Python-Dev] Document performance requirements?

2006-07-25 Thread Neal Becker
On Friday 21 July 2006 7:49 am, Nick Coghlan wrote:
 Neal Becker wrote:
  For a recent project I needed to select a container.  There are plenty of
  python data structures to choose from.  It seems that information on
  performance is missing (or not easy to find).
 
  I think Python should include performance in the documentation of common
  data structures to help users select the appropriate types.  Something in
  the style of c++ STL.

 Do you mean absolute performance, or do you mean algorithmic order
 guarantees? I thought the latter were already documented. . .


The latter.  Where are they documented?
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-24 Thread Armin Rigo
Hi Giovanni,

On Sun, Jul 23, 2006 at 03:30:50PM +0200, Giovanni Bajo wrote:
 I'm not sure big-O tells the whole truth. For instance, do we want to allow
 an implementation to use a hash table as underlying type for a list? It
 would match big-O requirements, but would still be slower than a plain array
 because of higher overhead of implementation (higher constant factor).

A big-O difference can make the difference between a program that takes
0.5 seconds or 2 hours to run.  This is more important than a constant
factor difference, which different implementations are bound to exhibit
anyway.

 And if this is allowed, I would like to find in CPython tutorials and
 documentations a simple statement like: to implement the list and match its
 requirements, CPython choose a simple array as underlying data structure.

Yes, the big-O notes don't have to be too technical: the docs should
tell people to think about Python lists as simple arrays, and the O
requirements follow naturally.


A bientot,

Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-24 Thread Scott Dial
Between the two of you, I think you have made the case that the language 
specification is better to not include such details. As you both note, 
it is difficult to capture the essence of what is desired from the 
performance of the implementation. To tag on other version, what about 
Big-O space concerns with things like list.sort. I'm sure there are 
other things to add as well.

It seems reasonable to me that everyone has the same interests in mind 
when they write a program. Make it good, make it fast, make it small, 
etc. These sort of details should work themselves out if they are 
actually important. All of these algorithms should be treated as 
implementation accidents.

Having the information about CPython's implementation in the docs would 
be good. And go most of the way towards having everyone on the same page.

-- 
Scott Dial
[EMAIL PROTECTED]
[EMAIL PROTECTED]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-23 Thread Giovanni Bajo
Armin Rigo wrote:

 I think that O-wise the current CPython situation should be documented
 as a minimal requirement for implementations of the language, with
 just one exception: the well-documented don't rely on this hack in
 2.4 to make repeated 'str += str' amortized linear, for which the 2.3
 quadratic behavior is considered compliant enough.

 I suppose that allowing implementations to provide better algorithmic
 complexities than required is fine, although I can think of some
 problems with that (e.g. nice and efficient user code that would
 perform horribly badly on CPython).

I'm not sure big-O tells the whole truth. For instance, do we want to allow
an implementation to use a hash table as underlying type for a list? It
would match big-O requirements, but would still be slower than a plain array
because of higher overhead of implementation (higher constant factor).

And if this is allowed, I would like to find in CPython tutorials and
documentations a simple statement like: to implement the list and match its
requirements, CPython choose a simple array as underlying data structure.
-- 
Giovanni Bajo

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-22 Thread Martin v. Löwis
Jason Orendorff wrote:
 On 7/21/06, Nick Coghlan [EMAIL PROTECTED] wrote:
 However, I'm also struggling to think of a case other than list vs deque 
 where
 the choice of a builtin or standard library data structure would be dictated
 by big-O() concerns.
 
 OK, but that doesn't mean the information is unimportant.  +1 on
 making this something of a priority.  People looking for this info
 should find it in the obvious place.  Some are unobvious. (How fast is
 dict.__eq__ on average? Worst case?)

Contributions are welcome.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-22 Thread Armin Rigo
Hi,

On Sat, Jul 22, 2006 at 12:33:45PM +1000, Nick Coghlan wrote:
 Agreed, but there's more to doing that than just writing down the O() implied 
 by the current CPython implementation - it's up to Guido to decide which of 
 the constraints are part of the language definition, and which are 
 implementation accidents

I think that O-wise the current CPython situation should be documented
as a minimal requirement for implementations of the language, with
just one exception: the well-documented don't rely on this hack in 2.4
to make repeated 'str += str' amortized linear, for which the 2.3
quadratic behavior is considered compliant enough.

I suppose that allowing implementations to provide better algorithmic
complexities than required is fine, although I can think of some
problems with that (e.g. nice and efficient user code that would perform
horribly badly on CPython).


Armin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Document performance requirements?

2006-07-21 Thread Neal Becker
For a recent project I needed to select a container.  There are plenty of
python data structures to choose from.  It seems that information on
performance is missing (or not easy to find).

I think Python should include performance in the documentation of common
data structures to help users select the appropriate types.  Something in
the style of c++ STL.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread Nick Coghlan
Neal Becker wrote:
 For a recent project I needed to select a container.  There are plenty of
 python data structures to choose from.  It seems that information on
 performance is missing (or not easy to find).
 
 I think Python should include performance in the documentation of common
 data structures to help users select the appropriate types.  Something in
 the style of c++ STL.

Do you mean absolute performance, or do you mean algorithmic order guarantees? 
I thought the latter were already documented. . .

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread Neal Becker
Nick Coghlan wrote:

 Neal Becker wrote:
 For a recent project I needed to select a container.  There are plenty of
 python data structures to choose from.  It seems that information on
 performance is missing (or not easy to find).
 
 I think Python should include performance in the documentation of common
 data structures to help users select the appropriate types.  Something in
 the style of c++ STL.
 
 Do you mean absolute performance, or do you mean algorithmic order
 guarantees? I thought the latter were already documented. . .
 

The latter.  Where is it documented?

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread Nick Coghlan
Neal Becker wrote:
 On Friday 21 July 2006 7:49 am, Nick Coghlan wrote:
 Neal Becker wrote:
 For a recent project I needed to select a container.  There are plenty of
 python data structures to choose from.  It seems that information on
 performance is missing (or not easy to find).

 I think Python should include performance in the documentation of common
 data structures to help users select the appropriate types.  Something in
 the style of c++ STL.
 Do you mean absolute performance, or do you mean algorithmic order
 guarantees? I thought the latter were already documented. . .

 
 The latter.  Where are they documented?

Just because I think something, it doesn't mean it's true :)

The only reference I can actually find is the one in the collections module 
docs pointing out that collections.deque permits O(1) insertions and removals 
at the beginning of the sequence, as well as at the end (whereas lists are 
O(n) for operations at the beginning due to the resulting memory copying).

However, I'm also struggling to think of a case other than list vs deque where 
the choice of a builtin or standard library data structure would be dictated 
by big-O() concerns.

list vs array.array is based on memory efficiency
list vs deque is based on whether or not you need O(1) push/pop at both ends
list vs set is based on whether or not ordering matters
set vs dict is based on whether or not you need to map keys to values

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Document performance requirements?

2006-07-21 Thread Jason Orendorff
On 7/21/06, Nick Coghlan [EMAIL PROTECTED] wrote:
 However, I'm also struggling to think of a case other than list vs deque where
 the choice of a builtin or standard library data structure would be dictated
 by big-O() concerns.

OK, but that doesn't mean the information is unimportant.  +1 on
making this something of a priority.  People looking for this info
should find it in the obvious place.  Some are unobvious. (How fast is
dict.__eq__ on average? Worst case?)

-j
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread Giovanni Bajo
Jason Orendorff wrote:

 However, I'm also struggling to think of a case other than list vs
 deque where the choice of a builtin or standard library data
 structure would be dictated by big-O() concerns.

 OK, but that doesn't mean the information is unimportant.  +1 on
 making this something of a priority.  People looking for this info
 should find it in the obvious place.  Some are unobvious. (How fast is
 dict.__eq__ on average? Worst case?)

I also found out that most people tend to think of Python's lists as a
magical data structure optimized for many operations (like a rope or
something complex like that). Documenting that it's just a bare vector
(std::vector in C++) would be of great help.
-- 
Giovanni Bajo

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread James Y Knight
On Jul 21, 2006, at 12:45 PM, Giovanni Bajo wrote:
 Jason Orendorff wrote:

 However, I'm also struggling to think of a case other than list vs
 deque where the choice of a builtin or standard library data
 structure would be dictated by big-O() concerns.

 OK, but that doesn't mean the information is unimportant.  +1 on
 making this something of a priority.  People looking for this info
 should find it in the obvious place.  Some are unobvious. (How  
 fast is
 dict.__eq__ on average? Worst case?)

 I also found out that most people tend to think of Python's lists as a
 magical data structure optimized for many operations (like a rope or
 something complex like that). Documenting that it's just a bare vector
 (std::vector in C++) would be of great help.

Indeed, I was talking to someone a while back who thought that lists  
were magically hashed, in that he did something like:
dictionary = open(/usr/share/dict/words).readlines()

and then expected:
word in dictionary

would be fast. And was very surprised when it turned out to be slow a  
linear search of the list. :)

James

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Document performance requirements?

2006-07-21 Thread Nick Coghlan
Jason Orendorff wrote:
 On 7/21/06, Nick Coghlan [EMAIL PROTECTED] wrote:
 However, I'm also struggling to think of a case other than list vs 
 deque where
 the choice of a builtin or standard library data structure would be 
 dictated
 by big-O() concerns.
 
 OK, but that doesn't mean the information is unimportant.  +1 on
 making this something of a priority.  People looking for this info
 should find it in the obvious place.  Some are unobvious. (How fast is
 dict.__eq__ on average? Worst case?)

Agreed, but there's more to doing that than just writing down the O() implied 
by the current CPython implementation - it's up to Guido to decide which of 
the constraints are part of the language definition, and which are 
implementation accidents (e.g. CPython's list.sort() operation was stable for 
at least one release before GvR made stability part of the definition of the 
method at the language level).

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com