Re: len() on mutables vs. immutables

2013-02-07 Thread Terry Reedy


On 2/7/2013 8:09 PM, Demian Brecht wrote:


http://demianbrecht.github.com/posts/2013/02/07/understanding-len/



When len() is called passing an immutable built-in type (such as a
string), I'd assume that the overhead in doing so is simply a function
call and there are no on-call calculations done. Is that correct?

I'd also assume that mutable built-in types (such as a bytearray) would
cache their size internally as a side effect of mutation operations. Is
that correct? If so, is it safe to assume that at least all built-in
types observe this behavior, or are there some that incur an O(n) cost
on every len() call?


The language specification specifies behavior, not resource usage. 
However, CPython's concrete collection classes all require knowing how 
many items they contain for their operation. And they 'know' that they 
must respond to len() inquiries (including for truth value) and for 
sequences, deal with index and slice operations. So you may assume that 
len() simply accesses a private internal attribute.


Keep in mind that 'immutables' have to be internally mutated to set 
their values, so from the interpreter viewpoint, there is little 
difference between mutable and immutable. The latter simply lack 
publicly accessible mutation methods.


--
Terry Jan Reedy
--
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2013-02-07 Thread Demian Brecht
So, it's taken me a little while longer than I figured to actually get the
time to dig around for the question that I had (added to the bottom of
this message for context).. Pretty mundane stuff, but I did the digging
(3.4.0a). Hopefully the results will help anyone else with the same
questions.

http://demianbrecht.github.com/posts/2013/02/07/understanding-len/


However, my research brought up a question (I'm assuming someone here can
answer this):

If a memoryview is representing a non-continuguous block of memory (> 1
ndim), will len(obj) not return incorrect results? It seems to be
reporting the shape of the 0th dim at the moment.. Or is there something
that I'm missing altogether?

Thanks,

Demian Brecht
http://demianbrecht.github.com



On 2012-10-18 5:26 PM, "Terry Reedy"  wrote:

>On 10/18/2012 2:42 PM, Demian Brecht wrote:
>
>> Awesome. Pretty much what I figured. Of course, I'll have to dig around
>> the source just to confirm this with my own eyes (more just curiosity
>> than anything),
>
>If you do, please followup with a report.
>
>-- 
>Terry Jan Reedy
>
>-- 
>http://mail.python.org/mailman/listinfo/python-list

** Quote for context on a necro'd post **

I'm curious as to the implementation (I'd be happy to dig through the
source, just don't have the time right now). I've seen various
implementations across interpreters in the past (some which have been
rather shocking) and I'd like to get some insight into Python (well,
CPython at this point anyway).

When len() is called passing an immutable built-in type (such as a
string), I'd assume that the overhead in doing so is simply a function
call and there are no on-call calculations done. Is that correct?

I'd also assume that mutable built-in types (such as a bytearray) would
cache their size internally as a side effect of mutation operations. Is
that correct? If so, is it safe to assume that at least all built-in
types observe this behavior, or are there some that incur an O(n) cost
on every len() call?

Obviously this can't be controlled with custom types that implement
their own __len__, I'm only asking about Python's built-ins.

Thanks,
Demian


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Terry Reedy

On 10/18/2012 2:42 PM, Demian Brecht wrote:


Awesome. Pretty much what I figured. Of course, I'll have to dig around
the source just to confirm this with my own eyes (more just curiosity
than anything),


If you do, please followup with a report.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Terry Reedy

On 10/18/2012 3:18 PM, Prasad, Ramit wrote:

Terry Reedy wrote:

On 10/18/2012 1:23 PM, Demian Brecht wrote:


When len() is called passing an immutable built-in type (such as a
string), I'd assume that the overhead in doing so is simply a function
call and there are no on-call calculations done. Is that correct?


See below.


I'd also assume that mutable built-in types (such as a bytearray) would
cache their size internally as a side effect of mutation operations. Is


Or the length could be the difference of two pointers -- address of the
first empty slot minus address of first item.


that correct? If so, is it safe to assume that at least all built-in
types observe this behavior,


str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and
ranges should all return len in O(1) time. That includes the possibility
of a subtraction as indicated above.



Why does pointer arithmetic work for dicts?


It would only possibly be for lists, bytearrays, and, array module 
arrays.They are all over-allocated and need pointer to beginning, and 
either len or pointers to current end and allocated end. The current 
authoritative answer is in the current code itself.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Demian Brecht
On Thu, Oct 18, 2012 at 12:43 PM, Daniel Urban  wrote:
> The source is usually in Objects/*object.c (e.g., the source for list
> is in Objects/listobject.c, dict is in dictobject.c and so on). The
> implementation of __len__ is usually in a method called
> whatever_length (e.g., dict.__len__ is called dict_length). To be
> sure, you can check the PyTypeObject declaration for the type.
> Probably the tp_as_sequence or tp_as_mapping field contains the
> pointer to __len__ (sq_length or mp_length respectively). (You can
> also search for "lenfunc", which is the type of such functions.)

Many thanks for the details.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: len() on mutables vs. immutables

2012-10-18 Thread Prasad, Ramit
Ian Kelly wrote:
> Sent: Thursday, October 18, 2012 2:39 PM
> To: Python
> Subject: Re: len() on mutables vs. immutables
> 
> On Thu, Oct 18, 2012 at 1:18 PM, Prasad, Ramit
>  wrote:
> > Why does pointer arithmetic work for dicts? I would think the position
> > of a value would be based on the hash of the key and thus "random" for
> > the context of this conversation.
> 
> It doesn't.  len() on CPython dicts is O(1) because the dict keeps
> track of how many items it contains.  It needs to do this anyway so
> that it can determine when to grow the internal hash table.

That is what I was thinking "should" happen. Thanks for the 
clarification Ian.


Ramit Prasad


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Daniel Urban
On Thu, Oct 18, 2012 at 8:42 PM, Demian Brecht  wrote:
>> str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and
>> ranges should all return len in O(1) time. That includes the possibility
>> of a subtraction as indicated above.
>
> Awesome. Pretty much what I figured. Of course, I'll have to dig around the
> source just to confirm this with my own eyes (more just curiosity than
> anything), so if you know whereabouts to look, it would be most helpful :)

The source is usually in Objects/*object.c (e.g., the source for list
is in Objects/listobject.c, dict is in dictobject.c and so on). The
implementation of __len__ is usually in a method called
whatever_length (e.g., dict.__len__ is called dict_length). To be
sure, you can check the PyTypeObject declaration for the type.
Probably the tp_as_sequence or tp_as_mapping field contains the
pointer to __len__ (sq_length or mp_length respectively). (You can
also search for "lenfunc", which is the type of such functions.)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Ian Kelly
On Thu, Oct 18, 2012 at 1:18 PM, Prasad, Ramit
 wrote:
> Why does pointer arithmetic work for dicts? I would think the position
> of a value would be based on the hash of the key and thus "random" for
> the context of this conversation.

It doesn't.  len() on CPython dicts is O(1) because the dict keeps
track of how many items it contains.  It needs to do this anyway so
that it can determine when to grow the internal hash table.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: len() on mutables vs. immutables

2012-10-18 Thread Prasad, Ramit
Terry Reedy wrote:
> On 10/18/2012 1:23 PM, Demian Brecht wrote:
> 
> > When len() is called passing an immutable built-in type (such as a
> > string), I'd assume that the overhead in doing so is simply a function
> > call and there are no on-call calculations done. Is that correct?
> 
> See below.
> 
> > I'd also assume that mutable built-in types (such as a bytearray) would
> > cache their size internally as a side effect of mutation operations. Is
> 
> Or the length could be the difference of two pointers -- address of the
> first empty slot minus address of first item.
> 
> > that correct? If so, is it safe to assume that at least all built-in
> > types observe this behavior,
> 
> str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and
> ranges should all return len in O(1) time. That includes the possibility
> of a subtraction as indicated above.
> 

Why does pointer arithmetic work for dicts? I would think the position
of a value would be based on the hash of the key and thus "random" for
the context of this conversation.

Ramit Prasad


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: len() on mutables vs. immutables

2012-10-18 Thread Nick Cash
> I'm curious as to the implementation (I'd be happy to dig through the
> source, just don't have the time right now). I've seen various
> implementations across interpreters in the past (some which have been
> rather shocking) and I'd like to get some insight into Python (well,
> CPython at this point anyway).
> 
> When len() is called passing an immutable built-in type (such as a
> string), I'd assume that the overhead in doing so is simply a function
> call and there are no on-call calculations done. Is that correct?
> 
> I'd also assume that mutable built-in types (such as a bytearray) would
> cache their size internally as a side effect of mutation operations. Is
> that correct? If so, is it safe to assume that at least all built-in
> types observe this behavior, or are there some that incur an O(n) cost
> on every len() call?

It appears that list has len() complexity of O(1)
source: http://wiki.python.org/moin/TimeComplexity
It may be worth mentioning that lists in Python are implemented using arrays 
instead of linked lists.

It's reasonable to assume that other built-in collection types would be 
similar, though I don't see anything explicitly saying so for bytearray.

-Nick Cash

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Demian Brecht
On 10/18/2012 11:29 AM, Terry Reedy wrote:> Or the length could be the 
difference of two pointers -- address of the

> first empty slot minus address of first item.

That would assume contiguous blocks of memory, which I would find to be 
rather dangerous (of an assumption that is) in most dynamic cases 
(obviously totally depends on implementation details).


> str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and
> ranges should all return len in O(1) time. That includes the possibility
> of a subtraction as indicated above.

Awesome. Pretty much what I figured. Of course, I'll have to dig around 
the source just to confirm this with my own eyes (more just curiosity 
than anything), so if you know whereabouts to look, it would be most 
helpful :)

--
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Demian Brecht

On 10/18/2012 11:28 AM, Nick Cash wrote:

It appears that list has len() complexity of O(1)
source: http://wiki.python.org/moin/TimeComplexity
It may be worth mentioning that lists in Python are implemented using arrays 
instead of linked lists.

It's reasonable to assume that other built-in collection types would be 
similar, though I don't see anything explicitly saying so for bytearray.

-Nick Cash


Thanks for the link, I don't believe I had seen that one before.


--
http://mail.python.org/mailman/listinfo/python-list


Re: len() on mutables vs. immutables

2012-10-18 Thread Terry Reedy

On 10/18/2012 1:23 PM, Demian Brecht wrote:


When len() is called passing an immutable built-in type (such as a
string), I'd assume that the overhead in doing so is simply a function
call and there are no on-call calculations done. Is that correct?


See below.


I'd also assume that mutable built-in types (such as a bytearray) would
cache their size internally as a side effect of mutation operations. Is


Or the length could be the difference of two pointers -- address of the 
first empty slot minus address of first item.



that correct? If so, is it safe to assume that at least all built-in
types observe this behavior,


str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and 
ranges should all return len in O(1) time. That includes the possibility 
of a subtraction as indicated above.


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list