Re: len() on mutables vs. immutables
On 2/7/2013 8:09 PM, Demian Brecht wrote: http://demianbrecht.github.com/posts/2013/02/07/understanding-len/ When len() is called passing an immutable built-in type (such as a string), I'd assume that the overhead in doing so is simply a function call and there are no on-call calculations done. Is that correct? I'd also assume that mutable built-in types (such as a bytearray) would cache their size internally as a side effect of mutation operations. Is that correct? If so, is it safe to assume that at least all built-in types observe this behavior, or are there some that incur an O(n) cost on every len() call? The language specification specifies behavior, not resource usage. However, CPython's concrete collection classes all require knowing how many items they contain for their operation. And they 'know' that they must respond to len() inquiries (including for truth value) and for sequences, deal with index and slice operations. So you may assume that len() simply accesses a private internal attribute. Keep in mind that 'immutables' have to be internally mutated to set their values, so from the interpreter viewpoint, there is little difference between mutable and immutable. The latter simply lack publicly accessible mutation methods. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
So, it's taken me a little while longer than I figured to actually get the time to dig around for the question that I had (added to the bottom of this message for context).. Pretty mundane stuff, but I did the digging (3.4.0a). Hopefully the results will help anyone else with the same questions. http://demianbrecht.github.com/posts/2013/02/07/understanding-len/ However, my research brought up a question (I'm assuming someone here can answer this): If a memoryview is representing a non-continuguous block of memory (> 1 ndim), will len(obj) not return incorrect results? It seems to be reporting the shape of the 0th dim at the moment.. Or is there something that I'm missing altogether? Thanks, Demian Brecht http://demianbrecht.github.com On 2012-10-18 5:26 PM, "Terry Reedy" wrote: >On 10/18/2012 2:42 PM, Demian Brecht wrote: > >> Awesome. Pretty much what I figured. Of course, I'll have to dig around >> the source just to confirm this with my own eyes (more just curiosity >> than anything), > >If you do, please followup with a report. > >-- >Terry Jan Reedy > >-- >http://mail.python.org/mailman/listinfo/python-list ** Quote for context on a necro'd post ** I'm curious as to the implementation (I'd be happy to dig through the source, just don't have the time right now). I've seen various implementations across interpreters in the past (some which have been rather shocking) and I'd like to get some insight into Python (well, CPython at this point anyway). When len() is called passing an immutable built-in type (such as a string), I'd assume that the overhead in doing so is simply a function call and there are no on-call calculations done. Is that correct? I'd also assume that mutable built-in types (such as a bytearray) would cache their size internally as a side effect of mutation operations. Is that correct? If so, is it safe to assume that at least all built-in types observe this behavior, or are there some that incur an O(n) cost on every len() call? Obviously this can't be controlled with custom types that implement their own __len__, I'm only asking about Python's built-ins. Thanks, Demian -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On 10/18/2012 2:42 PM, Demian Brecht wrote: Awesome. Pretty much what I figured. Of course, I'll have to dig around the source just to confirm this with my own eyes (more just curiosity than anything), If you do, please followup with a report. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On 10/18/2012 3:18 PM, Prasad, Ramit wrote: Terry Reedy wrote: On 10/18/2012 1:23 PM, Demian Brecht wrote: When len() is called passing an immutable built-in type (such as a string), I'd assume that the overhead in doing so is simply a function call and there are no on-call calculations done. Is that correct? See below. I'd also assume that mutable built-in types (such as a bytearray) would cache their size internally as a side effect of mutation operations. Is Or the length could be the difference of two pointers -- address of the first empty slot minus address of first item. that correct? If so, is it safe to assume that at least all built-in types observe this behavior, str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and ranges should all return len in O(1) time. That includes the possibility of a subtraction as indicated above. Why does pointer arithmetic work for dicts? It would only possibly be for lists, bytearrays, and, array module arrays.They are all over-allocated and need pointer to beginning, and either len or pointers to current end and allocated end. The current authoritative answer is in the current code itself. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On Thu, Oct 18, 2012 at 12:43 PM, Daniel Urban wrote: > The source is usually in Objects/*object.c (e.g., the source for list > is in Objects/listobject.c, dict is in dictobject.c and so on). The > implementation of __len__ is usually in a method called > whatever_length (e.g., dict.__len__ is called dict_length). To be > sure, you can check the PyTypeObject declaration for the type. > Probably the tp_as_sequence or tp_as_mapping field contains the > pointer to __len__ (sq_length or mp_length respectively). (You can > also search for "lenfunc", which is the type of such functions.) Many thanks for the details. -- http://mail.python.org/mailman/listinfo/python-list
RE: len() on mutables vs. immutables
Ian Kelly wrote: > Sent: Thursday, October 18, 2012 2:39 PM > To: Python > Subject: Re: len() on mutables vs. immutables > > On Thu, Oct 18, 2012 at 1:18 PM, Prasad, Ramit > wrote: > > Why does pointer arithmetic work for dicts? I would think the position > > of a value would be based on the hash of the key and thus "random" for > > the context of this conversation. > > It doesn't. len() on CPython dicts is O(1) because the dict keeps > track of how many items it contains. It needs to do this anyway so > that it can determine when to grow the internal hash table. That is what I was thinking "should" happen. Thanks for the clarification Ian. Ramit Prasad This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email. -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On Thu, Oct 18, 2012 at 8:42 PM, Demian Brecht wrote: >> str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and >> ranges should all return len in O(1) time. That includes the possibility >> of a subtraction as indicated above. > > Awesome. Pretty much what I figured. Of course, I'll have to dig around the > source just to confirm this with my own eyes (more just curiosity than > anything), so if you know whereabouts to look, it would be most helpful :) The source is usually in Objects/*object.c (e.g., the source for list is in Objects/listobject.c, dict is in dictobject.c and so on). The implementation of __len__ is usually in a method called whatever_length (e.g., dict.__len__ is called dict_length). To be sure, you can check the PyTypeObject declaration for the type. Probably the tp_as_sequence or tp_as_mapping field contains the pointer to __len__ (sq_length or mp_length respectively). (You can also search for "lenfunc", which is the type of such functions.) -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On Thu, Oct 18, 2012 at 1:18 PM, Prasad, Ramit wrote: > Why does pointer arithmetic work for dicts? I would think the position > of a value would be based on the hash of the key and thus "random" for > the context of this conversation. It doesn't. len() on CPython dicts is O(1) because the dict keeps track of how many items it contains. It needs to do this anyway so that it can determine when to grow the internal hash table. -- http://mail.python.org/mailman/listinfo/python-list
RE: len() on mutables vs. immutables
Terry Reedy wrote: > On 10/18/2012 1:23 PM, Demian Brecht wrote: > > > When len() is called passing an immutable built-in type (such as a > > string), I'd assume that the overhead in doing so is simply a function > > call and there are no on-call calculations done. Is that correct? > > See below. > > > I'd also assume that mutable built-in types (such as a bytearray) would > > cache their size internally as a side effect of mutation operations. Is > > Or the length could be the difference of two pointers -- address of the > first empty slot minus address of first item. > > > that correct? If so, is it safe to assume that at least all built-in > > types observe this behavior, > > str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and > ranges should all return len in O(1) time. That includes the possibility > of a subtraction as indicated above. > Why does pointer arithmetic work for dicts? I would think the position of a value would be based on the hash of the key and thus "random" for the context of this conversation. Ramit Prasad This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email. -- http://mail.python.org/mailman/listinfo/python-list
RE: len() on mutables vs. immutables
> I'm curious as to the implementation (I'd be happy to dig through the > source, just don't have the time right now). I've seen various > implementations across interpreters in the past (some which have been > rather shocking) and I'd like to get some insight into Python (well, > CPython at this point anyway). > > When len() is called passing an immutable built-in type (such as a > string), I'd assume that the overhead in doing so is simply a function > call and there are no on-call calculations done. Is that correct? > > I'd also assume that mutable built-in types (such as a bytearray) would > cache their size internally as a side effect of mutation operations. Is > that correct? If so, is it safe to assume that at least all built-in > types observe this behavior, or are there some that incur an O(n) cost > on every len() call? It appears that list has len() complexity of O(1) source: http://wiki.python.org/moin/TimeComplexity It may be worth mentioning that lists in Python are implemented using arrays instead of linked lists. It's reasonable to assume that other built-in collection types would be similar, though I don't see anything explicitly saying so for bytearray. -Nick Cash -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On 10/18/2012 11:29 AM, Terry Reedy wrote:> Or the length could be the difference of two pointers -- address of the > first empty slot minus address of first item. That would assume contiguous blocks of memory, which I would find to be rather dangerous (of an assumption that is) in most dynamic cases (obviously totally depends on implementation details). > str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and > ranges should all return len in O(1) time. That includes the possibility > of a subtraction as indicated above. Awesome. Pretty much what I figured. Of course, I'll have to dig around the source just to confirm this with my own eyes (more just curiosity than anything), so if you know whereabouts to look, it would be most helpful :) -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On 10/18/2012 11:28 AM, Nick Cash wrote: It appears that list has len() complexity of O(1) source: http://wiki.python.org/moin/TimeComplexity It may be worth mentioning that lists in Python are implemented using arrays instead of linked lists. It's reasonable to assume that other built-in collection types would be similar, though I don't see anything explicitly saying so for bytearray. -Nick Cash Thanks for the link, I don't believe I had seen that one before. -- http://mail.python.org/mailman/listinfo/python-list
Re: len() on mutables vs. immutables
On 10/18/2012 1:23 PM, Demian Brecht wrote: When len() is called passing an immutable built-in type (such as a string), I'd assume that the overhead in doing so is simply a function call and there are no on-call calculations done. Is that correct? See below. I'd also assume that mutable built-in types (such as a bytearray) would cache their size internally as a side effect of mutation operations. Is Or the length could be the difference of two pointers -- address of the first empty slot minus address of first item. that correct? If so, is it safe to assume that at least all built-in types observe this behavior, str, bytes, bytearrays, arrays, sets, frozensets, dicts, dictviews, and ranges should all return len in O(1) time. That includes the possibility of a subtraction as indicated above. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list