Re: Proposed implementation for an Ordered Dictionary

2009-03-01 Thread Colin J. Williams

Michele Simionato wrote:

On Mar 1, 1:43 am, Paul Rubin http://phr...@nospam.invalid wrote:

Colin J. Williams c...@ncf.ca writes:


 # print [mydict[x] for x in sorted(mydict.keys)] Instance object
is not iterable

It was a typo.  Use:

print [mydict[x] for x in sorted(mydict.keys())]


Even better

print [mydict[x] for x in sorted(mydict)]


Both Paul Rubin and Michele Simionato 
produce the same result but neither

produces what was originally suggested:

def seqValues(self):
''' To return the values, with 
their keys, sorted by value. '''
v= [(it[1], it[0]) for it in 
self.items()]

v.sort()
return v

Colin W.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread python

 A sort of premature pessimization, then.

QOTW!

+1

Malcolm
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread Colin J. Williams

Raymond Hettinger wrote:

[Paul Rubin]

Ehh, I guess I'm not surprised at the slowdown and extra complexity
from the second dict.  Oh well.  If the module really turns out to be
really used a lot, another (messy) approach would be to write a C
extension that uses a doubly linked list some day.


That seems like an ideal implementation to me.

  O(1): appending, popping, searching, and deletion
  O(n): traversal

Raymond


Sometimes, it's useful to be able to 
obtain the data in the sorted sequence.


You might consider adding functionality 
like:


def seqItems(self):
'''To return the items, sorted 
by key. '''
return [self[k] for k in 
self.seqKeys()]


def seqKeys(self):
''' To return the keys sorted.  '''
k= self.keys()
k.sort()
return k

def seqValues(self):
''' To return the values, with 
their keys, sorted by value. '''
v= [(it[1], it[0]) for it in 
self.items()]

v.sort()
return v

Colin W.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread Steven D'Aprano
Colin J. Williams wrote:

 Sometimes, it's useful to be able to
 obtain the data in the sorted sequence.
 
 You might consider adding functionality
 like:
 
 def seqItems(self):
 '''To return the items, sorted
 by key. '''
 return [self[k] for k in
 self.seqKeys()]

Amazingly, the Python time-machine provides such functionality! Using just a
handful of pre-existing pieces, you can do this:

sorted(mydict.items())
sorted(mydict.keys())
[mydict[x] for x in sorted(mydict.keys)]

and any other sequence you might need. That's the beauty of a general
purpose programming language. You don't need everything to be a built-in
*wink*



-- 
Steven

--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread Colin J. Williams

Steven D'Aprano wrote:

Colin J. Williams wrote:


Sometimes, it's useful to be able to
obtain the data in the sorted sequence.

You might consider adding functionality
like:

def seqItems(self):
'''To return the items, sorted
by key. '''
return [self[k] for k in
self.seqKeys()]


Amazingly, the Python time-machine provides such functionality! Using just a
handful of pre-existing pieces, you can do this:

sorted(mydict.items())
sorted(mydict.keys())
[mydict[x] for x in sorted(mydict.keys)]

and any other sequence you might need. That's the beauty of a general
purpose programming language. You don't need everything to be a built-in
*wink*


Thanks, you are right, you have a neat 
way of handling the first two, they
work with an instance of Raymond 
Hettinger's. OrderedDict.


The third suggestion has a couple of 
problems, which are fixed below.


if __name__ == '__main__':
d= 
OrderedDict.fromkeys('abracadabra', 
value= 'zzz')

print(d)
print(d.seqKeys())
print(d.seqItems())
print(d.seqValues())
# Steven D'Aprano's alternative
mydict= d.copy()
print sorted(mydict.items())
print sorted(mydict.keys())
# print [mydict[x] for x in 
sorted(mydict.keys)] Instance object is 
not iterable
print(sorted(iter([(x[1], x[0]) for 
x in mydict.iteritems()]))) # This works


Colin W.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread Paul Rubin
Colin J. Williams c...@ncf.ca writes:
  # print [mydict[x] for x in sorted(mydict.keys)] Instance object
 is not iterable

It was a typo.  Use:

print [mydict[x] for x in sorted(mydict.keys())]
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-28 Thread Michele Simionato
On Mar 1, 1:43 am, Paul Rubin http://phr...@nospam.invalid wrote:
 Colin J. Williams c...@ncf.ca writes:

       # print [mydict[x] for x in sorted(mydict.keys)] Instance object
  is not iterable

 It was a typo.  Use:

     print [mydict[x] for x in sorted(mydict.keys())]

Even better

print [mydict[x] for x in sorted(mydict)]
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread bearophileHUGS
Raymond Hettinger:
Paul Rubin:
another (messy) approach would be to write a C
extension that uses a doubly linked list some day.

 That seems like an ideal implementation to me.

This was my Python implementation, where the delete too is O(1), but
it's slow:
http://code.activestate.com/recipes/498195/

I think the C extension with the doubly linked list is the best
approach.
Note that in modern CPUs caches are able to change the performance a
lot. So reducing the memory used is very important. So using the XOR
(or subtraction) trick to use only 1 pointer for each key-value may be
useful to keep performance not too much far from the normal python
dicts:
http://en.wikipedia.org/wiki/XOR_linked_list

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread Paul Rubin
bearophileh...@lycos.com writes:
 So using the XOR (or subtraction) trick to use only 1 pointer for
 each key-value may be useful to keep performance not too much far
 from the normal python dicts: http://en.wikipedia.org/wiki/XOR_linked_list

I don't see how to delete a randomly chosen node if you use that
trick, since the hash lookup doesn't give you two consecutive nodes
in the linked list to xor together.  Maybe you could do something
intricate with skip lists and end up with O(log n) deletion.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread bearophileHUGS
Paul Rubin:
I don't see how to delete a randomly chosen node if you use that trick, since 
the hash lookup doesn't give you two consecutive nodes in the linked list to 
xor together.

Thank you, I think you are right, I am sorry.
So on 32-bit CPUs you need to add 8 bytes to each value.

On 64-bit CPUs you may use a double indexed list, instead of a double
linked list. And each index can be stored in 6 bytes instead of 8
(281_474_976_710_656 buckets may be enough for most people), so you
need only 12 bytes for two indexes instead of 16 bytes, this helps the
L1 cache (and the small L2 cache too on Core i7) a bit. But this may
slow down too much the ordered iteration.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread Steve Holden
bearophileh...@lycos.com wrote:
 Paul Rubin:
 I don't see how to delete a randomly chosen node if you use that trick, 
 since the hash lookup doesn't give you two consecutive nodes in the linked 
 list to xor together.
 
 Thank you, I think you are right, I am sorry.
 So on 32-bit CPUs you need to add 8 bytes to each value.
 
 On 64-bit CPUs you may use a double indexed list, instead of a double
 linked list. And each index can be stored in 6 bytes instead of 8
 (281_474_976_710_656 buckets may be enough for most people), so you
 need only 12 bytes for two indexes instead of 16 bytes, this helps the
 L1 cache (and the small L2 cache too on Core i7) a bit. But this may
 slow down too much the ordered iteration.
 
A sort of premature pessimization, then.

regards
 Steve
-- 
Steve Holden+1 571 484 6266   +1 800 494 3119
Holden Web LLC  http://www.holdenweb.com/

--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread bearophileHUGS
Steve Holden:
 A sort of premature pessimization, then.

Maybe not, the save in memory may lead to higher speed anyway. So you
need to test it to know the overall balance. And in data structures
with general purpose you want all the speed you can get.

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-27 Thread Raymond Hettinger
[Steve Holden]
 A sort of premature pessimization, then.

QOTW!

 _ ~
 @ @
 \_/


Raymond

--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Benjamin Peterson
Raymond Hettinger python at rcn.com writes:

 
 Here's a proposed implementation for Py2.7 and Py3.1:
 
 http://code.activestate.com/recipes/576669/
 
 Would like you guys to kick the tires, exercise it a bit, and let me
 know what you think.  The recipe runs under 2.6 and 3.0 without
 modification so it should be easy to play with.

Why not just inherit from collections.MutableMapping?



--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Hrvoje Niksic
Raymond Hettinger pyt...@rcn.com writes:

 Here's a proposed implementation for Py2.7 and Py3.1:

 http://code.activestate.com/recipes/576669/

Several methods like __contains__() and __getitem__() are not
overridden, so their performance is just as fast as a regular
dictionary.

Methods like __setitem__ and __delitem__ are overridden but have a
fast path depending on whether or not the key already exists.

It seems that __delitem__ of an existing key is O(n), whereas it's
amortized constant time for dicts.  (__setitem__ is constant time for
both.)  Is there a way to avoid this?

If not, it should probably be documented, since it differs from dict.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Raymond Hettinger
[Benjamin Peterson]
 Why not just inherit from collections.MutableMapping?

It makes the recipe shorter to inherit some methods from dict.  Also,
subclassing from dict gives a speedup for __getitem__(), __len__(),
and get().
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Raymond Hettinger
[Hrvoje Niksic]
 It seems that __delitem__ of an existing key is O(n), whereas it's
 amortized constant time for dicts.  (__setitem__ is constant time for
 both.)  Is there a way to avoid this?

I don't see any fast/clean way.  It's possible to tracking pending
deletions
and do them all at once but that's a bit messy and slow.


 If not, it should probably be documented, since it differs from dict.

That makes sense.


Raymond

--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
 I don't see any fast/clean way.  It's possible to tracking pending
 deletions and do them all at once but that's a bit messy and slow.

What about using a second dictionary (indexed by the incrementing
counter) instead of a list to record the insertion order?  Then you
have O(1) deletion, and traversal takes an O(n log n) sorting
operation.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Raymond Hettinger
[Paul Rubin]
 What about using a second dictionary (indexed by the incrementing
 counter) instead of a list to record the insertion order?  Then you
 have O(1) deletion, and traversal takes an O(n log n) sorting
 operation.

My first attempt used exactly that approach and it works fine
though it does complicate the code quite a bit and slows down
all of the common operations by a constant factor.


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Raymond Hettinger
  What about using a second dictionary (indexed by the incrementing
  counter) instead of a list to record the insertion order?  Then you
  have O(1) deletion, and traversal takes an O(n log n) sorting
  operation.

 My first attempt used exactly that approach and it works fine
 though it does complicate the code quite a bit and slows down
 all of the common operations by a constant factor.

FWIW, here is the code for that approach.

--
from collections import MutableMapping

class OrderedDict(dict):

def __init__(self, *args, **kwds):
if len(args)  1:
raise TypeError('expected at 1 argument, got %d', len
(args))
self.clear()
self.update(*args, **kwds)

def clear(self):
self.__cached_sort = None
self.__cnt = 0
self.__order = {}
dict.clear(self)

def __setitem__(self, key, value):
if key not in self:
self.__cached_sort = None
self.__cnt += 1
self.__order[key] = self.__cnt
dict.__setitem__(self, key, value)

def __delitem__(self, key):
if key in self:
self.__cached_sort = None
del self.__order[key]
dict.__delitem__(self, key)

def __sorted(self):
# return keys in insertion order and cache the result
if self.__cached_sort is None:
self.__cached_sort = sorted(dict.keys(self),
key=self.__order.__getitem__)
return self.__cached_sort

def __iter__(self):
return iter(self.__sorted())

def __reversed__(self):
return iter(reversed(self.__sorted()))

def popitem(self):
# O(n) cost on the first pop and O(1) on successive pops
if not self:
raise KeyError
key = self.__sorted().pop()
del self.__order[key]
value = dict.pop(self, key)
return key, value

# Methods with indirect access via the above methods

setdefault = MutableMapping.setdefault
update = MutableMapping.update
pop = MutableMapping.pop
keys = MutableMapping.keys
values = MutableMapping.values
items = MutableMapping.items

def __repr__(self):
return '{' + ', '.join(map('%r: %r'.__mod__, self.items())) +
'}'

def copy(self):
return self.__class__(self)

@classmethod
def fromkeys(cls, iterable, value=None):
d = cls()
for key in iterable:
d[key] = value
return d

def __reduce__(self):
# pickle an items list and any extra info in the instance dict
items = list(self.items())
inst_dict = vars(self).copy()
for attr in vars(self.__class__):
inst_dict.pop(attr, None)
return (self.__class__, (items,), inst_dict)

--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Paul Rubin
Raymond Hettinger pyt...@rcn.com writes:
  What about using a second dictionary 
 My first attempt used exactly that approach and it works fine
 though it does complicate the code quite a bit and slows down
 all of the common operations by a constant factor.

Ehh, I guess I'm not surprised at the slowdown and extra complexity
from the second dict.  Oh well.  If the module really turns out to be
really used a lot, another (messy) approach would be to write a C
extension that uses a doubly linked list some day.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Proposed implementation for an Ordered Dictionary

2009-02-26 Thread Raymond Hettinger
[Paul Rubin]
 Ehh, I guess I'm not surprised at the slowdown and extra complexity
 from the second dict.  Oh well.  If the module really turns out to be
 really used a lot, another (messy) approach would be to write a C
 extension that uses a doubly linked list some day.

That seems like an ideal implementation to me.

  O(1): appending, popping, searching, and deletion
  O(n): traversal

Raymond
--
http://mail.python.org/mailman/listinfo/python-list