Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Ville Vainio
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).


 d = CiDict([('Hi', 12),('hoho',13)])
 d['hi']

12

 d.keys()

['Hi','hoho']

Note that 'Hi' preserved the case. I imagine that 'Hi' and 'hi' would
need to share the same hash value in order for the lookup to be fast.

Anyone have a an implementation that I could use? Quick googling only
produced implementations that coerce all keys to lowercase.

-- 
Ville Vainio   http://tinyurl.com/2prnb
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Daniel Dittmar
Ville Vainio wrote:
I need a dict (well, it would be optimal anyway) class that stores the
keys as strings without coercing the case to upper or lower, but still
provides fast lookup (i.e. uses hash table).
Store the original key together with the value and use a lowercase key 
for lookup.

only a sketch:
class MyDict:
def __init__ (self):
self.inner = {}
def __setitem__ (self, key, value):
self.inner [key.lower ()] = (key, value]
def __getitem__ (self, key):
realkey, realvalue = self.inner [self]
return realvalue
def get (self, key, default = None):
try:
return self [key]
except KeyError:
return default
# or: return self.inner.get (key.lower (), (None, default)) [1]
def keys (self):
return [realkey for realkey, realval in self.inner.values ()]
def values (self):
return [realval for realkey, realval in self.inner.values  )]
   def items ():
return self.inner.values ()
# iterators are left as an exercise
--
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Ville Vainio
 Daniel == Daniel Dittmar [EMAIL PROTECTED] writes:

Daniel Ville Vainio wrote:

 I need a dict (well, it would be optimal anyway) class that
 stores the keys as strings without coercing the case to upper
 or lower, but still provides fast lookup (i.e. uses hash
 table).

Daniel Store the original key together with the value and use a
Daniel lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the simplest thing that could possibly work, though.

-- 
Ville Vainio   http://tinyurl.com/2prnb
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Daniel Dittmar
Ville Vainio wrote:
Daniel == Daniel Dittmar [EMAIL PROTECTED] writes:

Daniel Ville Vainio wrote:
 I need a dict (well, it would be optimal anyway) class that
 stores the keys as strings without coercing the case to upper
 or lower, but still provides fast lookup (i.e. uses hash
 table).
Daniel Store the original key together with the value and use a
Daniel lowercase key for lookup.
That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.
You could write a string wrapper that changes comparison and hashing. 
I'm not sure that this (+ 1 (Python object + instance dictionary)) would 
use less memory than the other proposal (+ 1 Tuple + 1 String).

Daniel
--
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Ron_Adam
On 01 Apr 2005 15:55:58 +0300, Ville Vainio [EMAIL PROTECTED]
wrote:

 Daniel == Daniel Dittmar [EMAIL PROTECTED] writes:

Daniel Ville Vainio wrote:

 I need a dict (well, it would be optimal anyway) class that
 stores the keys as strings without coercing the case to upper
 or lower, but still provides fast lookup (i.e. uses hash
 table).

Daniel Store the original key together with the value and use a
Daniel lowercase key for lookup.

That's what I thought initially, but the strings take most of the
space in dict and I didn't feel like doubling the size.

It would be the simplest thing that could possibly work, though.

Try access the keys indirectly though another dictionary.  That way
you don't have to change the original.

Lkeys = {}
for k dict.keys():
Lkeys[ k.lower] = dict[k]

Then use:

value = dict[ Lkeys[ key.lower() ] ]

To get your value from the original dictionary.

Watch out for duplicate keys in differing case in the original dict. 


Ron

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Raymond Hettinger
[Ville Vainio]
 I need a dict (well, it would be optimal anyway) class that stores the
 keys as strings without coercing the case to upper or lower, but still
 provides fast lookup (i.e. uses hash table).


 class S(str):
def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()


 d = {}
 d[S('ThE')] = 'quick'
 d[S('the')]
'quick'
 d
{'ThE': 'quick'}


Raymond Hettinger


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Bengt Richter
On Fri, 01 Apr 2005 18:52:00 GMT, Raymond Hettinger [EMAIL PROTECTED] wrote:

[Ville Vainio]
 I need a dict (well, it would be optimal anyway) class that stores the
 keys as strings without coercing the case to upper or lower, but still
 provides fast lookup (i.e. uses hash table).


 class S(str):
def __hash__(self):
return hash(self.lower())
def __eq__(self, other):
return self.lower() == other.lower()


 d = {}
 d[S('ThE')] = 'quick'
 d[S('the')]
'quick'
 d
{'ThE': 'quick'}

Building on your S to sneak in a generalized idea ;-)

  class idict(dict):
 ... def __setitem__(self, key, value):
 ... dict.__setitem__(self, S(key), value)
 ... def __getitem__(self, key):
 ... return dict.__getitem__(self, S(key))
 ...
  d = idict()
  d['ThE'] = 'quick'
  d['the']
 'quick'
  d
 {'ThE': 'quick'}

Ok, the idea:
I wonder if a dict with a general override hook for hashing all keys would be 
useful.
E.g.,  a dict.__keyhash__ that would take key as arg and default as now 
returning key.__hash__()
but that you could override. Seems like this could potentially be more 
efficient than key wrappers,
and would also make it so you wouldn't have to chase all the affected methods 
in doing an idict
like the above (e.g., get, setdefault, update etc. etc.)

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Raymond Hettinger
[Bengt Richter]
 I wonder if a dict with a general override hook for hashing all keys would be
useful.
 E.g.,  a dict.__keyhash__ that would take key as arg and default as now
returning key.__hash__()
 but that you could override. Seems like this could potentially be more
efficient than key wrappers,
 and would also make it so you wouldn't have to chase all the affected methods
in doing an idict
 like the above (e.g., get, setdefault, update etc. etc.)

There has also been a discussion of adding a seqdict that maintains a keys in
insertion order.  Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building).  Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas.  Essentially, the thought is to put all the bells and
whistles in one place.  If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1  # case insensitive dictionary
assert t['Abc'] == 1

t = TrickyDict()
t.setdefaultvalue(0)
t['x'] += 1 # very bag like
assert t['x'] == 1

t = TrickyDict()
t.setdefaultfunction(list)
t['x'] += ['first']  # rather like: t.setdefault('x',
[]).append('first')
t['x'] += ['second']
assert t == {'x': ['first', 'second']}

t = TrickyDict()
t.sortedkeys = True
t['x'] = 1
t['y'] = 2
assert t.keys() == ['x', 'y']   # order is guaranteed

This universal dictionary type could also incorporate some methods for other
recurrent themes such as a inverting a dictionary:

def invdict(self)
return self: dict((v,k) for k,v in self.iteritems()).

For the performance minded, there could also be a control for dictionary
speed/space efficiency:

t = TrickyDict()
t.maxkeydensity = 40  # resize whenever the dictionary is more than 40%
full

Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide the
best solution.


Raymond


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Bengt Richter
On Fri, 01 Apr 2005 23:04:42 GMT, Raymond Hettinger [EMAIL PROTECTED] wrote:

[Bengt Richter]
 I wonder if a dict with a general override hook for hashing all keys would be
useful.
 E.g.,  a dict.__keyhash__ that would take key as arg and default as now
returning key.__hash__()
 but that you could override. Seems like this could potentially be more
efficient than key wrappers,
 and would also make it so you wouldn't have to chase all the affected methods
in doing an idict
 like the above (e.g., get, setdefault, update etc. etc.)

There has also been a discussion of adding a seqdict that maintains a keys in
insertion order.  Another idea was to have a defaultdict that could be
instructed to create values as necessary (either zero for counting or [] for
list building).  Putting the three ideas together, perhaps we should write an
extension module with a custom dictionary type with methods/attributes
implementing those ideas.  Essentially, the thought is to put all the bells and
whistles in one place.  If the extension became popular, it could ultimately
wend its way into the collections module.

A facetious naming idea would be to call it TrickyDict, a relative to DictMixin
and no relation to a former U.S. president ;-)

t = TrickyDict()
t.keytransform(str.lower)
t['abC'] = 1  # case insensitive dictionary
assert t['Abc'] == 1
 assert t.keys() == ['abC']# actual keys unchanged (but transformed for 
hash and cmp)
[...]

Taken together, these six attributes/methods could cover many wished for
features for the 10% of the cases where a regular dictionary doesn't provide 
the
best solution.
You think as much as 10% ?

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Raymond Hettinger
 Taken together, these six attributes/methods could cover many wished for
 features for the 10% of the cases where a regular dictionary doesn't provide
the
 best solution.

 You think as much as 10% ?

Rounded up from 9.6 ;-)

More important than the percentage is the clarity of the resulting code and the
avoidance of continous reinvention of workarounds.

Separating tool features into a basic and an advanced version is common solution
to managing option/customization complexity.


Raymond


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Case-insensitive dict, non-destructive, fast, anyone?

2005-04-01 Thread Terry Reedy

Raymond Hettinger [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 More important than the percentage is the clarity of the resulting code 
 and the
 avoidance of continous reinvention of workarounds.

 Separating tool features into a basic and an advanced version is common 
 solution
 to managing option/customization complexity.

A super bells  whistles dict would certainly pre-answer a lot of queries 
and save much newsgroup/list response time.

tjr



-- 
http://mail.python.org/mailman/listinfo/python-list