Re: Mapping with contiguous ranges of keys
On Sat, 17 Dec 2016 08:31 pm, Peter Otten wrote:

> Steve D'Aprano wrote:
>
>> I have experimented with two solutions, and hope somebody might be able
>> to suggest some improvements. Attached is the test code I ran,
>> suggestions for improving the performance will be appreciated.
>
> If there was an attachment to this text -- that didn't make it to the
> mailing list or the news group.

I see it on comp.lang.python but not the mailing list or gmane. That's
annoying because it's a *text* attachment and should be allowed. I'll
attach it again, inline this time, at the end of this post.

[...]

>> I tested the memory consumption and speed of these two solutions with
>> (approximately) one million keys. I found:
>>
>> - the dict solution uses a lot more memory, about 24 bytes per key,
>>   compared to the pair of lists solution, which is about 0.3 bytes per key;
>>
>> - but on my computer, dict lookups are about 3-4 times faster.
>
> Only three to four times? You basically get that from a no-op function
> call:

[...]

> Even adding a dummy value to V to go from
>
>> V[bisect.bisect_right(L, i) - 1]
>
> to
>
> V[bisect.bisect_right(L, i)]
>
> might be visible in your benchmark.

Good thinking! I'll try that. And... it doesn't appear to make any real
difference.

>> Any suggestions for improving the speed of the binary search version, or
>> the memory consumption of the dict?
>
> Depending on the usage pattern you might try bisect combined with an LRU
> cache.

Adding a cache will probably eat up the memory savings, but I may play
around with that.

And here's the code, this time inline. I've added the dummy value to the
values list V as suggested.

# --- cut ---
import bisect
import random
import sys
from timeit import default_timer as timer

def make_keys():
    """Yield (start, end) values suitable for passing to range().

    Distance between start and end increases from 1 to 50, then
    repeats, for a total of 800*25*51 = 1020000 values all together.
    """
    n = 1
    for j in range(800):
        for i in range(50):
            yield (n, n+i+1)
            n += i+1

def populate():
    D = {}
    L = []
    V = [None]
    value = 1
    last_b = 1
    for a, b in make_keys():
        assert a == last_b
        for i in range(a, b):
            D[i] = value
        L.append(a)
        V.append(value)
        last_b = b
    L.append(last_b)
    return (D, L, V)

class Stopwatch:
    """Context manager for timing long running chunks of code."""
    def __enter__(self):
        self._start = timer()
        return self
    def __exit__(self, *args):
        elapsed = timer() - self._start
        del self._start
        if elapsed < 0.01:
            print("Elapsed time is very small, consider using timeit instead.")
        print('time taken: %f seconds' % elapsed)

D, L, V = populate()
assert len(D) == 800*25*51
assert len(L) == len(V)
print("size of dict", sys.getsizeof(D))
print("size of two lists", sys.getsizeof(L) + sys.getsizeof(V))

# Confirm that values are the same whether using dict lookup
# or binary search.
for i in range(1, 800*25*51 + 1):
    index = bisect.bisect_right(L, i)
    assert D[i] == V[index]

# Simulate a large number of lookups in random order.
nums = list(range(1, 800*25*51 + 1))
random.shuffle(nums)

with Stopwatch():
    for i in nums:
        x = D[i]

with Stopwatch():
    for i in nums:
        x = V[bisect.bisect_right(L, i)]
# --- cut ---

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
Steve D'Aprano wrote:

> I have experimented with two solutions, and hope somebody might be able to
> suggest some improvements. Attached is the test code I ran, suggestions
> for improving the performance will be appreciated.

If there was an attachment to this text -- that didn't make it to the
mailing list or the news group.

> I decided on these two data structures:
>
> (1) The naive solution: don't worry about duplicate values, or the fact
> that keys come in contiguous groups, and just store each individual key
> and value in a dict; this is faster but uses more memory.
>
> (2) The clever solution: use a pair of lists, one holding the starting
> value of each group of keys, and the other holding the common values. This
> saves a lot of memory, but is slower.
>
> A concrete example might help. Suppose I have 15 keys in five groups:
>
> D = {0: 10,
>      1: 20, 2: 20,
>      3: 30, 4: 30, 5: 30,
>      6: 40, 7: 40, 8: 40, 9: 40,
>      10: 50, 11: 50, 12: 50, 13: 50, 14: 50}
>
> (Remember, in reality I could have as many as a million or two keys. This
> is just a simple toy example.)
>
> Instead of using a dict, I also set up a pair of lists:
>
> L = [0, 1, 3, 6, 10, 15]  # starting value of each group
> V = [10, 20, 30, 40, 50]  # value associated with each group
>
> Note that the first list has one extra item, namely the number one past
> the final group.
>
> I can do a key look-up using either of these:
>
> D[key]
>
> V[bisect.bisect_right(L, i) - 1]
>
> I tested the memory consumption and speed of these two solutions with
> (approximately) one million keys. I found:
>
> - the dict solution uses a lot more memory, about 24 bytes per key,
>   compared to the pair of lists solution, which is about 0.3 bytes per key;
>
> - but on my computer, dict lookups are about 3-4 times faster.

Only three to four times?
You basically get that from a no-op function call:

$ python3 -m timeit -s 'd = {1: 2}; k = 1' 'd[k]'
1000 loops, best of 3: 0.0935 usec per loop
$ python3 -m timeit -s 'd = {1: 2}; k = 1' 'd[int(k)]'
100 loops, best of 3: 0.304 usec per loop

Even adding a dummy value to V to go from

> V[bisect.bisect_right(L, i) - 1]

to

V[bisect.bisect_right(L, i)]

might be visible in your benchmark.

> Any suggestions for improving the speed of the binary search version, or
> the memory consumption of the dict?

Depending on the usage pattern you might try bisect combined with an LRU
cache. If runs of four or more nearby keys are common you can remember the
current span:

# untested, watch out for off-by-one errors ;)
start = stop = 0
for key in data:
    if start <= key < stop:
        pass  # reuse value
    else:
        index = bisect.bisect_right(L, key)
        start, stop = L[index - 1: index + 1]
        value = V[index - 1]
    # use value

(You might also fill V with (value, start, stop) tuples.)

> By the way: the particular pattern of groups in the sample code (groups of
> size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again) is
> just demo. In my real data, the sizes of the groups are all over the
> place, in an unpredictable pattern.
>
> Thanks in advance.

-- 
https://mail.python.org/mailman/listinfo/python-list
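[Editorial note: Peter's bisect-plus-LRU-cache suggestion can be sketched as
follows, reusing the toy L and V lists from Steve's example. The maxsize of
1024 is an arbitrary assumption; whether the cache pays for itself depends
entirely on how often the same keys recur.]

```python
import bisect
from functools import lru_cache

L = [0, 1, 3, 6, 10, 15]   # starting key of each group, plus one past the end
V = [10, 20, 30, 40, 50]   # value shared by each group

@lru_cache(maxsize=1024)
def lookup(key):
    # Binary search for the group containing key; repeated keys hit the cache.
    return V[bisect.bisect_right(L, key) - 1]
```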
Re: Mapping with contiguous ranges of keys
On 16/12/2016 14:27, Steve D'Aprano wrote:

> (2) The clever solution: use a pair of lists, one holding the starting
> value of each group of keys, and the other holding the common values. This
> saves a lot of memory, but is slower.
>
> A concrete example might help. Suppose I have 15 keys in five groups:
>
> D = {0: 10,
>      1: 20, 2: 20,
>      3: 30, 4: 30, 5: 30,
>      6: 40, 7: 40, 8: 40, 9: 40,
>      10: 50, 11: 50, 12: 50, 13: 50, 14: 50}
>
> (Remember, in reality I could have as many as a million or two keys. This
> is just a simple toy example.)
>
> Instead of using a dict, I also set up a pair of lists:
>
> L = [0, 1, 3, 6, 10, 15]  # starting value of each group
> V = [10, 20, 30, 40, 50]  # value associated with each group

I misread the problem (skipped the comment about keys 4 - 99) and assumed
there might be gaps between the contiguous blocks, so thought of the list
structure

[ ( (firstKeyN, lastKeyN), "value" ), ... ]

At the cost of more memory, keeping first and last key numbers in tuples in
the L list would mean there is only one L lookup at the expense of two
additional tuple lookups. Are small tuple lookups quicker than one large
list lookup? If not, stick with the simple L and V you show above.

I've never used any form of tree structure; perhaps someone else could
comment on the performance of balanced trees as compared to simple lists.
Would the insertion cost be too high in keeping the tree balanced?

As to speeding up access with only L and V lists, the binary search must be
optimal unless specialised knowledge about the distribution of the sizes of
the contiguous groups is made use of. But you say there is no pattern to
the group sizes.

Another thought is to have a smaller pre-index list, and search this to
find the range of L indexes the key is in. If the pre-index list had 1000
entries then each entry would cover 1/1000 of the L list, which narrows the
binary search space in L considerably. The cost of something like this is
keeping the pre-index list up to date when new keys are added, and extra
time to code and test it.

preList struct: [ (firstKeyN, lastKeyN), (firstLindex, lastLindex), ... ]

Reminds me of Jackson's first two rules on optimisation: 1 - don't do it;
2 - don't do it yet.

Thanks for an interesting problem.

> Note that the first list has one extra item, namely the number one past
> the final group.
>
> I can do a key look-up using either of these:
>
> D[key]
>
> V[bisect.bisect_right(L, i) - 1]
>
> I tested the memory consumption and speed of these two solutions with
> (approximately) one million keys. I found:
>
> - the dict solution uses a lot more memory, about 24 bytes per key,
>   compared to the pair of lists solution, which is about 0.3 bytes per key;
>
> - but on my computer, dict lookups are about 3-4 times faster.
>
> Any suggestions for improving the speed of the binary search version, or
> the memory consumption of the dict?
>
> By the way: the particular pattern of groups in the sample code (groups of
> size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again) is
> just demo. In my real data, the sizes of the groups are all over the
> place, in an unpredictable pattern.
>
> Thanks in advance.

-- 
https://mail.python.org/mailman/listinfo/python-list
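[Editorial note: the pre-index idea above can be sketched roughly as follows,
using the toy L and V lists from the thread. STEP would be on the order of
1000 in the real case; bisect's lo/hi arguments perform the narrowed search
without copying any slice of L.]

```python
import bisect

L = [0, 1, 3, 6, 10, 15]   # group start keys, plus one past the final group
V = [10, 20, 30, 40, 50]   # value of each group

STEP = 2                   # entries of L covered per pre-index entry
pre = L[::STEP]            # the small pre-index list

def lookup(key):
    # Coarse search over the small pre-index list...
    block = bisect.bisect_right(pre, key) - 1
    lo = block * STEP
    hi = min(lo + STEP + 1, len(L))
    # ...then a fine binary search restricted to that window of L.
    return V[bisect.bisect_right(L, key, lo, hi) - 1]
```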
Re: Mapping with contiguous ranges of keys
On Fri, Dec 16, 2016 at 4:06 AM, Steve D'Aprano wrote:

> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.

Going right back to the beginning here: I think that "a million or two" is
a perfectly doable figure for a straightforward list or dict. You get
immediately-available lookups in a straightforward way, at the cost of
maybe 16MB of memory (if you use the same strings for the values, the cost
is just the pointers).

ChrisA

-- 
https://mail.python.org/mailman/listinfo/python-list
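[Editorial note: Chris's back-of-the-envelope figure can be checked with a
quick sketch. The exact byte counts are platform-dependent; the point is
that a flat list of two million entries stores one pointer per key, and
duplicate values are shared objects rather than copies.]

```python
import sys

n = 2 * 10**6
values = ["foo", "bar", "foobar"]          # a few distinct value objects
table = [values[i % 3] for i in range(n)]  # one pointer per key

size = sys.getsizeof(table)  # the list's pointer array: ~8 bytes per entry
                             # on a 64-bit CPython build
```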
Re: Mapping with contiguous ranges of keys
On Thursday, 15 December 2016 17:06:39 UTC, Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }
> ...
> --
> Steve

All the answers seem to rely on in-memory solutions. But isn't the problem
a classic data design problem (cf Codd) with two tables?

CREATE TABLE keys (
    id INTEGER NOT NULL PRIMARY KEY,
    kkey INTEGER,
    UNIQUE (kkey)
);  ## eg id = 999, kkey = 101

CREATE TABLE values (
    id INTEGER NOT NULL PRIMARY KEY,
    k_id INTEGER,
    value VARCHAR,
    UNIQUE (k_id, value),
    FOREIGN KEY (k_id) REFERENCES keys(id)
);  ## eg k_id = 999, value = "bar"

For example, Python/SQLite can parse the list of key:value pairs. key is
looked up in keys -- and a keys row is added if the key is new. The keys id
is saved. The k_id/value pair is looked up -- and a row is added if the
pair is new. Some of this can be simplified by relying on SQL to handle
non-UNIQUE errors.

This approach may be slower than in-memory processing, but it has almost no
database size limits.

-- 
https://mail.python.org/mailman/listinfo/python-list
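[Editorial note: a minimal sqlite3 sketch of the two-table idea above. Note
that "values" is an SQL keyword, so the second table is renamed vals here;
INSERT OR IGNORE is the "rely on SQL to handle non-UNIQUE errors" shortcut
the post mentions.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keys (id INTEGER NOT NULL PRIMARY KEY,
                       kkey INTEGER, UNIQUE (kkey));
    CREATE TABLE vals (id INTEGER NOT NULL PRIMARY KEY,
                       k_id INTEGER, value VARCHAR,
                       UNIQUE (k_id, value),
                       FOREIGN KEY (k_id) REFERENCES keys(id));
""")

def insert(kkey, value):
    # Add the key row if it is new, then the (k_id, value) pair if it is new.
    conn.execute("INSERT OR IGNORE INTO keys (kkey) VALUES (?)", (kkey,))
    (k_id,) = conn.execute("SELECT id FROM keys WHERE kkey = ?",
                           (kkey,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO vals (k_id, value) VALUES (?, ?)",
                 (k_id, value))

def lookup(kkey):
    row = conn.execute("""SELECT value FROM vals
                          JOIN keys ON vals.k_id = keys.id
                          WHERE keys.kkey = ?""", (kkey,)).fetchone()
    return row[0] if row else None

for k, v in [(101, "bar"), (102, "bar"), (103, "foobar"), (105, "foo")]:
    insert(k, v)
```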
Re: Mapping with contiguous ranges of keys
On 12/15/2016 11:57 PM, Terry Reedy wrote:

> On 12/15/2016 4:30 PM, Thomas Nyberg wrote:
>> On 12/15/2016 12:48 PM, Terry Reedy wrote:
>>> A sparse array has at least half missing values. This one has none on
>>> the defined domain, but contiguous duplicates.
>>
>> I'm sorry for devolving into semantics, but there certainly isn't a
>> single definition of "sparse array" out there. For example, the
>> definition in wikipedia (https://en.wikipedia.org/wiki/Sparse_array)
>> doesn't agree with you:
>
> I think it does ;-). "In computer science, a sparse array is an array in
> which most of the elements have the default value (usually 0 or null)."
> ...
>> Personally my usage of sparse arrays in scipy has _always_ had all
>> defined values, it's just that the default value was 0. I never deal
>> with "missing" values.
>
> Lucky you. In statistics and analysis of real, experimental data,
> missing values are often a possibility. They are always a nuisance,
> sometimes a major one. When the data are represented by arrays, rather
> than by some compacted form, some value has to be chosen to represent
> the absence of data.

I thought that the definitions were different because yours said "at least
half" and the other said "most", in addition to your saying "missing
values" where the other said "default value". Ignoring the first minor
point, if by "missing value" you basically mean "default value", then yes,
I agree that the definition is the same.

Also I didn't mean that I was "lucky" to never have dealt with missing
values (strictly speaking I have, I just do it rarely), but my point was
that I deal with sparse matrices/arrays/structures quite often, yet I
rarely deal with "missing values" like nulls/nans/Nones. But that point is
moot given that you apparently meant the same thing with "missing value" as
I did with "default value", so I don't think we disagree here.
Taking a step back, the real point of sparse anything is: can you represent
it in a space/speed efficient manner which is "stable" under whatever
operations you care about. I.e. if you have a default value of 0, then you
have the property that 0 + x = x and 0 * x = 0, which means that sparse
matrices with default value 0 are stable under addition and multiplication.
If you have a default value of nan (or None or null) you usually have the
convention that nan * x = nan (possibly depending on the sign of x) and
nan + x = nan, which means that a sparse matrix with nans as default is
also stable under addition and multiplication. If you chose a default value
of (say) 1 you would run into issues with the stability of these
operations. It wouldn't mean it's not "sparse", but it would mean that the
operations you care about might not work as well.

The extension discussed in this thread is just that now you have multiple
default values, and the fact that they are not assigned at random, but are
instead runs of constant values, means that you can put a sparse structure
on this (with a suitable definition of "sparse").

> Let's devolve to a memory-based language like C. A physical array
> consisting of sequential memory locations must have *some* bit pattern
> in every byte and hence in every multibyte block. If I remember
> correctly, C malloc initialized bytes to all 0 bits, which is an int
> value of 0 also. If there is no meaningful value, there must be a
> default value that means 'missing'. 0 may mean 'missing'. Null values
> are by definition non-values. Python uses None as a Null. If the
> example had been populated largely with Nones, I would have called it
> 'sparse'.

It's not really super pertinent to this discussion, but malloc does not
guarantee that the values are zeroed.
That is guaranteed by calloc, though:

http://man7.org/linux/man-pages/man3/malloc.3.html
http://man7.org/linux/man-pages/man3/calloc.3p.html

Also, the point with a sparse representation is that your default value
wouldn't exist in memory anywhere, and that instead the operations would
understand that it exists by other factors. For example, a sparse matrix
with all 0s might* be represented by rows of the form i, j, v where i and j
are the row and column indices and v is the value at that position. So in
this representation we would have as many rows as non-default values.

*I say might because there are usually more compact forms than this. This
isn't the internal representation of scipy.sparse.csr_matrix, for example.

Regardless, I think I wrote too much to say basically that I don't think
we're really disagreeing, except possibly slightly on perspective.

Cheers,
Thomas

-- 
https://mail.python.org/mailman/listinfo/python-list
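[Editorial note: the i, j, v layout Thomas describes is the coordinate
("COO") form. A toy sketch, with a deliberately naive linear-scan lookup,
purely to illustrate that the default value is never stored:]

```python
# COO-style sparse matrix: only non-default entries are stored as (i, j, v).
entries = [(0, 2, 5.0), (1, 0, -1.0), (3, 3, 2.5)]
DEFAULT = 0.0

def get(i, j):
    """Return the (i, j) element; anything not listed is the default."""
    for ei, ej, v in entries:
        if ei == i and ej == j:
            return v
    return DEFAULT
```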
Re: Mapping with contiguous ranges of keys
On Fri, 16 Dec 2016 04:06 am, Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values.
[...]

Thank you to everyone who gave suggestions!

I have experimented with two solutions, and hope somebody might be able to
suggest some improvements. Attached is the test code I ran, suggestions for
improving the performance will be appreciated.

I decided on these two data structures:

(1) The naive solution: don't worry about duplicate values, or the fact
that keys come in contiguous groups, and just store each individual key and
value in a dict; this is faster but uses more memory.

(2) The clever solution: use a pair of lists, one holding the starting
value of each group of keys, and the other holding the common values. This
saves a lot of memory, but is slower.

A concrete example might help. Suppose I have 15 keys in five groups:

D = {0: 10,
     1: 20, 2: 20,
     3: 30, 4: 30, 5: 30,
     6: 40, 7: 40, 8: 40, 9: 40,
     10: 50, 11: 50, 12: 50, 13: 50, 14: 50}

(Remember, in reality I could have as many as a million or two keys. This
is just a simple toy example.)

Instead of using a dict, I also set up a pair of lists:

L = [0, 1, 3, 6, 10, 15]  # starting value of each group
V = [10, 20, 30, 40, 50]  # value associated with each group

Note that the first list has one extra item, namely the number one past the
final group.

I can do a key look-up using either of these:

D[key]

V[bisect.bisect_right(L, i) - 1]

I tested the memory consumption and speed of these two solutions with
(approximately) one million keys. I found:

- the dict solution uses a lot more memory, about 24 bytes per key,
  compared to the pair of lists solution, which is about 0.3 bytes per key;

- but on my computer, dict lookups are about 3-4 times faster.

Any suggestions for improving the speed of the binary search version, or
the memory consumption of the dict?

By the way: the particular pattern of groups in the sample code (groups of
size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again) is just
a demo. In my real data, the sizes of the groups are all over the place, in
an unpredictable pattern.

Thanks in advance.

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
On 12/15/2016 4:30 PM, Thomas Nyberg wrote:

> On 12/15/2016 12:48 PM, Terry Reedy wrote:
>> On 12/15/2016 12:27 PM, Thomas Nyberg wrote:
>>> I haven't dealt with a data structure exactly like this, but it's
>>> basically a sparse array.
>>
>> A sparse array has at least half missing values. This one has none on
>> the defined domain, but contiguous duplicates.
>
> I'm sorry for devolving into semantics, but there certainly isn't a
> single definition of "sparse array" out there. For example, the
> definition in wikipedia (https://en.wikipedia.org/wiki/Sparse_array)
> doesn't agree with you:

I think it does ;-). "In computer science, a sparse array is an array in
which most of the elements have the default value (usually 0 or null)."

Let's devolve to a memory-based language like C. A physical array
consisting of sequential memory locations must have *some* bit pattern in
every byte and hence in every multibyte block. If I remember correctly, C
malloc initialized bytes to all 0 bits, which is an int value of 0 also. If
there is no meaningful value, there must be a default value that means
'missing'. 0 may mean 'missing'. Null values are by definition non-values.
Python uses None as a Null. If the example had been populated largely with
Nones, I would have called it 'sparse'.

> Personally my usage of sparse arrays in scipy has _always_ had all
> defined values, it's just that the default value was 0. I never deal
> with "missing" values.

Lucky you. In statistics and analysis of real, experimental data, missing
values are often a possibility. They are always a nuisance, sometimes a
major one. When the data are represented by arrays, rather than by some
compacted form, some value has to be chosen to represent the absence of
data.

-- 
Terry Jan Reedy

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
On 12/15/2016 12:48 PM, Terry Reedy wrote:

> On 12/15/2016 12:27 PM, Thomas Nyberg wrote:
>> I haven't dealt with a data structure exactly like this, but it's
>> basically a sparse array.
>
> A sparse array has at least half missing values. This one has none on the
> defined domain, but contiguous duplicates.

I'm sorry for devolving into semantics, but there certainly isn't a single
definition of "sparse array" out there. For example, the definition in
wikipedia (https://en.wikipedia.org/wiki/Sparse_array) doesn't agree with
you:

"In computer science, a sparse array is an array in which most of the
elements have the default value (usually 0 or null)."

Personally my usage of sparse arrays in scipy has _always_ had all defined
values; it's just that the default value was 0. I never deal with "missing"
values.

>> (I'm sure it has a name somewhere in the academic literature, but I
>> couldn't find it with a quick search right now...)
>
> 'Sparse array' is the name for sparse arrays. I cannot think of one for
> blocks of duplicates.

Yeah, that's why I said "basically a sparse array". It has the same
defining characteristics. It is a situation where you have a large number
of values associated to a small number of values in such a way that you can
fairly easily describe the association without needing to simply write out
all pairs. For regular sparse arrays it's easy because you associate them
to a single value by default and only specify the ones that aren't. In this
case that doesn't work, but due to the fact that you have long runs of
repeated values you can use that to encode and compress the mapping. Still
not sure what to call it.

I like D'Arcy's idea of subclassing a dict in combination with Peter's idea
of using bisect (I had no idea that module existed!) so maybe I'll change
my own not so great terminology to "basically a sparse map".

Cheers,
Thomas

-- 
https://mail.python.org/mailman/listinfo/python-list
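[Editorial note: a minimal sketch of the combination Thomas mentions -- a
dict subclass that serves explicit entries first and falls back to a binary
search over run starts. The class name RangeDict and its constructor are
hypothetical, not from the thread.]

```python
import bisect

class RangeDict(dict):
    """Dict with a bisect fallback over runs of keys sharing one value."""
    def __init__(self, starts, values):
        super().__init__()
        self._starts = starts   # sorted keys where each run begins
        self._values = values   # value of each run
    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)   # explicit entries win
        except KeyError:
            i = bisect.bisect_right(self._starts, key) - 1
            if i < 0:
                raise                            # key before the first run
            return self._values[i]

d = RangeDict([1, 101, 103, 104, 105], ["foo", "bar", "foobar", "bar", "foo"])
d[42] = "override"   # an explicit entry shadows its run
```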
Re: Mapping with contiguous ranges of keys
On 12/15/2016 12:27 PM, Thomas Nyberg wrote:

> On 12/15/2016 09:06 AM, Steve D'Aprano wrote:
>> Has anyone dealt with data like this and could give a recommendation of
>> the right data structure to use?
>
> I haven't dealt with a data structure exactly like this, but it's
> basically a sparse array.

A sparse array has at least half missing values. This one has none on the
defined domain, but contiguous duplicates.

> (I'm sure it has a name somewhere in the academic literature, but I
> couldn't find it with a quick search right now...)

'Sparse array' is the name for sparse arrays. I cannot think of one for
blocks of duplicates.

-- 
Terry Jan Reedy

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
On 12/15/2016 12:06 PM, Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }

For this particular example (untested):

def lookup(n):
    F, B, FB = 'foo', 'bar', 'foobar'  # Ensure value objects reused
    if 1 <= n <= 100:
        return F
    elif 101 <= n <= 105:
        return (B, B, FB, B, F)[n - 101]
    else:
        raise ValueError('n must be in range(1, 106)')

> So in this case, the keys 1 through 100, plus 105, all have the same
> value.
>
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.
>
> Has anyone dealt with data like this and could give a recommendation of
> the right data structure to use?
>
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?

If, as in the example, the keys comprise a contiguous sequence of ints, the
'obvious' way to me is a sequence of values. A tuple of constants gets
compiled and will load quickly from a .pyc file. 4 or 8 bytes per entry
times 1 or 2 million entries is usually tolerable on a gigabyte machine.

Or trade time for space with binary search, or search in an explicit binary
tree. Or combine binary search and indexed lookup as I did above.

-- 
Terry Jan Reedy

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
>
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo",
> }
>
> So in this case, the keys 1 through 100, plus 105, all have the same
> value.
>
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.
>
> Has anyone dealt with data like this and could give a recommendation of
> the right data structure to use?
>
> The obvious way is to explicitly list each key, in a regular dict. But I
> wonder whether there are alternatives which may be better (faster, more
> memory efficient)?

Use a list, either directly if there are no big holes

>>> r = range(10**5)
>>> sys.getsizeof(list(r))/sys.getsizeof(dict(zip(r, r)))
0.14306676635590074

or indirectly:

ranges_list = [
    1,
    101,
    103,
    104,
    105,
    106,
]

index_to_value = {
    1: "foo",
    2: "bar",
    3: "foobar",
    4: "bar",
    5: "foo",
}

def get(key, default="missing"):
    index = bisect.bisect(ranges_list, key)
    return index_to_value.get(index, default)

-- 
https://mail.python.org/mailman/listinfo/python-list
Re: Mapping with contiguous ranges of keys
On 12/15/2016 09:06 AM, Steve D'Aprano wrote:

> Has anyone dealt with data like this and could give a recommendation of
> the right data structure to use?

I haven't dealt with a data structure exactly like this, but it's basically
a sparse array. (I'm sure it has a name somewhere in the academic
literature, but I couldn't find it with a quick search right now...)

My solution to what you're asking for would be to have a list of key-value
pairs, only adding a key to the list if it "changes" the value. I.e. your
data structure would be this:

l = [
    (1, "foo"),
    (101, "bar"),
    (103, "foobar"),
    (104, "bar"),
    (105, "foo"),
]

Then the only thing you need to do is define the lookup function. I would
basically just do a binary search on the first values in the tuples. I.e.
if "n" is your integer, you check whether the middle value of the list l
has its first element less than or greater than your value. Then you split
l in two and do the same thing. Do this until you either find your value,
or you find a value less than your value with the added property that the
next value is greater than your value. After that you spit out the final
second value.

There might be better ways to find the keys, but I think this approach is
probably your best bet.

Cheers,
Thomas

-- 
https://mail.python.org/mailman/listinfo/python-list
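[Editorial note: the binary search Thomas describes by hand is exactly what
the standard bisect module provides. A sketch of his run-list lookup using
bisect_right on the run start keys:]

```python
import bisect

# Run-length pairs: the key at which each run starts, and that run's value.
runs = [(1, "foo"), (101, "bar"), (103, "foobar"), (104, "bar"), (105, "foo")]
starts = [k for k, _ in runs]

def lookup(n):
    """Return the value of the run containing n."""
    i = bisect.bisect_right(starts, n) - 1
    if i < 0:
        raise KeyError(n)   # n precedes the first run
    return runs[i][1]
```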
Re: Mapping with contiguous ranges of keys
On 2016-12-15 12:06 PM, Steve D'Aprano wrote:

> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.

There isn't enough info in your post to be sure, but if those values are
constant then you might be able to subclass dict and write a new
__getitem__ that checks for specific ranges and calls the superclass only
if not in the known ranges. For example:

class MyDict(dict):
    def __getitem__(self, key):
        if isinstance(key, int) and 1 <= key <= 100:
            return "foo"
        return dict.__getitem__(self, key)

Obviously that middle section can be as complex as you need.

-- 
D'Arcy J.M. Cain
System Administrator, Vex.Net
http://www.Vex.Net/ IM:da...@vex.net VoIP: sip:da...@vex.net

-- 
https://mail.python.org/mailman/listinfo/python-list
Mapping with contiguous ranges of keys
I have some key:value data where the keys often are found in contiguous
ranges with identical values. For example:

{1: "foo",
 2: "foo",
 3: "foo",
 # same for keys 4 through 99
 100: "foo",
 101: "bar",
 102: "bar",
 103: "foobar",
 104: "bar",
 105: "foo",
}

So in this case, the keys 1 through 100, plus 105, all have the same value.

I have about a million or two keys, with a few hundred or perhaps a few
thousand distinct values. The size of each contiguous group of keys with
the same value can vary from 1 to perhaps a hundred or so.

Has anyone dealt with data like this and could give a recommendation of the
right data structure to use?

The obvious way is to explicitly list each key, in a regular dict. But I
wonder whether there are alternatives which may be better (faster, more
memory efficient)?

Thanks,

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list