On Fri, Sep 26, 2008 at 10:51:31AM -0400, Jason Stover wrote:
> So now I'm thinking hashing won't work. The problem is that I do
> not know in advance the range of the hash function, nor how to
> avoid collisions.
>
> If the keys were just, say, of the form (variable1, variable2),
> I could make a simple hash function like
>
>   key = variable1->idx + variable2->idx * n_vars
>
> This would be fine for numeric variables.
>
> The problem is with the categorical variables. I need distinct
> entries in the hash table for each of the values of the categorical
> variables, and I haven't passed the data yet, so I don't know how many
> values there are, nor what they may be. Which means, I think, that I
> can't write a hash function in advance that will be guaranteed to be
> one-to-one.

I misspoke here. The problem isn't so much whether the hash is
one-to-one. It is that I can't write a hash function in advance that
knows what to do with the categorical values, since I haven't seen
them yet.
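
To make the two cases concrete, here is a minimal sketch in C. The
first function is the key from the quoted message. The second is only
one possible way to handle a categorical value, assuming the value is
available as a string at the moment it is first read from the data: it
hashes the value's bytes with FNV-1a, so nothing about the set of
values has to be known beforehand. The names `struct var`, `pair_key`,
`fnv1a` and `cat_key_hash` are made up for illustration and are not
PSPP's API. Such a hash is not one-to-one, so a table built on it would
still have to compare the stored key on lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a variable; only its index matters here. */
struct var
  {
    size_t idx;
  };

/* Key from the quoted message.  It is one-to-one over ordered pairs
   as long as both indexes are less than N_VARS. */
static size_t
pair_key (const struct var *v1, const struct var *v2, size_t n_vars)
{
  assert (v1->idx < n_vars && v2->idx < n_vars);
  return v1->idx + v2->idx * n_vars;
}

/* Standard 64-bit FNV-1a constants. */
#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME 1099511628211ULL

/* Continues the FNV-1a hash H over the N bytes at DATA. */
static uint64_t
fnv1a (uint64_t h, const void *data, size_t n)
{
  const unsigned char *p = data;
  for (size_t i = 0; i < n; i++)
    {
      h ^= p[i];
      h *= FNV_PRIME;
    }
  return h;
}

/* Assumed approach, not taken from the message: mixes the pair key
   with the bytes of a categorical VALUE seen while reading the data.
   Collisions are possible and must be resolved by the table. */
static uint64_t
cat_key_hash (size_t key, const char *value)
{
  uint64_t h = fnv1a (FNV_OFFSET, &key, sizeof key);
  return fnv1a (h, value, strlen (value));
}
```

With n_vars = 5, variables with indexes 2 and 3 get key 2 + 3 * 5 = 17,
and the reversed pair gets 3 + 2 * 5 = 13, so ordered pairs stay
distinct.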
