On Fri, Sep 26, 2008 at 10:51:31AM -0400, Jason Stover wrote:
> So now I'm thinking hashing won't work. The problem is that I do
> not know in advance the range of the hash function, nor how to
> avoid collisions. 
> 
> If the keys were just, say, of the form (variable1, variable2),
> I could make a simple hash function like
> 
>   key = variable1->idx + variable2->idx * n_vars
> 
> This would be fine for numeric variables.
> 
> The problem is with the categorical variables. I need distinct
> entries in the hash table for each of the values of the categorical
> variables, and I haven't made a pass through the data yet, so I don't know how many
> values there are, nor what they may be. Which means, I think, that I
> can't write a hash function in advance that will be guaranteed to be
> one-to-one.

I misspoke here. The problem isn't so much whether the hash is
one-to-one as that I can't write a hash function in advance that
knows what to do with the categorical values, since I haven't seen them
yet.
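For the numeric-only case quoted above, here is a minimal C sketch of the pair key. It assumes a hypothetical struct variable that carries only a dictionary index idx with 0 <= idx < n_vars (this is not PSPP's real struct variable), and pair_key and pair_keys_distinct are illustrative names. The second function simply checks that every ordered pair lands in its own slot.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a variable: only the dictionary index
   is needed here, with 0 <= idx < n_vars. */
struct variable
{
  size_t idx;
};

/* Key for the ordered pair (v1, v2) of numeric variables:
   key = v1->idx + v2->idx * n_vars.  This is base-n_vars positional
   notation, so each pair maps to its own slot in [0, n_vars * n_vars). */
static size_t
pair_key (const struct variable *v1, const struct variable *v2,
          size_t n_vars)
{
  return v1->idx + v2->idx * n_vars;
}

/* Returns true if every ordered pair of N_VARS variables gets a
   distinct key inside the table range, i.e. the key is one-to-one. */
static bool
pair_keys_distinct (size_t n_vars)
{
  size_t n_slots = n_vars * n_vars;
  bool *seen = calloc (n_slots ? n_slots : 1, sizeof *seen);
  bool ok = seen != NULL;
  size_t i, j;

  for (i = 0; ok && i < n_vars; i++)
    for (j = 0; ok && j < n_vars; j++)
      {
        struct variable a = { i }, b = { j };
        size_t key = pair_key (&a, &b, n_vars);

        if (key >= n_slots || seen[key])
          ok = false;           /* out of range or collision */
        else
          seen[key] = true;
      }

  free (seen);
  return ok;
}
```

The formula works only because both indices are bounded by n_vars, which is known before any data are read. A categorical value has no such bound or index until the data have been passed, which is exactly the difficulty described above.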




_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
