A while back I hit a bug that was traceable to Categorize, whether
standalone or called from within ImportSpreadsheet, but I got sidetracked
and never ran it down.
Now I have more info, but I'm not able to figure out the fix since my
knowledge of how hash tables work is too puny. So this is for someone who
has time on their hands and wants a challenge (probably not too
challenging).
*********
Problem statement:
As of OpenDX 4.1.0 on SGI:
Categorize (or ImportSpreadsheet("categorize")) fails with the error:
Internal error: Hash table internal error
if requested to categorize a list of integers with more than 32 unique values.
Categorize (and ImportSpreadsheet) works fine with lists of strings or
lists of floats that have more than 32 unique values.
In 3.1.4b on SGI, this bug did not exist: I had run this same net
successfully with 360 unique integer classes.
*********
Why would you want to do this? In my case, the data was a long-term
(30-year) study of stock prices in which the data was grouped by month. The
client's date nomenclature was YYMM, so Jan-65 is 6501, and so on. For
various reasons, I decided to import the date info as received, then
categorize it for the sake of axis labels and the like.
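For concreteness, here is a minimal C sketch of the YYMM scheme described
above. The function names (yymm_encode, yymm_decode, yymm_fill) are made up
for illustration and are not part of OpenDX, and the sketch assumes
two-digit years that stay within one century. Assuming, purely for
illustration, that the series starts at Jan-65, 30 years of monthly codes
gives 360 distinct integers, which is consistent with the 360 unique
integer classes mentioned above.

```c
#include <assert.h>

/* Encode a two-digit year and a month (1..12) as YYMM: (65, 1) -> 6501. */
static int yymm_encode(int yy, int month)
{
    assert(yy >= 0 && yy <= 99);
    assert(month >= 1 && month <= 12);
    return yy * 100 + month;
}

/* Split a YYMM code back into its two-digit year and its month. */
static void yymm_decode(int code, int *yy, int *month)
{
    *yy = code / 100;
    *month = code % 100;
}

/* Fill out[] with consecutive monthly codes, starting at January of
 * start_yy and covering n_years full years. The caller must provide room
 * for at least 12 * n_years ints. Returns the number of codes written. */
static int yymm_fill(int start_yy, int n_years, int *out)
{
    int y, m;
    int n = 0;

    for (y = 0; y < n_years; y++)
        for (m = 1; m <= 12; m++)
            out[n++] = yymm_encode(start_yy + y, m);
    return n;
}
```

Calling yymm_fill(65, 30, codes) with an int codes[360] buffer returns 360,
with codes[0] equal to 6501 and codes[359] equal to 9412.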
I can find the one and only instance of this error message in
"categorize.c" (about line 732), but I've studied this code and
"_cat_util.c" and "cat.h" until my head hurts and cannot figure out why 32
should be the magic number for integer-type data. Of course, that number
sounds intriguingly familiar to all good programmers, so there are all
sorts of possibilities. My guess is that the hash table was somehow not
made big enough, perhaps because of a mistake in determining the type of
the input data, but that's just speculation. I have not been able to
figure out how cp->num gets initialized.
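To illustrate the kind of failure I am guessing at, here is a toy C sketch
of a hash table whose size is fixed at 32 slots. This is NOT the OpenDX
code: toy_table, toy_init, toy_insert and TOY_CAPACITY are all
hypothetical, and the capacity was picked only to mirror the symptom. It
shows how a table that was sized too small accepts 32 unique keys and then
has nowhere to put the 33rd.

```c
/* Hypothetical fixed size, chosen only to mirror the observed limit. */
#define TOY_CAPACITY 32

typedef struct {
    int keys[TOY_CAPACITY];
    unsigned char used[TOY_CAPACITY];  /* 1 if the slot holds a key */
    int count;                         /* number of unique keys stored */
} toy_table;

static void toy_init(toy_table *t)
{
    int i;

    for (i = 0; i < TOY_CAPACITY; i++)
        t->used[i] = 0;
    t->count = 0;
}

/* Insert key if it is not already present, using linear probing.
 * Returns the slot index holding the key, or -1 if the key is new and
 * every slot is already taken. */
static int toy_insert(toy_table *t, int key)
{
    unsigned int start = (unsigned int)key % TOY_CAPACITY;
    unsigned int probe;

    for (probe = 0; probe < TOY_CAPACITY; probe++) {
        unsigned int slot = (start + probe) % TOY_CAPACITY;

        if (!t->used[slot]) {          /* free slot: store the new key */
            t->used[slot] = 1;
            t->keys[slot] = key;
            t->count++;
            return (int)slot;
        }
        if (t->keys[slot] == key)      /* key already categorized */
            return (int)slot;
    }
    return -1;                         /* table full, key not found */
}
```

With this toy, inserting 32 distinct integers succeeds, re-inserting any of
them is harmless, and the first new value after that makes toy_insert
return -1. Whether anything like this actually happens in categorize.c is
exactly what I have not been able to determine.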
If anyone is interested in exploring this (and fixing it), it would be a
good thing! Thanks.
Chris Pelkie
Vice President/Scientific Visualization Producer
Conceptual Reality Presentations, Inc.
30 West Meadow Drive
Ithaca, NY 14850