(I forgot to reply to the list.)

----- Forwarded message from Jason Stover <[EMAIL PROTECTED]> -----
Date: Mon, 20 Mar 2006 10:03:27 -0500
From: Jason Stover <[EMAIL PROTECTED]>
To: John Darrington <[EMAIL PROTECTED]>
Subject: Re: category.c
In-Reply-To: <[EMAIL PROTECTED]>
User-Agent: Mutt/1.5.10i

On Mon, Mar 20, 2006 at 09:03:21AM +0800, John Darrington wrote:
> I've been thinking about re-implementing T-TEST, ONEWAY and EXAMINE,
> using category.c and thus retiring the rather ad hoc group.c and
> factor-stats.c files.
>
> Several questions about category.c :
>
>
> 1. cat_value_find uses a linear search.  Might it not be better to use
>    a hash instead?

Yes. category.c is my first attempt at caching the information
related to categorical variables, and there is probably a lot of room
for improvement.

> 2. Do we really need cat-routines.h ?  Can it not be merged into
>    category.h ?

Separating them was a hack to prevent a build break, and the need to
do so may no longer exist. My memory is vague here, but there was an
email discussion that I can no longer find. The problem was something
like this:

Most routines do not need to know about anything in category.h or
cat-routines.h, but variable.h includes category.h. When the contents
of cat-routines.h and category.h were in the same file, they caused
some compile-time errors when files that included variable.h did not
also know about everything related to category.h. I *think* the
trouble may have been a *.h file that referred to struct
design_matrix.

Whatever the cause, I split category.h into two files, which may not
have been the best solution. The original reason for keeping them
apart may well be gone by now.

> 3. cat_value_update seems to do nothing for numeric variables.  Why is
>    this?  A numeric variable can be used as a categorical variable
>    just as easily as an alpha one.

Good point.
Encoding numeric data as categorical is usually a mistake from a
statistical standpoint, but there are circumstances when treating a
numeric variable as categorical makes perfect sense, so maybe
cat_value_update() shouldn't care what type of variable it is looking
at.

This is where the question 'should we protect the user?' comes up.
Someone who has a numeric variable with, say, 10^5 distinct values and
inadvertently treats that variable as categorical could wind up
running a procedure with 0 or negative degrees of freedom; slowing the
machine down to a crawl; or, worst of all, finding bugs we'd rather
not know about. But users should probably have the ability to treat
numeric data as categorical if they want to.

> 4. If I'm reading the code right, cat_stored_values_destroy is leaky.
>    It frees obs_vals, but doesn't tidy up obs_vals->vals .
>    Also, shouldn't it set v->obs_vals to NULL after freeing?

You're right. That's a problem. I'll fix it soon if no one else fixes
it first.

While we're on the topic, is anyone in favor of using a garbage
collector in PSPP?

-Jason

----- End forwarded message -----

--
Jason Stover
Assistant Professor
Mathematics Department
Georgia Kung Fu & State University
"Georgia's public martial arts university"
On the web at www.gksu.edu

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
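
For reference, the fix discussed under question 4 might look roughly
like the sketch below. The struct layouts and every name ending in
_sketch are assumptions made up for illustration; the real definitions
in category.h and variable.h may well differ. The only point is the
order of operations: free the inner array, free the container, then
clear the owner's pointer.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the PSPP types.  The real structs in
   category.h and variable.h may look different. */
union value_sketch
  {
    double f;
    char s[8];
  };

struct cat_vals_sketch
  {
    union value_sketch *vals;   /* Heap-allocated array of observed values. */
    size_t n_categories;
  };

struct variable_sketch
  {
    struct cat_vals_sketch *obs_vals;
  };

/* Frees the inner array first, then the container, and finally clears
   the owner's pointer so that a second call is harmless. */
static void
cat_stored_values_destroy_sketch (struct variable_sketch *v)
{
  if (v == NULL || v->obs_vals == NULL)
    return;

  free (v->obs_vals->vals);     /* free (NULL) is a no-op, so no extra check. */
  free (v->obs_vals);
  v->obs_vals = NULL;
}
```

Setting v->obs_vals to NULL costs nothing and turns a double destroy
into a no-op instead of a double free.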

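
On question 1, here is a minimal standalone sketch of what replacing
the linear search with a hash lookup could look like. It is not PSPP
code: the table type, the fixed capacity, the 8-character value width
and all the names are assumptions chosen to keep the example short, and
a real patch would more likely reuse whatever hash table code the tree
already provides. It maps a string value to its category subscript
using open addressing with linear probing and an FNV-1a hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CAPACITY 64      /* Power of two; fixed to keep the sketch short. */
#define SKETCH_WIDTH 8          /* Assumed maximum width of a string value. */

/* Hypothetical table mapping a category value to its subscript.
   Must be zero-initialized before first use. */
struct cat_hash_sketch
  {
    char keys[SKETCH_CAPACITY][SKETCH_WIDTH + 1];
    int used[SKETCH_CAPACITY];
    size_t index[SKETCH_CAPACITY];   /* Category subscript for each key. */
    size_t n;                        /* Number of distinct values stored. */
  };

/* 32-bit FNV-1a hash of a NUL-terminated string. */
static uint32_t
hash_str_sketch (const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s != '\0'; s++)
    {
      h ^= (unsigned char) *s;
      h *= 16777619u;
    }
  return h;
}

/* Returns the subscript of VAL, adding it if it is not yet present.
   Returns (size_t) -1 if VAL is too long or the table is full. */
static size_t
cat_hash_find_or_add_sketch (struct cat_hash_sketch *t, const char *val)
{
  size_t i, probes;

  if (strlen (val) > SKETCH_WIDTH)
    return (size_t) -1;

  i = hash_str_sketch (val) & (SKETCH_CAPACITY - 1);
  for (probes = 0; probes < SKETCH_CAPACITY; probes++)
    {
      if (!t->used[i])
        {
          /* Empty slot: VAL is new, so it gets the next subscript. */
          strcpy (t->keys[i], val);
          t->used[i] = 1;
          t->index[i] = t->n;
          return t->n++;
        }
      if (strcmp (t->keys[i], val) == 0)
        return t->index[i];
      i = (i + 1) & (SKETCH_CAPACITY - 1);   /* Linear probing. */
    }
  return (size_t) -1;
}
```

With a table like this the expected cost of a lookup stays roughly
constant as categories are added, whereas a linear search grows with
the number of categories seen so far.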