On Fri, Nov 20, 2009 at 03:44:11PM -0500, Jason Stover wrote:
> So when do we merge?

Not yet, I think. I was just looking at what needs to be done to make
interactions possible for the GLM procedure. I also discussed this
with Ben via IRC. It seems that adding the interactions is going to be
trickier than just fixing the code in interaction.c.

An interaction for us is just a product of values of two or more
variables. So, for example, if var1 and var2 interact, we would need
to compute all possible combinations of values of var1 and var2. Each
of these combinations would go into computing the covariance matrix,
just as any other values would. So an "interaction" must be like a
variable, in that it has at least one column in a covariance matrix.

Next, if var1 and var2 are numeric, a "combination" of their values is
just their product. This is easy to compute as we pass the data. So to
include the interaction of var1 and var2 in the covariance matrix, we
would just make a new variable, pass that to the constructor for the
covariance matrix, and for each case in our data-reading loop, compute
the product of the values of var1 and var2, append that to the case,
and send that case along to covariance_accumulate_pass[12].

The complication enters if var1 is categorical and var2 is numeric.
Then, instead of having bit-vectors as computed in category.c, we
would need the scalar product of the numeric value from var2 and the
bit vector from var1. So for example, if we encountered var1's value
'a', encoded that as (0 0 1 0), and a 2.2 for var2, then we would need
to use (0 0 2.2 0) in the computation of the covariance matrix. This
raises some obvious questions about what that interaction should be:
It can't be a variable because it has both categorical and numeric
attributes. How should it be appended to the case being read? How
should covariance.c deal with it?

There is a further complication if both var1 and var2 are categorical.
Now we must encode the interaction as a bit vector for its use in
computing the covariance. So for example, if we see 'a' for var1 and
'b' for var2, we should encode that as, say, (0 0 1 0 0 0 0). Now if
we have n categories for var1 and m categories for var2, then we would
have n*m categories for var1 interacting with var2, which means we
would need a bit vector of length n*m - 1 to handle the interaction
between var1 and var2. Where should this be stored? Maybe some
function to smash the two values together and append the result to the
case being read? I don't know.

Here is a further complication: The user could specify any number of
variables in an interaction. So instead of var1 interacting with var2,
the user could specify var1, var2, ..., vark all interacting together.
This would be a bad idea for most experimental designs, but it is
computationally just fine.

So the question of how to make interactions seems difficult because
its answer must involve reading cases, computing new variables, and
encoding vectors from strings and numeric values. I'm asking how to do
this here, because the last time I tried it I made a mess. But it is
important, and necessary for a GLM command and many other modeling
procedures.

Any suggestions? (John, you want to just code this up over lunch?)

>
> And what to do next? Here is a list of tasks that
> stem from having the new covariance.[ch]:
>
> 1. Change linreg.c, coefficient.c and regression.q to use the new covariance
>    routines.
>
> 2. Drop src/data/category.c and covariance-matrix.[ch].
>
> 3. Rewrite interaction.c to use covariance.c.
>
> I would prefer to finish a GLM before changing linreg.c too much, but
> I'm afraid doing so will just make more work later. Also, linreg.c
> will have to be changed to use the new covariance struct anyway, and
> doing so without dropping its current behavior of using the entire
> data set would make it a lot uglier in the meantime.
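
The three cases discussed in the reply (numeric by numeric, categorical
by numeric, and categorical by categorical, including k-way
combinations) might be sketched in C roughly as follows. This is only
an illustration under assumptions, not a proposal for the actual
interface: every function name here is hypothetical and none of them
exists in PSPP, categories are assumed to have already been mapped to
indexes 0..n-1, and category 0 is assumed to be the reference level
that encodes as all zeros. How the resulting vector gets attached to a
case, and how covariance.c consumes it, is left open.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; none of these names
   exist in PSPP. */

/* Numeric x numeric: the interaction value is just the product. */
static double
interaction_num_num (double x1, double x2)
{
  return x1 * x2;
}

/* Writes into OUT, which must have room for N_CATS - 1 doubles, the
   indicator vector for category index CAT, scaled by SCALE.
   Assumption: category 0 is the reference level and is encoded as
   all zeros.  With SCALE == 1.0 this is a plain bit vector. */
static void
encode_scaled_indicator (size_t cat, size_t n_cats, double scale, double *out)
{
  size_t i;

  assert (cat < n_cats);
  for (i = 0; i + 1 < n_cats; i++)
    out[i] = 0.0;
  if (cat > 0)
    out[cat - 1] = scale;
}

/* Maps the category indexes CATS[0..K-1] of K interacting variables,
   where variable I has N_CATS[I] categories, to a single index in
   the range [0, N_CATS[0] * ... * N_CATS[K-1]). */
static size_t
combine_categories (const size_t *cats, const size_t *n_cats, size_t k)
{
  size_t idx = 0;
  size_t i;

  for (i = 0; i < k; i++)
    {
      assert (cats[i] < n_cats[i]);
      idx = idx * n_cats[i] + cats[i];
    }
  return idx;
}

/* Returns the number of categories of the whole interaction, which
   is the product of the per-variable category counts. */
static size_t
count_combined_categories (const size_t *n_cats, size_t k)
{
  size_t total = 1;
  size_t i;

  for (i = 0; i < k; i++)
    total *= n_cats[i];
  return total;
}
```

For the categorical-by-numeric example above, calling
encode_scaled_indicator (3, 5, 2.2, out) fills out with (0 0 2.2 0).
For two or more categorical variables, the result of
combine_categories can be passed as CAT and the result of
count_combined_categories as N_CATS, with a scale of 1.0, which gives
a bit vector of length n*m - 1 in the two-variable case.
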

_______________________________________________
pspp-dev mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/pspp-dev
