On Mon, Apr 16, 2007 at 11:10:43AM +0800, John Darrington wrote:
> If we were to follow approach 2, am I right in thinking that the
> 'interaction' data structure could be as large as the number of
> cases in the casefile?
No. It would have either a hash of possible values (all unique), or a
small function to get back and forth between a union value and a
binary vector.

> On the other hand, approach 1 sounds attractive, but there are things
> that need to be considered:
>
> a) They'd have to be a special class of variable, which would not
> normally be displayed, written to system files, etc. So a new
> enum dict_class entry in variable.h would be required.
>
> b) I'm not sure how existing code would deal with these
> 'invisible' variables. For example, many procedures might iterate
> through all the variables. So dict_get_var_cnt might have to
> take a parameter so that we'd know if we were interested in
> 'interaction' variables or not.

These statements make me think approach 2 is the way, especially your
comment b) above.

> c) Presumably it's not just the dictionary that needs modifying.
> When you add a new interaction, you also need to add values for the
> variables into the casefile? That involves running a procedure.
> What I did for RANK was to create a temporary variable, which was an
> illegal name in pspp syntax, and delete it afterwards.

No, no extra data need to be written to the casefile. But given a)
and b) above, I think approach number 2 would be the least painful.

-Jason

> On Sun, Apr 15, 2007 at 03:06:17PM -0400, Jason Stover wrote:
> To have a glm procedure, pspp needs a data structure to handle
> interactions. An interaction can be thought of as another variable
> which is a function of two or more variables, usually categorical,
> like this:
>
> Variable 1   Variable 2   Interaction
> A            B            AB
> E            B            EB
> A            C            AC
> E            C            EC
>
> ...etc. The interaction term could be created in one of two ways:
> either 1) create a new variable in the dictionary that corresponds to
> the interaction, or 2) create a new 'interaction' data structure
> that contains all necessary mappings between existing variables and
> the value of the interaction.
>
> Approach 1 would add a variable to the dictionary, but would not
> create any more observations in the data set. It would make coding any
> procedures that use interactions easier than approach 2, because the
> procedure would not need much special code to handle interactions.
> It would also remove the need for any more obscure
> string-values-to-binary-vector code like that in category.[ch].
> Approach 1 would still require some code to create the interaction,
> though it may not require a specialized "interaction" data structure
> to be available for use by all procedures.
>
> Approach 2 doesn't require adding anything to the dictionary, but it
> does mean that any procedures that need to use interactions would have
> to create those interactions themselves. These interactions would
> therefore be lost after the procedure exits, meaning that any other
> procedure that needs interactions would have to recreate
> them. Approach 2 also means writing more code that partly duplicates
> the code already in category.[ch].
>
> I favor approach number 1, but before I fiddle with the
> dictionary, I thought I should ask.
>
> -Jason
>
> _______________________________________________
> pspp-dev mailing list
> [email protected]
> http://lists.gnu.org/mailman/listinfo/pspp-dev
>
> --
> PGP Public key ID: 1024D/2DE827B3
> fingerprint = 8797 A26D 0854 2EAB 0285 A290 8A67 719C 2DE8 27B3
> See http://pgp.mit.edu or any PGP keyserver for public key.
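The second option Jason mentions in his reply ("a small function to get back and forth between a union value and a binary vector") can be sketched in C for the two-variable example in the quoted table. This is a minimal illustration under stated assumptions, not PSPP code: struct interaction, interaction_code, interaction_to_vector and interaction_from_vector are hypothetical names, and plain strings stand in for PSPP's union value. Because only the distinct levels of each variable are stored, the structure's size depends on the number of categories, not on the number of cases in the casefile.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, not PSPP's API: an interaction of two categorical
   variables.  Only the distinct levels are kept, so memory use grows with
   the number of categories, not with the number of cases. */
#define MAX_LEVELS 8

struct interaction
{
  const char *levels1[MAX_LEVELS]; /* Distinct values of variable 1. */
  size_t n1;
  const char *levels2[MAX_LEVELS]; /* Distinct values of variable 2. */
  size_t n2;
};

/* Returns the index of VALUE in LEVELS[0..N-1], or -1 if absent. */
static int
level_index (const char *const *levels, size_t n, const char *value)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (levels[i], value) == 0)
      return (int) i;
  return -1;
}

/* Maps the pair (V1, V2) to one interaction cell in [0, n1 * n2),
   or returns -1 if either value is not a known level. */
static int
interaction_code (const struct interaction *ia, const char *v1,
                  const char *v2)
{
  assert (ia->n1 <= MAX_LEVELS && ia->n2 <= MAX_LEVELS);
  int i1 = level_index (ia->levels1, ia->n1, v1);
  int i2 = level_index (ia->levels2, ia->n2, v2);
  if (i1 < 0 || i2 < 0)
    return -1;
  return i1 * (int) ia->n2 + i2;
}

/* Fills VEC (length n1 * n2) with 1 in the cell for (V1, V2) and 0
   elsewhere.  Returns the cell index, or -1 (all zeros) if unknown. */
static int
interaction_to_vector (const struct interaction *ia, const char *v1,
                       const char *v2, double *vec)
{
  int code = interaction_code (ia, v1, v2);
  size_t n = ia->n1 * ia->n2;
  for (size_t i = 0; i < n; i++)
    vec[i] = (int) i == code ? 1.0 : 0.0;
  return code;
}

/* Inverse mapping: recovers the pair of values from a binary vector.
   Returns 0 on success, or -1 if VEC contains no 1. */
static int
interaction_from_vector (const struct interaction *ia, const double *vec,
                         const char **v1, const char **v2)
{
  size_t n = ia->n1 * ia->n2;
  for (size_t i = 0; i < n; i++)
    if (vec[i] == 1.0)
      {
        *v1 = ia->levels1[i / ia->n2];
        *v2 = ia->levels2[i % ia->n2];
        return 0;
      }
  return -1;
}

/* Example: with levels {A, E} and {B, C}, the pair (E, B) maps to
   cell 2, i.e. the vector {0, 0, 1, 0}, and back again to (E, B). */
```

This uses full indicator coding with one column per cell; a model-fitting procedure would normally drop reference levels to avoid a singular design matrix, and a hash table could replace the linear level_index search when a variable has many levels.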
