Ron Crump wrote:
> Hi,
>
> I have a dataframe that contains pedigree information;
> that is individual, sire and dam identities as separate
> columns. It also has date of birth.
>
> These identifiers are not numeric, or not sequential.
>
> Obviously, an identifier can appear in one or two columns,
> depending on whether it was a parent or not. These should
> be consistent.
>
> Not all identifiers appear in the individual column - it
> is possible for a parent not to have its own record if its
> parents were not known.
>
> Missing parental (sire and/or dam) identifiers can occur.
>
> I need to export the data for use in another program that
> requires the pedigree to be coded as integers, increasing
> with date of birth (therefore sire and dam always have
> lower identifiers than their offspring) and with missing
> values coded as 0.
>
> How would I go about doing this?
>
Hi Ron,

Without the genealogical coding system for the output, I can only make a guess. It seems as though you are going from a series of records for which the index is the individual, followed by fields containing sire, dam and date of birth (perhaps not in that order).
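
To make that reading concrete, here is a minimal sketch in R of one way the recoding could go. Everything specific in it is an assumption of mine, not something the target program is known to require: the column names `id`, `sire`, `dam` and `dob`, the toy identifiers and dates, breaking ties in birth date by identifier, and numbering parents that have no record of their own ahead of all recorded animals (they have no birth date to sort on).

```r
# Toy pedigree; all identifiers and dates are invented for illustration.
ped <- data.frame(
  id   = c("K7", "B2", "Z9", "M4"),
  sire = c("A1", NA,   "B2", "B2"),
  dam  = c("Q3", "Q3", "K7", NA),
  dob  = as.Date(c("2001-03-01", "2000-05-10", "2003-07-04", "2003-07-04")),
  stringsAsFactors = FALSE
)

# Parents with no record of their own (here A1 and Q3) have no birth date,
# so this sketch places them first, ahead of every recorded animal.
parents  <- unique(c(ped$sire, ped$dam))
founders <- setdiff(parents[!is.na(parents)], ped$id)

# Recorded animals ordered by birth date; ties are broken by identifier.
by_birth <- ped$id[order(ped$dob, ped$id)]

# The position of an identifier in this combined vector is its new code.
key <- c(founders, by_birth)

recode <- function(x) {
  code <- match(x, key)     # unknown (NA) parents stay NA here
  code[is.na(code)] <- 0L   # and are then coded as 0
  code
}

out <- data.frame(id = recode(ped$id), sire = recode(ped$sire),
                  dam = recode(ped$dam), dob = ped$dob)
out <- out[order(out$id), ]

# Sanity check: every parent code is lower than its offspring's code.
# This fails if the birth dates are inconsistent with the pedigree.
stopifnot(all(out$sire < out$id), all(out$dam < out$id))
```

The lookup is done with `match()` against a single ordered vector of identifiers, so no explicit loop is needed. The final `stopifnot()` is only a guard against records in which an offspring is dated earlier than a parent.
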
I think you want to transform this into a network (maybe hierarchical, unless consanguinity intervenes) with individuals coded as positive integers (and maybe some or all of the original information attached to those identifiers). At a guess, I would recode the birthdates as integers, preserving the order and including a rule for breaking ties.

Assuming that you want an inverted tree for each individual, construct a linked list beginning with the individual, with two pointers to the parents (their integer identifiers). Each parent in turn has two links pointing to its own parents, and so on. Whenever a pointer is zero, the linking stops. I don't know whether this can be represented in any of the tree diagrams in R, but it certainly could be coded. I think a bit more information for non-genealogists about the formats might elicit a more specific answer.

> And a second, simpler related question, if I have a column with
> n different values (may be strings or non-sequential integers)
> identifying levels (possibly with repeated occurrences), how
> can I recode them to be sequential from 1 to n?
>
> I can solve both problems in fortran, so could use loops to
> do it in R, but feel there should be quicker, more elegant,
> "more R" solution.
>
Sounds like "sort".

Jim

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
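
On the second question, a minimal sketch of recoding a column to 1..n without loops. The vector `x` and its values are made up for illustration. Two conventions are shown, since the question does not say which is wanted: codes in order of first appearance, and codes in sorted order of the distinct values (which is where sorting comes in).

```r
# Made-up column of non-sequential integer labels with repeats.
x <- c(40L, 12L, 40L, 7L, 12L)

# Codes 1..n in order of first appearance.
by_appearance <- match(x, unique(x))

# Codes 1..n in sorted order of the distinct values.
by_sorted <- match(x, sort(unique(x)))

# factor() sorts its levels by default, so this gives the same codes.
stopifnot(identical(by_sorted, as.integer(factor(x))))
```

The same calls work on character vectors, although the sorted order of strings then depends on the locale's collation rules.
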