A useful technique when a vector is easy to compute from an ordered
data.frame but you need it for an unordered one: compute the order
vector 'ord', compute the vector from df[ord,], and assign it back with
df[ord,...] <- vector, which puts the values back in the original row
order.  In your case you could do:
  > dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
  +                   D=c(5,3,1,3,2,4))
  > ord <- with(dat_2, order(S, D)) # order by subject, break ties by date
  > dat_2$visitNo <- integer(nrow(dat_2)) # will fill this in next
  > dat_2$visitNo[ord] <- with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))
  > dat_2
    S D visitNo
  1 a 5       2
  2 c 3       2
  3 a 1       1
  4 b 3       1
  5 c 2       1
  6 c 4       3

Now this is different from your answer, c(2,2,1,1,2,3).  Which is correct?
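
One way to check by eye is to look at the rows in sorted order. The sketch
below rebuilds dat_2 as above and tests a property that is an assumption
about what "visit number" means here: within each subject, D increases and
visitNo counts 1, 2, ... with no gaps. The names 'sorted' and 'ok' are
introduced only for this illustration.

```r
# Rebuild the example so this block runs on its own.
dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))
ord <- with(dat_2, order(S, D))          # order by subject, then by date
dat_2$visitNo <- integer(nrow(dat_2))
dat_2$visitNo[ord] <- with(dat_2[ord, ], ave(visitNo, S, FUN = seq_along))

# View the rows grouped by subject and sorted by date.
sorted <- dat_2[ord, ]
print(sorted)

# Assumed definition of a visit number: within each subject, dates are
# increasing and the visit numbers run 1, 2, ..., n.
ok <- all(vapply(split(sorted, sorted$S), function(g) {
  !is.unsorted(g$D) && all(g$visitNo == seq_len(nrow(g)))
}, logical(1)))
ok
```

Under that definition, row 5 (subject c, D=2) is the earliest visit for c,
so it gets visit number 1.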

You can also reorder the result computed from the ordered dataset by
subscripting the right-hand side with [order(ord)], but I find using [ord]
on the left-hand side easier to remember.
  > with(dat_2[ord,], ave(visitNo, S, FUN=seq_along))[order(ord)]
  [1] 2 2 1 1 1 3
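
The two forms agree because order(ord) is the inverse of the permutation
ord. A minimal sketch of that, using the same example data (the names 'inv',
'sorted_result', 'lhs' and 'rhs' are introduced only for this illustration):

```r
# Rebuild the example so this block runs on its own.
dat_2 <- data.frame(S = factor(c('a','c','a','b','c','c')),
                    D = c(5,3,1,3,2,4))
ord <- with(dat_2, order(S, D))   # permutation that sorts the rows
inv <- order(ord)                 # inverse permutation

# Applying one permutation after the other gives back 1..n.
stopifnot(all(ord[inv] == seq_along(ord)),
          all(inv[ord] == seq_along(ord)))

# Visit numbers computed on the sorted rows.
sorted_result <- ave(integer(nrow(dat_2)), dat_2$S[ord], FUN = seq_along)

# Route 1: subscript the left-hand side with [ord].
lhs <- integer(nrow(dat_2))
lhs[ord] <- sorted_result

# Route 2: subscript the right-hand side with [order(ord)].
rhs <- sorted_result[inv]

all(lhs == rhs)
```

Route 1 avoids the second call to order(), which may matter a little on
large data.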



Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Feb 4, 2015 at 12:07 PM, Tom Wright <t...@maladmin.com> wrote:

> Thanks, I was not aware of order().
> I did deliberately mess up the order of S. The following example breaks
> your solution
> dat_2<-data.frame(S=factor(c('a','c','a','b','c','c')),
>                   D=c(5,3,1,3,2,4))
>
> which should give the answer c(2,2,1,1,2,3)
>
> Your solution does indicate that sorting the data correctly before
> starting might solve the problem.
>
>
> On Wed, 2015-02-04 at 19:49 +0000, Rui Barradas wrote:
> > Hello,
> >
> > Aren't the levels of your example wrong? If the levels are
> > levels=c('a','b','c'), not c('b', 'a', 'c'), then the following will do
> > the job.
> >
> > unname(unlist(tapply(dat$D, dat$S, order)))
> >
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Em 04-02-2015 19:34, Tom Wright escreveu:
> > > Given a dataframe:
> > >
> > > dat<-data.frame(S=factor(c('a','b','a','c','c','c'),levels=c('b','a','c')),
> > >             D=c(1,5,3,2,3,4))
> > >
> > > where S is a subject identifier and D a visit (actually a date in my
> > > real dataset). I would like to generate another column giving the visit
> > > number
> > >
> > > R=c(2,1,1,1,2,3)
> > >
> > > My current solution uses nested loops and is slow and ugly. I've looked
> > > at by() but can't see how to keep the order of R correct.
> > >
> > > Thanks,
> > > Tom
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
>
