Adam Lawrence <alaw005 <at> gmail.com> writes:

> I am hoping someone can help me with a bus stop sequencing problem in R,
> where I need to match counts of people getting on and off a bus to the
> correct stop in the bus route stop sequence. I have tried looking
> online/forums for sequence matching but seems to refer to numeric sequences
> or DNA matching and over my head. I am after a simple example if anyone can
> please help.
Adam,

Yet another way... See inline code.

BTW, you should have mentioned that you are a transit planner or included a
signature block so folks would know this is not a homework question.

As others have noted/hinted, there are some unstated assumptions, so you need
to try some test cases to be sure any solution always works. You only have
one outbound/inbound cycle in stop_onoff, right? If not, I think almost any
approach can fail given the right sequence of 'seq's.

> I have two data series as per below (from database), that I want to
> combine. In this example “stop_sequence” includes the sequence (seq) of bus
> stops and “stop_onoff” is a count of people getting on and off at certain
> stops (there is no entry if no one gets on or off).
>
> stop_sequence <- data.frame(seq=c(10,20,30,40,50,60),
>     ref=c('A','B','C','D','B','A'))
> ##   seq ref
> ## 1  10   A
> ## 2  20   B
> ## 3  30   C
> ## 4  40   D
> ## 5  50   B
> ## 6  60   A
> stop_onoff <-
>     data.frame(ref=c('A','D','B','A'),on=c(5,0,10,0),off=c(0,2,2,6))
> ##   ref on off
> ## 1   A  5   0
> ## 2   D  0   2
> ## 3   B 10   2
> ## 4   A  0   6
>
> I need to match the stop_onoff numbers in the right stop sequence, with the
> correctly matched output as follows (load is a cumulative count of on and
> off)
>
> desired_output <- data.frame(seq=c(10,20,30,40,50,60),
>     ref=c('A','B','C','D','B','A'),
>     on=c(5,'-','-',0,10,0),off=c(0,'-','-',2,2,6), load=c(5,0,0,3,11,5))
> ##   seq ref on off load
> ## 1  10   A  5   0    5
> ## 2  20   B  -   -    0
> ## 3  30   C  -   -    0
> ## 4  40   D  0   2    3
> ## 5  50   B 10   2   11
> ## 6  60   A  0   6    5

Start here:

> ## running load after each recorded on/off event
> stop_onoff$load <- with(stop_onoff, cumsum(on) - cumsum(off))
> ## candidate seq values for each ref (A and B appear twice on the route)
> split.ref <- with(stop_sequence, split(seq, ref))
> split.ref.onoff <- split.ref[as.character(stop_onoff$ref)]
> ## one column per event: row 1 = first candidate seq, row 2 = last
> stop.mat <- sapply(split.ref.onoff, rep, length.out = 2)
> ## per row, flag where seq is larger than in the previous event's column
> inout <- cbind(stop.mat, c(0, Inf)) > cbind(c(0, Inf), stop.mat)
> ## the padded extra column yields one trailing NA, dropped by head()
> stop_onoff$seq <- head(stop.mat[inout], -1)
> merge(stop_sequence[c("ref", "seq")], stop_onoff[-1], by = "seq", all.x = TRUE)
  seq ref on off load
1  10   A  5   0    5
2  20   B NA  NA   NA
3  30   C NA  NA   NA
4  40   D  0   2    3
5  50   B 10   2   11
6  60   A  0   6    5

You can take care of turning the NAs to zeroes or '-'s, I think.

HTH,

Chuck
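
For the last step (turning the NAs into zeroes or '-'s), here is a minimal sketch rather than a definitive answer. The names merged, numeric_out and display_out are made up for illustration, and the merged result is re-entered by hand so the snippet runs on its own. One assumption to check against what you actually want: the numeric version carries the load through stops with no record (5 at B and C), whereas the desired output in the question shows 0 there, which the display version reproduces.

```r
# Merged result as printed in the reply, re-entered by hand (hypothetical name).
merged <- data.frame(
  seq  = c(10, 20, 30, 40, 50, 60),
  ref  = c("A", "B", "C", "D", "B", "A"),
  on   = c(5, NA, NA, 0, 10, 0),
  off  = c(0, NA, NA, 2, 2, 6),
  load = c(5, NA, NA, 3, 11, 5)
)

# Option 1 (numeric): treat "no record" as zero boardings/alightings,
# then recompute the load so it carries through the skipped stops.
# ASSUMPTION: a stop with no record means nobody got on or off.
numeric_out <- merged
numeric_out$on[is.na(numeric_out$on)]   <- 0
numeric_out$off[is.na(numeric_out$off)] <- 0
numeric_out$load <- cumsum(numeric_out$on - numeric_out$off)

# Option 2 (display only): show '-' for missing on/off and 0 for missing
# load, as in the question. The on/off columns become character vectors.
display_out <- merged
display_out$on   <- ifelse(is.na(merged$on),  "-", as.character(merged$on))
display_out$off  <- ifelse(is.na(merged$off), "-", as.character(merged$off))
display_out$load <- ifelse(is.na(merged$load), 0, merged$load)

print(numeric_out)
print(display_out)
```

Option 1 keeps everything numeric, so further sums or plots still work; Option 2 is only suitable for a printed report, since the '-' entries force on and off to character.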