Here's one way to do part 1:
rr = rle(Table[,'binary'])
cc = cumsum(rr$lengths)+1
thestarts = c(1,cc[cc<=nrow(Table)])
theends = cc-1
answer = cbind(Table[thestarts,'Chromosome'], Table[thestarts,'start'],
               Table[theends,'start'], rr$values)
answer
[,1] [,2] [,3] [,4]
[1,]1 12 18
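One caveat with running rle() on the binary column alone: if the last rows of one chromosome and the first rows of the next share the same binary value, they are merged into a single run. Below is a minimal sketch of a variant that avoids this by running rle() over a combined Chromosome/binary key. It assumes the example data from the question is held in a data frame rather than a cbind matrix; the name `key` and the output column names are illustrative and do not come from the thread.

```r
# Example data from the question, as a data frame (assumption: the thread uses cbind)
Table <- data.frame(
  Chromosome = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  start      = c(12, 17, 18, 20, 25, 36, 12, 15, 16, 17, 19),
  binary     = c(1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0)
)

# Combine chromosome and binary so that a run never spans two chromosomes
key <- paste(Table$Chromosome, Table$binary)
rr <- rle(key)

# Row index of the last and first element of each run
theends   <- cumsum(rr$lengths)
thestarts <- theends - rr$lengths + 1

# One row per run: chromosome, first position, last position, binary value
answer <- data.frame(
  Chromosome = Table$Chromosome[thestarts],
  start      = Table$start[thestarts],
  end        = Table$start[theends],
  binary     = Table$binary[thestarts]
)
answer
```

On the example data this gives four runs (two per chromosome), the same grouping as the rle() code above, since no run in the example crosses a chromosome boundary.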
Hi:
Here are a couple more options using the plyr and data.table packages. The
labels in the second part are changed because the originals didn't make sense
for a 2M-line file (mine may not either, but it's a start). You
can always change them to something more pertinent.
# Question 1:
Table <- data.fram
Here is an answer to part 1:
> binary<-c(1,1,1,0,0,0,1,1,1,0,0)
> Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
> start<-c(12,17,18,20,25,36,12,15,16,17,19)
> Table<-cbind(Chromosome,start,binary)
> # determine where the start/end of each group is
> # use indices since the size is large
> startEnd <- lappl
My question is twofold.
Part 1:
My data looks like this:
(example set, real data has 2*10^6 rows)
binary<-c(1,1,1,0,0,0,1,1,1,0,0)
Chromosome<-c(1,1,1,1,1,1,2,2,2,2,2)
start<-c(12,17,18,20,25,36,12,15,16,17,19)
Table<-cbind(Chromosome,start,binary)
Chromosome start binary
[1,] 1