Paul,
Try this (I changed some of the object names, but the meat of the code is
the same):
df <- data.frame(
chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
"ZBTB33", "CTCF"),
cumsum = c(10089, 20221, 30354, 40502, 50884, 67016)
)
# assign a new bin every time chrom changes and every time chromStart
# changes by 115341 or more
L <- nrow(df)
prev.chrom <- c(NA, df$chrom[-L])
delta.start <- c(NA, df$chromStart[-1] - df$chromStart[-L])
new.bin <- is.na(prev.chrom) | df$chrom != prev.chrom |
  delta.start >= 115341
df$bin <- cumsum(new.bin)
df
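One possible explanation for the 1:6 bins (a guess, not something confirmed in this thread): if chrom is a factor, which is what data.frame() produced by default for character columns before R 4.0.0, then c(NA, df$chrom[-L]) returns the integer level codes rather than the labels. The comparison df$chrom != prev.chrom then ends up comparing "chr1" with "1" and is TRUE on every row. Below is a minimal sketch of the same previous-row rule that compares labels explicitly, assuming the rows are already sorted by chrom and then chromStart:

```r
# Same consecutive-row rule as above, written so that it gives the same
# answer whether chrom is stored as character or as factor.
# Assumption: rows are already sorted by chrom, then chromStart.
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  stringsAsFactors = TRUE  # force the factor case for illustration
)

chrom <- as.character(df$chrom)  # compare labels, not factor codes
start <- df$chromStart
n <- length(chrom)

# TRUE for the first row, for any row whose chrom differs from the
# previous row, and for any row starting >= 115341 after the previous row
new.bin <- c(TRUE, chrom[-1] != chrom[-n] | diff(start) >= 115341)
df$bin <- cumsum(new.bin)
df$bin
# returns c(1L, 1L, 2L, 2L, 3L, 3L)
```

The c(TRUE, ...) form assumes the data frame has at least one row.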
pguilha <[email protected]> wrote on 07/02/2012 10:23:36 AM:
> Jean, that's exactly what it should be, but yes, I copied and pasted
> from your email, so I don't see how I could have introduced an error in
> there...
> paul
>
> On 2 July 2012 15:57, Jean V Adams [via R]
> <[email protected]> wrote:
> > Paul,
> >
> > Are you submitting the exact code that I included in my previous e-mail?
> > When I submit that code, I get this ...
> >
> > chrom chromStart chromEnd name cumsum bin
> > 1 chr1 10089 10309 ZBTB33 10089 1
> > 2 chr1 10132 10536 TAF7_(SQ-8) 20221 1
> > 3 chr2 10133 10362 Pol2-4H8 30354 2
> > 4 chr2 10148 10418 MafF_(M8194) 40502 2
> > 5 chr2 210382 210578 ZBTB33 50884 3
> > 6 chr2 216132 216352 CTCF 67016 3
> >
> > Jean
> >
> >
> > Paul Guilhamon <[hidden email]> wrote on 07/02/2012 08:59:00 AM:
> >
> >> Thanks for your reply Jean,
> >>
> >> I think your interpretation is correct but when I run your code I end
> >> up with the below dataframe and obviously the bins created there don't
> >> correspond to a chromStart change of 115341:
> >>
> >> chrom chromStart chromEnd name cumsum bin
> >> 1 chr1 10089 10309 ZBTB33 10089 1
> >> 2 chr1 10132 10536 TAF7_(SQ-8) 20221 2
> >> 3 chr2 10133 10362 Pol2-4H8 30354 3
> >> 4 chr2 10148 10418 MafF_(M8194) 40502 4
> >> 5 chr2 210382 210578 ZBTB33 50884 5
> >> 6 chr2 216132 216352 CTCF 67016 6
> >>
> >> the first two rows should have the same bin number (same chrom,
> >> <115341 diff), then rows 3&4 should be in another bin (different chrom
> >> from rows 1&2, <115341 diff), and rows 5&6 in another one (same chrom
> >> but >115341 difference between row 4 and row 5).
> >>
> >> it seems the new.bin line of your code isn't quite doing what it
> >> should but I can't pinpoint the error there...
> >> Paul
> >>
> >>
> >> On 2 July 2012 14:19, Jean V Adams <[hidden email]> wrote:
> >> > Paul,
> >> >
> >> > My interpretation is that you are trying to assign a new bin number to a row
> >> > every time the variable chrom changes and every time the variable chromStart
> >> > changes by 115341 or more. Is that right? If so, you don't need a loop at
> >> > all. Check out the code below. I made a couple changes to the all.tf7
> >> > example data frame so that it would have two changes in bin number, one
> >> > based on the chrom variable and one based on the chromStart variable.
> >> >
> >> > Jean
> >> >
> >> > all.tf7 <- data.frame(
> >> > chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
> >> > chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
> >> > chromEnd = c(10309, 10536, 10362, 10418, 210578, 216352),
> >> > name = c("ZBTB33", "TAF7_(SQ-8)", "Pol2-4H8", "MafF_(M8194)",
> >> > "ZBTB33", "CTCF"),
> >> > cumsum = c(10089, 20221, 30354, 40502, 50884, 67016),
> >> > bin = rep(NA, 6)
> >> > )
> >> >
> >> > # assign a new bin every time chrom changes and every time chromStart
> >> > # changes by 115341 or more
> >> > L <- nrow(all.tf7)
> >> > prev.chrom <- c(NA, all.tf7$chrom[-L])
> >> > delta.start <- c(NA, all.tf7$chromStart[-1] - all.tf7$chromStart[-L])
> >> > new.bin <- is.na(prev.chrom) | all.tf7$chrom != prev.chrom |
> >> >   delta.start >= 115341
> >> > all.tf7$bin <- cumsum(new.bin)
> >> > all.tf7
> >> >
> >> >
> >> > pguilha <[hidden email]> wrote on 07/02/2012 06:25:13 AM:
> >> >
> >> >> Hello all,
> >> >>
> >> >> I have written a for loop to act on a dataframe with close to 3 million
> >> >> rows and 6 columns, and I would like to pass it to apply() to speed the
> >> >> process up (I let the loop run for 2 days before stopping it and it had
> >> >> only gone through 200,000 rows), but I am really struggling to find a way
> >> >> to pass the arguments. Below are the loop and the head of the dataframe I
> >> >> am working on.
> >> >> Any hints would be much appreciated, thank you! (I have searched for this
> >> >> but could not find any other posts doing quite what I want.)
> >> >> Paul
> >> >>
> >> >> x<-as.numeric(all.tf7[1,2])
> >> >> for (i in 2:nrow(all.tf7)) {
> >> >> if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)<115341)
> >> >> all.tf7[i,6]<-all.tf7[i-1,6]
> >> >> else if (all.tf7[i,1]==all.tf7[i-1,1] & (all.tf7[i,2]-x)>=115341) {
> >> >> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> >> >> x<-as.numeric(all.tf7[i,2]) }
> >> >> else if (all.tf7[i,1]!=all.tf7[i-1,1]) {
> >> >> all.tf7[i,6]<-(all.tf7[i-1,6]+1)
> >> >> x<-as.numeric(all.tf7[i,2]) }
> >> >> }
> >> >>
> >> >> # the aim here is to attribute a bin number to each row so that I can then
> >> >> # split the dataframe according to those bins.
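For the goal stated here, base R's split() already does the splitting once a bin column exists, so that step needs no loop either. A small sketch on illustrative data (a cut-down stand-in, not the real all.tf7):

```r
# Once every row has a bin number, split() returns a list with one
# data frame per bin. The small 'df' here is illustrative only.
df <- data.frame(
  chrom = c("chr1", "chr1", "chr2", "chr2", "chr2", "chr2"),
  chromStart = c(10089, 10132, 10133, 10148, 210382, 216132),
  bin = c(1, 1, 2, 2, 3, 3)
)
by.bin <- split(df, df$bin)
names(by.bin)         # "1" "2" "3"
sapply(by.bin, nrow)  # 2 rows in each bin
```

Each element of by.bin can then be processed with lapply(by.bin, ...).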
> >> >>
> >> >>
> >> >> chrom chromStart chromEnd name cumsum bin
> >> >> chr1 10089 10309 ZBTB33 10089 1
> >> >> chr1 10132 10536 TAF7_(SQ-8) 20221 1
> >> >> chr1 10133 10362 Pol2-4H8 30354 1
> >> >> chr1 10148 10418 MafF_(M8194) 40502 1
> >> >> chr1 10382 10578 ZBTB33 50884 1
> >> >> chr1 16132 16352 CTCF 67016 1
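One point the thread does not address: the loop above measures the distance from x, the chromStart of the first row in the current bin, while the vectorised new.bin code measures it from the previous row. The two agree on these six rows but can disagree in general. If the anchored rule is the one wanted, a loop over plain vectors instead of all.tf7[i, j] is one reasonable sketch. anchor_bins() is a hypothetical helper name, and the sketch assumes rows are sorted by chrom and then chromStart:

```r
# Reproduces the original loop's rule: distance is measured from the first
# row of the current bin (x), not from the previous row.
# Working on plain vectors avoids per-iteration data.frame indexing.
# Assumptions: rows sorted by chrom, then chromStart; bins numbered from 1.
# anchor_bins() is a hypothetical helper, not an existing function.
anchor_bins <- function(chrom, start, width = 115341) {
  chrom <- as.character(chrom)  # guard against factor columns
  n <- length(start)
  bin <- integer(n)
  if (n == 0L) return(bin)
  bin[1] <- 1L
  x <- start[1]                 # chromStart of the current bin's first row
  for (i in seq_len(n)[-1]) {
    if (chrom[i] == chrom[i - 1] && start[i] - x < width) {
      bin[i] <- bin[i - 1]      # same chrom, still within width of x
    } else {
      bin[i] <- bin[i - 1] + 1L # new chrom or too far from x: new bin
      x <- start[i]
    }
  }
  bin
}

# With starts 0, 100000, 200000 on one chromosome, the anchored rule opens
# a new bin at 200000 (200000 - 0 >= 115341), while the previous-row rule
# keeps all three rows together.
anchor_bins(c("chr1", "chr1", "chr1"), c(0, 100000, 200000))
# returns c(1L, 1L, 2L)
```

Element-by-element assignment into a data frame inside a loop is slow in R, so the plain-vector version should typically be much faster; no timing on 3 million rows is claimed here.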
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.