[R] How to get row numbers of a subset of rows
Hello list,

I read in a txt file using

    B <- read.table(file="data.snp", header=TRUE, row.names=NULL)

specifying row.names=NULL so that the rows are numbered. Below is an example of how the table looks, using B[1:10,1:3]:

                 SNP Chromosome PhysicalPosition
    1  SNP_A-1909444          1          7924293
    2  SNP_A-2237149          1          8173763
    3  SNP_A-4303947          1          8191853
    4  SNP_A-2236359          1          8323433
    5  SNP_A-2205441          1          8393263
    6  SNP_A-1909445          1          7924293
    7  SNP_A-2237146          2          8173763
    8  SNP_A-4303946          2          8191853
    9  SNP_A-2236357          2          8323433
    10 SNP_A-2205442          2          8393263

I am wondering if there is a way to return the start and end row numbers for a subset of rows. For example, if I specify B[,2]=1, I would like to get start=1 and end=6; if B[,2]=2, then start=7 and end=10.

Is there any way in R to quickly do this?

Thanks a bunch!
Allen

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] How to get row numbers of a subset of rows
Here is a way of doing it using 'rle':

    x <- read.table(textConnection("SNP Chromosome PhysicalPosition
    1 SNP_A-1909444 1 7924293
    2 SNP_A-2237149 1 8173763
    3 SNP_A-4303947 1 8191853
    4 SNP_A-2236359 1 8323433
    5 SNP_A-2205441 1 8393263
    6 SNP_A-1909445 1 7924293
    7 SNP_A-2237146 2 8173763
    8 SNP_A-4303946 2 8191853
    9 SNP_A-2236357 2 8323433
    10 SNP_A-2205442 2 8393263"), header=TRUE)
    # use rle to get the 'runs'
    y <- rle(x$Chromosome)
    # create dataframe with start/ends and values
    start <- head(cumsum(c(1, y$lengths)), -1)
    index <- data.frame(values=y$values, start=start, end=start + y$lengths - 1)
    index

      values start end
    1      1     1   6
    2      2     7  10

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
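A note on the 'rle' approach: because it reports one row per run, it should also handle a value that occurs in several separate blocks. The following is only a minimal sketch on made-up data (the vector chrom is hypothetical and not from the thread); it computes the run ends first with cumsum() and derives the starts from them.

```r
# Hypothetical chromosome column in which value 1 reappears after value 2
chrom <- c(1, 1, 1, 2, 2, 1, 1)

runs  <- rle(chrom)                 # run lengths 3, 2, 2 for values 1, 2, 1
end   <- cumsum(runs$lengths)       # last row of each run
start <- end - runs$lengths + 1     # first row of each run

run_index <- data.frame(values = runs$values, start = start, end = end)
run_index
```

Here the value 1 gets two rows in run_index, one per block, whereas a single range() over all rows equal to 1 would span the rows of chromosome 2 as well.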
Re: [R] How to get row numbers of a subset of rows
Am I missing something? ... Why not:

    range(seq(nrow(B))[B[,2]==1])   ## note: == not =

Alternatively, and easily generalized (to start with a frame which is a subset of the original and any subset of rows, contiguous or not):

    range(as.numeric(row.names(B)[B[,2]==1]))

Again, am I missing something that makes this obvious solution impossible? (Wouldn't be the first time.)

Bert Gunter
Genentech Nonclinical Statistics
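To illustrate the difference between the two forms, here is a small sketch. It assumes a made-up data frame (a hypothetical stand-in for B, with default automatic row names) and a subset named sub; neither comes from the thread. The row.names() form only works this way while the row names are still the original row numbers.

```r
# Hypothetical stand-in for B; values are made up for illustration
B <- data.frame(
  SNP = c("snp1", "snp2", "snp3", "snp4", "snp5"),
  Chromosome = c(1, 1, 1, 2, 2)
)

# A non-contiguous subset keeps the original row names "2", "3", "5"
sub <- B[c(2, 3, 5), ]

# Positions within the subset
sub_pos <- range(seq_len(nrow(sub))[sub[, 2] == 1])

# Row numbers in the original frame, recovered from the row names
orig_rows <- range(as.numeric(row.names(sub)[sub[, 2] == 1]))

sub_pos
orig_rows
```

The first result refers to positions inside sub, the second to rows of the original B, which is presumably the generalization meant above.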
Re: [R] How to get row numbers of a subset of rows
That works for the specific value of '1', but you would have to repeat it for other values in the column. If you had 100 different ranges in that column, what would you do? Here is another solution using 'range' on the same data:

    tapply(seq_len(nrow(x)), x$Chromosome, range)

    $`1`
    [1] 1 6

    $`2`
    [1]  7 10

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
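If a table is more convenient than a list of ranges, the per-group result can be bound into a data frame. This is a sketch only: it swaps tapply() for the closely related split() plus sapply(), and it rebuilds just the Chromosome column of the posted data (six rows of 1, four rows of 2) rather than reading the file.

```r
# Chromosome column as in the posted example: rows 1-6 are 1, rows 7-10 are 2
x <- data.frame(Chromosome = rep(c(1, 2), times = c(6, 4)))

# Row numbers grouped by chromosome, then the range of each group
rows_by_chrom <- split(seq_len(nrow(x)), x$Chromosome)
m <- t(sapply(rows_by_chrom, range))   # one row per chromosome: min, max

index <- data.frame(values = rownames(m), start = m[, 1], end = m[, 2])
index
```

Note that, like the tapply() version, this gives the overall first and last row per value, so it matches the run-based result only when each value forms a single contiguous block.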
Re: [R] How to get row numbers of a subset of rows
Thanks a lot, Jim and Bert. It worked pretty well.

Best,
Allen
Re: [R] How to get row numbers of a subset of rows
One way to do this is

    range(which(B[,2]==1))

Julian
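One caveat with range(which(...)): when no row matches, range() is called on an empty vector and returns Inf and -Inf with warnings. A possible guard is sketched below; the helper row_span() and the rebuilt B are hypothetical names for illustration, not part of the thread.

```r
# Chromosome column as in the posted example
B <- data.frame(Chromosome = rep(c(1, 2), times = c(6, 4)))

# Hypothetical helper: first and last row where df[[col]] equals value,
# or NA for both when nothing matches
row_span <- function(df, col, value) {
  hits <- which(df[[col]] == value)   # which() also drops NA comparisons
  if (length(hits) == 0L) {
    return(c(start = NA_integer_, end = NA_integer_))
  }
  c(start = min(hits), end = max(hits))
}

span1 <- row_span(B, "Chromosome", 1)
span2 <- row_span(B, "Chromosome", 2)
span3 <- row_span(B, "Chromosome", 3)   # no such chromosome in B
```

With the data above, span1 and span2 reproduce the start/end pairs asked for in the original question, and span3 is NA for both entries.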
Re: [R] How to get row numbers of a subset of rows
Thank you very much, Julian. I got it.

Best,
Allen