> I'm having difficulties with a dataset containing gene names and positions
> of those genes.
> Not such a big problem, but each gene has multiple exons so it's hard to say
> where the gene starts and where it ends. I want the starting and ending
> position of each gene in my dataset.
> Attached is the dataset:
> http://www.nabble.com/file/p21312449/genlistchrompos.csv genlistchrompos.csv
> Column 'B' is the gene name, 'G' is the starting position and 'H' is the
> stop position.
> You can load the dataset by using: data<-read.csv("genlistchrompos.csv",
> sep=";")
> I hope someone can help me, it's been giving me headaches for a week now:-((.
which(diff(as.numeric(data$Gene)) != 0) will give you a vector of the last row
of each gene except the final one, whose last row is nrow(data). This assumes
the rows are grouped by gene and that data$Gene is a factor; if it was read in
as character, wrap it in factor() first. The first row of each gene is the row
after the previous gene's last row (row 1 for the first gene). Those rows hold
the gene's start and stop positions only if the exons within each gene are
sorted by position.

Also take a look at split(data, data$Gene).

Regards,
Richie.

Mathematical Sciences Unit
HSL

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
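For what it's worth, here is a minimal sketch of both suggestions on a small made-up table. The column names Gene, Start and Stop are assumptions (the real header of genlistchrompos.csv isn't shown in the thread), and the toy values are invented purely for illustration. The first part finds the first and last row of each gene by row index, which only works when rows are grouped by gene. The second part uses split() and takes the smallest start and largest stop per gene, which does not depend on row order.

```r
# Toy data standing in for genlistchrompos.csv. The column names Gene, Start
# and Stop are assumptions; the values are invented for illustration only.
exons <- data.frame(
  Gene  = c("A", "A", "A", "B", "B", "C"),
  Start = c(100, 250, 400, 900, 1200, 50),
  Stop  = c(180, 300, 520, 1000, 1300, 75),
  stringsAsFactors = FALSE
)

# Row-index approach: requires rows to be grouped by gene.
# factor() makes as.numeric() work even when Gene is a character column.
gene_code <- as.numeric(factor(exons$Gene))
last_row  <- c(which(diff(gene_code) != 0), nrow(exons))  # append final gene
first_row <- c(1, head(last_row, -1) + 1)                 # row after previous end

# split() approach: one data frame per gene, then min start and max stop.
# This does not rely on the exons being sorted within a gene.
per_gene <- split(exons, exons$Gene)
gene_range <- do.call(rbind, lapply(per_gene, function(g) {
  data.frame(Gene = g$Gene[1], Start = min(g$Start), Stop = max(g$Stop))
}))

print(gene_range)
```

With the real data, exons would be replaced by the data frame read from the CSV and the column names adjusted to whatever the file actually uses. Taking min() and max() per gene is probably the safer of the two, since it still works when exons are listed out of order.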