doggysaywhat <chwh...@ucsd.edu> writes:

> My apologies for the context problem. I'll explain.
>
> df1 is a matrix of genes labeled g1 through g5 with start positions in the
> START column and end positions in the END column.
>
> df2 is a matrix of chromatin modification values at positions along the DNA.
>
> I want to average chromatin modification values for each gene from the start
> to the end position. So this would involve pulling out all values for
> column C0 that are between pos 200 and 700 for the first gene and averaging
> them. Then, I would pull all values from 500 to 1000, and continue for each
> gene.
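
For a modest number of genes, the averaging described above can also be sketched in base R. This is only an illustration under stated assumptions: the data values are made up, the column names GENE and POS are hypothetical (only START, END and C0 come from the post), and both interval ends are treated as inclusive, which differs from the (start, end] bins that cut() produces.

```r
# Illustrative data only: these values are made up, not the poster's.
# GENE and POS are assumed column names; START, END and C0 are from the post.
df1 <- data.frame(GENE  = c("g1", "g2", "g3"),
                  START = c(200, 500, 2000),
                  END   = c(700, 1000, 3000))
df2 <- data.frame(POS = c(100, 300, 600, 900, 1500, 2200, 2800, 3500),
                  C0  = c(0.5, 1.0, 2.0, 4.0, 8.0, 3.0, 5.0, 9.0))

# For each gene, average C0 over the positions with START <= POS <= END.
# mapply() walks the START and END columns in parallel.
df1$C0_MEAN <- mapply(
  function(s, e) mean(df2$C0[df2$POS >= s & df2$POS <= e]),
  df1$START, df1$END
)

df1
```

A gene whose range contains no positions gets NaN, because mean() of an empty vector is NaN. Overlapping genes (such as 200-700 and 500-1000) are handled naturally, since each gene is evaluated on its own.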

This type of operation is what the IRanges and GenomicRanges packages were
developed for.

I suggest you install both (from bioconductor.org), then review

  http://www.bioconductor.org/help/course-materials/2011/CSAMA/Tuesday/Morning%20Talks/IRangesLecture.pdf

and the vignettes for those packages and the help page for 'findOverlaps'.

If that doesn't solve your problem, post to the Bioconductor list.

HTH,

Chuck

>
> The example I gave previously was a short one, but I will be doing this for
> around 1000 genes with different positions. This is why just removing one
> group.
>
> This was something I tried to come up with that allowed me to use start and
> end positions. Your advice to use the cut is working.
>
> start<-df1[,2]
> end<-df1[,3]
>
> while(i<length(start)){
> i<-i+1
> print(cut(df2[,1],c(start[i],end[i])))
> }
>
> These were the results
>
> [1] <NA> (200,700] <NA> <NA> <NA> <NA> <NA>
> [8] <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> [15] <NA> <NA> <NA> <NA> <NA>
> Levels: (200,700]
> [1] <NA> <NA> (500,1e+03] (500,1e+03] <NA> <NA>
> [7] <NA> <NA> <NA> <NA> <NA> <NA>
> [13] <NA> <NA> <NA> <NA> <NA> <NA>
> [19] <NA>
> Levels: (500,1e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] (2e+03,3e+03] (2e+03,3e+03] <NA> <NA> <NA>
> [11] <NA> <NA> <NA> <NA> <NA>
> [16] <NA> <NA> <NA> <NA>
> Levels: (2e+03,3e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] <NA> <NA> <NA> <NA> (4e+03,6e+03]
> [11] (4e+03,6e+03] (4e+03,6e+03] (4e+03,6e+03] <NA> <NA>
> [16] <NA> <NA> <NA> <NA>
> Levels: (4e+03,6e+03]
> [1] <NA> <NA> <NA> <NA> <NA>
> [6] <NA> <NA> <NA> <NA> <NA>
> [11] <NA> <NA> <NA> <NA> <NA>
> [16] (7e+03,8e+03] (7e+03,8e+03] <NA> <NA>
> Levels: (7e+03,8e+03]
>
>
> This is producing the right bins for each of the results, but I'm not sure
> how to put this into a data frame. When I did this.
>
>
> start<-df1[,2]
> end<-df1[,3]
>
> while(i<length(start)){
> i<-i+1
> bins<-(cut(df2[,1],c(start[i],end[i])))
> }
>
> the bins variable was the last level.
>
> Is there a way to assign the results of the while statement to a
> dataframe?
>
> Many thanks
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Averaging-within-a-range-of-values-tp4291958p4294061.html
> Sent from the R help mailing list archive at Nabble.com.
>

--
Charles C. Berry                            Dept of Family/Preventive Medicine
cberry at ucsd edu                          UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
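
On the question of getting the loop results into a data frame: one common base R pattern is to build a small data frame per gene and stack them at the end, rather than assigning to the same variable on every pass (which keeps only the last iteration). The sketch below is not from the thread; the data values are made up, and the names GENE, POS, N, MEAN, per_gene and result are hypothetical.

```r
# Illustrative data only (made up). GENE and POS are assumed column names;
# START, END and C0 are the names used in the post.
df1 <- data.frame(GENE = c("g1", "g2"),
                  START = c(200, 500),
                  END = c(700, 1000))
df2 <- data.frame(POS = c(100, 300, 600, 900),
                  C0 = c(0.5, 1.0, 2.0, 4.0))

# One single-row data frame per gene, collected in a list.
per_gene <- lapply(seq_len(nrow(df1)), function(i) {
  # cut() with two break points gives one level, (start, end];
  # positions outside that interval come back as NA.
  in_gene <- !is.na(cut(df2$POS, c(df1$START[i], df1$END[i])))
  data.frame(GENE = df1$GENE[i],
             N    = sum(in_gene),             # positions inside the gene
             MEAN = mean(df2$C0[in_gene]))    # their average C0
})

# Stack the per-gene rows into a single data frame.
result <- do.call(rbind, per_gene)
result
```

Using seq_len(nrow(df1)) also avoids relying on a counter that has to be initialised before the while loop.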