Thanks Martin! I will try it and will let your guys know how it goes.
On Fri, Nov 5, 2010 at 9:42 AM, Martin Morgan <[email protected]> wrote: > On 11/05/2010 09:13 AM, Changbin Du wrote: > > HI, Phil, > > > > I used the following codes and run it overnight for 15 hours, this > morning, > > I stopped it. It seems it is still not efficient. > > > > > >> > > > matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth", > > sep="\t", skip=0, header=F,fill=T) # > >> names(matt)<-c("id","reads") > > > >> dim(matt) > > [1] 3384766 2 > > [snip] > > >>> On Thu, 4 Nov 2010, Changbin Du wrote: > >>> > >>> HI, Dear R community, > >>>> > >>>> I have one data set like this, What I want to do is to calculate the > >>>> cumulative coverage. The following codes works for small data set > (#rows > >>>> = > >>>> 100), but when feed the whole data set, it still running after 24 > hours. > >>>> Can someone give some suggestions for long vector? > >>>> > >>>> id reads > >>>> Contig79:1 4 > >>>> Contig79:2 8 > >>>> Contig79:3 13 > >>>> Contig79:4 14 > >>>> Contig79:5 17 > >>>> Contig79:6 20 > >>>> Contig79:7 25 > >>>> Contig79:8 27 > >>>> Contig79:9 32 > >>>> Contig79:10 33 > >>>> Contig79:11 34 > >>>> > >>>> > >>>> > matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth", > >>>> sep="\t", skip=0, header=F,fill=T) # > >>>> dim(matt) > >>>> [1] 3384766 2 > >>>> > >>>> matt_plot<-function(matt, outputfile) { > >>>> names(matt)<-c("id","reads") > >>>> > >>>> cover<-matt$reads > >>>> > >>>> > >>>> #calculate the cumulative coverage. > >>>> + cover_per<-function (data) { > >>>> + output<-numeric(0) > >>>> + for (i in data) { > >>>> + x<-(100*sum(ifelse(data >= i, 1, 0))/length(data)) > >>>> + output<-c(output, x) > >>>> + } > >>>> + return(output) > >>>> + } > >>>> > >>>> > >>>> result<-cover_per(cover) > > Hi Changbin > > If I understand correctly, your contigs 'start' at position 1, and have > 'width' equal to matt$reads. You'd like to know the coverage at the last > covered location of each contig in matt$reads. > > ## first time only > source("http://bioconductor.org") > biocLite("IRanges") > > ## > library(IRanges) > contigs = IRanges(start=1, width=matt$reads) > cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1 > as.vector(cvg[matt$reads]) / nrow(matt) ## at the end of each contig > > for a larger data set: > > > matt=data.frame(reads=ceiling(as.integer(runif(3384766, 1, 100)))) > > contigs = IRanges(start=1, width=matt$reads) > > system.time(cvg <- coverage(contigs)) > user system elapsed > 5.145 0.050 5.202 > > Martin > > >>>> > >>>> > >>>> Thanks so much! > >>>> > >>>> > >>>> -- > >>>> Sincerely, > >>>> Changbin > >>>> -- > >>>> > >>>> [[alternative HTML version deleted]] > >>>> > >>>> ______________________________________________ > >>>> [email protected] mailing list > >>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>> PLEASE do read the posting guide > >>>> http://www.R-project.org/posting-guide.html > >>>> and provide commented, minimal, self-contained, reproducible code. > >>>> > >>>> > >> > >> > >> -- > >> Sincerely, > >> Changbin > >> -- > >> > >> Changbin Du > >> DOE Joint Genome Institute > >> Bldg 400 Rm 457 > >> 2800 Mitchell Dr > >> Walnut Creet, CA 94598 > >> Phone: 925-927-2856 > >> > >> > >> > > > > > > > -- > Computational Biology > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 > > Location: M1-B861 > Telephone: 206 667-2793 > -- Sincerely, Changbin -- Changbin Du DOE Joint Genome Institute Bldg 400 Rm 457 2800 Mitchell Dr Walnut Creet, CA 94598 Phone: 925-927-2856 [[alternative HTML version deleted]] ______________________________________________ [email protected] mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

