Thanks Martin! I will try it and will let your guys know how it goes.

On Fri, Nov 5, 2010 at 9:42 AM, Martin Morgan <[email protected]> wrote:

> On 11/05/2010 09:13 AM, Changbin Du wrote:
> > HI, Phil,
> >
> > I used the following codes and run it overnight for 15 hours, this
> morning,
> > I stopped it. It seems it is still not efficient.
> >
> >
> >>
> >
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> > sep="\t", skip=0, header=F,fill=T) #
> >> names(matt)<-c("id","reads")
> >
> >> dim(matt)
> > [1] 3384766       2
>
> [snip]
>
> >>> On Thu, 4 Nov 2010, Changbin Du wrote:
> >>>
> >>>  HI, Dear R community,
> >>>>
> >>>> I have one data set like this,  What I want to do is to calculate the
> >>>> cumulative coverage. The following codes works for small data set
> (#rows
> >>>> =
> >>>> 100), but when feed the whole data set,  it still running after 24
> hours.
> >>>> Can someone give some suggestions for long vector?
> >>>>
> >>>> id                reads
> >>>> Contig79:1    4
> >>>> Contig79:2    8
> >>>> Contig79:3    13
> >>>> Contig79:4    14
> >>>> Contig79:5    17
> >>>> Contig79:6    20
> >>>> Contig79:7    25
> >>>> Contig79:8    27
> >>>> Contig79:9    32
> >>>> Contig79:10    33
> >>>> Contig79:11    34
> >>>>
> >>>>
> >>>>
> matt<-read.table("/house/groupdirs/genetic_analysis/mjblow/ILLUMINA_ONLY_MICROBIAL_GENOME_ASSEMBLY/4083340/STANDARD_LIBRARY/GWZW.994.5.1129.trim_69.fastq.19621832.sub.sorted.bam.clone.depth",
> >>>> sep="\t", skip=0, header=F,fill=T) #
> >>>> dim(matt)
> >>>> [1] 3384766       2
> >>>>
> >>>> matt_plot<-function(matt, outputfile) {
> >>>> names(matt)<-c("id","reads")
> >>>>
> >>>> cover<-matt$reads
> >>>>
> >>>>
> >>>> #calculate the cumulative coverage.
> >>>> + cover_per<-function (data) {
> >>>> + output<-numeric(0)
> >>>> + for (i in data) {
> >>>> +           x<-(100*sum(ifelse(data >= i, 1, 0))/length(data))
> >>>> +           output<-c(output, x)
> >>>> +                 }
> >>>> + return(output)
> >>>> + }
> >>>>
> >>>>
> >>>> result<-cover_per(cover)
>
> Hi Changbin
>
> If I understand correctly, your contigs 'start' at position 1, and have
> 'width' equal to matt$reads. You'd like to know the coverage at the last
> covered location of each contig in matt$reads.
>
> ## first time only
> source("http://bioconductor.org";)
> biocLite("IRanges")
>
> ##
> library(IRanges)
> contigs = IRanges(start=1, width=matt$reads)
> cvg = coverage(contigs) ## an RLE summarizing coverage, from position 1
> as.vector(cvg[matt$reads]) / nrow(matt)  ## at the end of each contig
>
> for a larger data set:
>
> > matt=data.frame(reads=ceiling(as.integer(runif(3384766, 1, 100))))
> > contigs = IRanges(start=1, width=matt$reads)
> > system.time(cvg <- coverage(contigs))
>   user  system elapsed
>  5.145   0.050   5.202
>
> Martin
>
> >>>>
> >>>>
> >>>> Thanks so much!
> >>>>
> >>>>
> >>>> --
> >>>> Sincerely,
> >>>> Changbin
> >>>> --
> >>>>
> >>>>        [[alternative HTML version deleted]]
> >>>>
> >>>> ______________________________________________
> >>>> [email protected] mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>>
> >>
> >>
> >> --
> >> Sincerely,
> >> Changbin
> >> --
> >>
> >> Changbin Du
> >> DOE Joint Genome Institute
> >> Bldg 400 Rm 457
> >> 2800 Mitchell Dr
> >> Walnut Creet, CA 94598
> >> Phone: 925-927-2856
> >>
> >>
> >>
> >
> >
>
>
> --
> Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
>
> Location: M1-B861
> Telephone: 206 667-2793
>



-- 
Sincerely,
Changbin
--

Changbin Du
DOE Joint Genome Institute
Bldg 400 Rm 457
2800 Mitchell Dr
Walnut Creet, CA 94598
Phone: 925-927-2856

        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to