On Mon, Dec 30, 2013 at 5:27 PM, William Ray Wing <[email protected]> wrote:
> On Dec 30, 2013, at 7:54 PM, "Protas, Meredith" <[email protected]>
> wrote:
>
>> Thanks for all of your comments! I am working with human genome information
>> which is in the form of many very short DNA sequence reads. I am using a
>> script that sorts through all of these sequences and picks out ones that
>> contain a particular sequence I'm interested in. Because my data set is so
>> big, I have the data on an external hard drive (but that's where I had it
>> before when it was faster too).

A strong suggestion: please show the program to a professional programmer
and get their informed analysis of it. If possible, a clear description of
the problem the program is trying to solve would also be very helpful.

It's very possible that the program you're working with was not written
with efficiency in mind. In many domains, efficiency isn't much of a concern
because the input is relatively small. But in bioinformatics the inputs are
huge (on the order of gigabytes or terabytes), and the proper use of memory
and CPU matters a lot.

In a previous thread on python-tutor, a bioinformatician was asking how to
load their whole data set into memory. After a few questions, we realized
their data set was about 100 gigabytes or so. Most of us here then tried to
convince the original questioner to reconsider: whatever performance gains
they thought they were getting by reading the whole file into memory were
probably delusional dreams.

I guess I'm trying to say: if you can, show us the source. Maybe there's
something there that needs to be fixed.

And maybe Python isn't even the right tool for the job. From the limited
description you've provided of the problem (searching for a pattern among a
database of short sequences), I'm wondering whether you're using BLAST or
not. (http://blast.ncbi.nlm.nih.gov/Blast.cgi)
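
To make the memory point above concrete, here is a minimal sketch of
filtering reads one at a time instead of loading the whole file. It is not
your script, which I haven't seen. It assumes a plain text file with one
read per line, which may well not match your data (FASTA or FASTQ files
would need a proper parser), and the names matching_reads and
count_matches_in_file are made up for illustration.

```python
from typing import Iterable, Iterator


def matching_reads(lines: Iterable[str], target: str) -> Iterator[str]:
    """Yield each non-empty read that contains target (case-insensitive).

    Assumes one read per line; this is an illustrative assumption,
    not a statement about the real input format.
    """
    target = target.upper()
    for line in lines:
        read = line.strip().upper()
        if read and target in read:
            yield read


def count_matches_in_file(path: str, target: str) -> int:
    """Count reads in the file at path that contain target."""
    # Iterating over the file object pulls in one line at a time, so
    # memory use stays small no matter how large the file is.
    with open(path) as handle:
        return sum(1 for _ in matching_reads(handle, target))
```

Because matching_reads is a generator, nothing forces the full data set
into memory; only the current line is held at any moment. Whether this is
faster than what you have now depends on what your script actually does,
which is one more reason to show us the source.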
