I have a similar problem: slow loading times for an NLMSA with a large number of intervals (10^6). The sample code and input file are here (beware, the file size is 26MB):
http://www.ics.uci.edu/~baldig/pygr_nlmsa_test.tar.gz

The input file (bowtie_mapped_reads.txt) contains 1 million reads from a bowtie mapping. All the code needed for parsing, reading, and inserting the data into pygr is in one file (pygr_nlmsa_test.py).

Using the "time" module, I get a loading time from pygr.Data.getResource(...) of ~50 sec. Looking at usage with "top" shows:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  TIME DATA COMMAND
32134 kdaily    25   0  361m 268m 4336 S    0  1.7  1:29.49  1:29 264m ipython
...
Mem:  16475040k total,  1745548k used, 14729492k free,    70924k buffers
Swap: 65537156k total,     6812k used, 65530344k free,  1126804k cached

Also, any hints on better ways of implementing this kind of functionality are appreciated!

Thanks,
Kenny

On Mar 12, 9:32 pm, Christopher Lee <[email protected]> wrote:
> I don't think it should take 2 minutes to load that much data. 300,000
> * 24 = 7 MB. That can be read in a fraction of a second. Is it
> possible that your system is somehow going into virtual memory
> swapping or some other very slow state? Otherwise, we need a
> reproducible for your performance problem, so we can debug it.
>
> -- Chris
>
> On Mar 12, 2009, at 6:59 PM, Alexander Alekseyenko wrote:
>
> > a few hundred thousands.
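
One way to check whether a ~50 sec load is spent computing or waiting (for example on disk or swap, as asked in the quoted reply) is to compare wall-clock time with CPU time around the call. The following is only a minimal sketch, assuming Python 3: `timed_call` is a hypothetical helper, not part of pygr, and the `sum` call is a stand-in for the real `pygr.Data.getResource(...)` call, whose argument is elided in the post.

```python
import time


def timed_call(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, wall_seconds, cpu_seconds).

    Hypothetical helper for illustration only, not part of pygr.
    """
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time (user + system) of this process
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


if __name__ == "__main__":
    # Stand-in workload; in practice pass the loading call instead,
    # e.g. timed_call(pygr.Data.getResource, <resource name>).
    result, wall, cpu = timed_call(sum, range(10**6))
    print("wall: %.3f s, cpu: %.3f s" % (wall, cpu))
```

If the wall-clock time is much larger than the CPU time, the process is mostly waiting rather than computing, which would point toward I/O or swapping; if the two are close, the load is CPU-bound.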
