I have an application that uses the UCSC multiple alignments. My problem is that I need to query the NLMSA around 10-100 million times and get pieces of sequence to compute on. However, I am having trouble figuring out how to increase the speed: the bottleneck seems to be accessing the NLMSA and extracting the sequence pieces, and my current best is only about 10 queries per second.
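
For concreteness, here is a minimal sketch of the kind of inner loop I mean. The worldbase resource names (Bio.Seq.Genome.HUMAN.hg18, Bio.MSA.UCSC.hg18_multiz28way) are just placeholders for whatever genome and NLMSA you actually have saved, and the interval coordinates are made up:

from pygr import worldbase

# open the reference genome and the UCSC NLMSA (placeholder resource names)
genome = worldbase.Bio.Seq.Genome.HUMAN.hg18()
msa = worldbase.Bio.MSA.UCSC.hg18_multiz28way()

def get_aligned_pieces(chrom, start, stop):
    'query the NLMSA for one interval and pull out the aligned sequence strings'
    ival = genome[chrom][start:stop]
    for src, dest, edge in msa[ival].edges():
        yield repr(dest), str(dest)   # id/coords of the aligned interval, plus its sequence

# this gets repeated ~10-100 million times with different intervals
for piece_id, piece_seq in get_aligned_pieces('chr1', 4000, 4400):
    pass  # compute on piece_seq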
Does anyone else have any strategies for handling this kind of throughput? Does decreasing the size of the LPO indexes help at all? I have thought about splitting the alignments by chromosome of the reference species and building those in memory, but rebuilding the alignment takes some time as well (although overall, probably less time than I am spending now). I also have lots of processors to throw at the problem, on the order of hundreds, but they are all accessing the same disk (although that disk is very fast and sits on very expensive hardware). Any ideas would be appreciated!

Kenny
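
P.S. In case it makes the per-chromosome idea clearer, here is a rough multiprocessing sketch of what I have in mind: group the query intervals by chromosome and let each worker process open its own handle to the alignment, so nothing with open file descriptors is shared. The resource names and intervals are placeholders again, and I have not benchmarked this:

from multiprocessing import Pool
from pygr import worldbase

msa = None
genome = None

def init_worker():
    # each worker opens its own NLMSA/genome handles instead of
    # sharing them across processes
    global msa, genome
    genome = worldbase.Bio.Seq.Genome.HUMAN.hg18()
    msa = worldbase.Bio.MSA.UCSC.hg18_multiz28way()

def query_interval(job):
    # job = (chromosome, start, stop); return the aligned sequence strings
    chrom, start, stop = job
    ival = genome[chrom][start:stop]
    return [str(dest) for src, dest, edge in msa[ival].edges()]

if __name__ == '__main__':
    jobs = [('chr1', 4000, 4400), ('chr2', 10000, 10400)]   # placeholder intervals
    pool = Pool(processes=8, initializer=init_worker)
    for pieces in pool.map(query_interval, jobs):
        print len(pieces)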
