> how much faster is your library processor than iterating over blasting
> all the sequences in the simple-minded approach?

It all depends on the size of the blast database you are comparing your library to. In my case, I had a longish (1.2kb) sequence that I needed to compare lots of 454 reads to. For each read, inside the blastall itself, the 1.2kb reference sequence needs to be read using fastacmd, which we know is slow, and there is some initialization going on, which also takes time. Multiply that by the number of reads (on the order of 10^5) and you can see that a lot of unnecessary work is being performed.

> Then there is the issue of testing -- do you
> have (or could you write) a test suite that would test this
> thoroughly, that we could incorporate into our standard tests or
> megatests (if the tests would take too long or require resources too
> big to be incorporated in the standard release package)?

I have no experience writing test cases, but I suppose I could give it a go early next week. I'm thinking of following the usage example I described above, i.e. one long sequence as a blast database, plus a bunch of its sub-sequences with known match coordinates as a library. This should catch all of the problems I can think of.

Cheers,
Alex

P.S.
> FYI, I have summarized the blast refactoring in the Pygr Dev
> discussion group and also somewhat in the issue tracker.

Sorry, I fell about a year behind on the new goings-on with pygr; I'm trying to catch up.
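
For what it's worth, the test data described above (one long reference sequence plus sub-sequences cut from it at known coordinates) could be generated roughly as follows. This is only a sketch in plain Python: every function name here is hypothetical, nothing in it calls pygr or BLAST, and the `hits` argument stands in for whatever the search under test would report.

```python
import random


def make_reference(length, seed=0):
    """Return a random DNA string (stand-in for the 1.2kb reference)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_reads(reference, n_reads, read_len, seed=1):
    """Cut n_reads sub-sequences out of reference at random positions.

    Returns a dict mapping read id -> (start, stop, sequence), using
    0-based half-open coordinates, so reference[start:stop] == sequence.
    """
    rng = random.Random(seed)
    reads = {}
    for i in range(n_reads):
        start = rng.randrange(0, len(reference) - read_len + 1)
        stop = start + read_len
        reads["read%d" % i] = (start, stop, reference[start:stop])
    return reads


def to_fasta(records):
    """Format (id, sequence) pairs as FASTA text."""
    return "".join(">%s\n%s\n" % (name, seq) for name, seq in records)


def check_hits(reads, hits):
    """Return ids of reads whose reported hit is missing or wrong.

    hits maps read id -> (start, stop) as reported by the search
    being tested (assumed to use the same coordinate convention).
    """
    return [name for name, (start, stop, _) in sorted(reads.items())
            if hits.get(name) != (start, stop)]


reference = make_reference(1200)
reads = make_reads(reference, n_reads=100, read_len=50)
library_fasta = to_fasta((name, seq) for name, (_, _, seq) in sorted(reads.items()))
```

A real test would write `reference` and `library_fasta` to files, build the blast database from the former, run the library against it, and pass the reported coordinates to `check_hits`. One caveat: a sub-sequence could in principle occur more than once in a random reference, so a stricter test might need to accept any valid location.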