Hi Chris,

Thanks so much for taking care of this. Looking forward to testing the feature.

Cheers,
Paul
--
Paul Rigor
http://www.ics.uci.edu/~prigor

On Mon, Apr 29, 2013 at 11:03 AM, Christopher Lee <l...@chem.ucla.edu> wrote:

> Hi Paul,
> sorry, this mix of Python, Pyrex and C code is awfully dense and hard to
> make sense of. However, based on looking at the code a few days ago, I
> think the task of limiting the number of files that are opened at once
> during readMAFfiles() can be done in the relatively simple way I outlined.
> The build_ifile array is completely internal to readMAFfiles(); it is not
> passed to any other function. Note that the later call to buildFiles()
> (and hence to each NLMSASequence.build_files()) as its first step simply
> closes the build_ifile on each NLMSASequence (we'd only need to make very
> minor adjustments to that code). So once readMAFfiles() is done writing,
> everything else is done one file at a time. Thus our task really does not
> extend outside readMAFfiles() itself.
>
> It sounds like it'd be most efficient if I try to write code for this over
> the next few days, then you can take a look at the changes and see what you
> think...
>
> The question of limiting the number of files that are opened during
> regular usage of the NLMSA (i.e. querying the alignment database) is
> completely separate. I believe the current default mode of opening files
> only "onDemand" should keep the number of files from getting too big. If
> we need to, we can later add code for again automatically closing some
> files if the number gets too big.
>
> Chris
>
> On Apr 28, 2013, at 6:03 PM, Paul Rigor wrote:
>
> > Hi Chris,
> >
> > I don't think replacing build_ifile and nbuild arrays with the FileQueue
> > you mentioned will be this straightforward. These two variables are used
> > outside of the readMAFfiles method, e.g., loading indexes later on.
> >
> > Also, there are other implicit counters to nlmsa objects (thus their
> > associated interval files) that will need to be maintained, e.g., inlmsa and
> > self.id.
> > The linear id scheme for the interval files is not obviously
> > amenable to LRU caching.
> >
> > Also, the creation of new lpo sequence cannot be easily bound to the new
> > filecache -- it's used everywhere and I'm not sure about all of the
> > dependencies for other opened file handles. Further, it's unclear how to
> > open an associated interval file once it's closed. In other words, the code
> > (from the latest git repo) isn't self-explanatory at the moment.
> > Additionally, the saveInterval() method is quite confusing. What is it
> > actually doing? Its argument list isn't consistent with examples of actual
> > calls. The same goes for the newSequence() method.
> >
> > I'm trying to piece together a solution using the LRUcache extension
> > from the PyTables project, but not modifying the newSequence() method
> > throws things off because of its implicit id generation.
> >
> > What is the best way to isolate the changes? Again, I've just gone
> > through the relevant code the past couple of days, so I'm probably
> > misunderstanding a few things ;-)
> >
> > Thank you again for your time!
> > Paul
> >
> > --
> > Paul Rigor
> > http://www.ics.uci.edu/~prigor

--
You received this message because you are subscribed to the Google Groups "pygr-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pygr-dev+unsubscr...@googlegroups.com.
To post to this group, send email to pygr-dev@googlegroups.com.
Visit this group at http://groups.google.com/group/pygr-dev?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
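
For readers following the thread: the idea discussed above, capping how many build files are open at once while still being able to write to any of them, can be illustrated with a small sketch. This is not pygr code and not the change Chris went on to write. The class name BoundedFilePool, its methods, and the file names in demo() are all hypothetical, and plain Python file objects stand in for the Pyrex/C file handles that readMAFfiles() actually uses. The one assumption that matters is that an evicted file can be reopened in append mode without losing what was already written.

```python
import os
import tempfile
from collections import OrderedDict


class BoundedFilePool:
    """Hypothetical helper: keep at most max_open files open for appending.

    When the limit is reached, the least recently used handle is closed.
    A closed file is transparently reopened in append mode on the next write.
    """

    def __init__(self, max_open):
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._open = OrderedDict()  # path -> file object, oldest first

    def write(self, path, data):
        handle = self._open.pop(path, None)
        if handle is None:
            if len(self._open) >= self.max_open:
                # Evict the least recently used handle to stay under the cap.
                _, oldest = self._open.popitem(last=False)
                oldest.close()
            # Append mode keeps data written before an earlier eviction.
            handle = open(path, "ab")
        self._open[path] = handle  # re-insert as most recently used
        handle.write(data)

    def open_count(self):
        return len(self._open)

    def close_all(self):
        while self._open:
            _, handle = self._open.popitem()
            handle.close()


def demo():
    """Write to five files in round-robin order with only two handles open."""
    with tempfile.TemporaryDirectory() as tmp:
        pool = BoundedFilePool(max_open=2)
        paths = [os.path.join(tmp, "seq%d.build" % i) for i in range(5)]
        peak = 0
        for round_no in range(3):
            for path in paths:
                pool.write(path, b"%d\n" % round_no)
                peak = max(peak, pool.open_count())
        pool.close_all()
        contents = []
        for path in paths:
            with open(path, "rb") as f:
                contents.append(f.read())
        return peak, contents
```

Calling demo() returns the peak number of simultaneously open handles and the bytes that ended up in each file, so the cap and the absence of data loss can be checked directly. Round-robin access is the worst case for an LRU policy (every write reopens a file), which is worth keeping in mind if MAF blocks touch many sequences in turn.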