On Aug 27, 2009, at 3:58 PM, C. Titus Brown wrote:
> Generic == good. I worry a bit about adding too many "data loading"
> scripts into pygr itself, though; every text format is slightly
> different, and a generic solution seems unlikely. I'd rather that we
> focus instead on adding useful public data sets to worldbase, AND
> providing plenty of examples of how to load stuff into pygr from various
> useful text data formats. Both would be great.

Yes, this is really the way to go. Worldbase should in principle free people from having to replicate all the work that somebody else did to load a given dataset. Once one person puts the dataset in worldbase, ideally everyone else just requests it by name (and sets download=True if they want), end of story. Right away we can do that for genomes, NLMSA; and can add more data types over time. For example, we ought to provide a download=True mode for getting an AnnotationDB, probably as a point release (0.8.2?).

I also agree with you about trying to keep a clean separation between Pygr (i.e. the core code) vs. all sorts of specific application code. But we need a one-stop shop where people can access, use and contribute such "application code". This should be a git repository (to enable all the nice cross-collaboration possibilities). The only question for me is whether it should just be in the Pygr git repository (as a separate directory or directories, not inside the core pygr/pygr directory), or just its own git repository.

I can think of one good reason for keeping it in the existing Pygr git repository: these scripts are going to depend on the core-pygr version, so it makes sense for them to be updated along with core pygr. Actually, if this were done in a systematic way, they could be an awesome set of additional functional tests that we should run "early and often" to ensure that we haven't broken Pygr in any way during our development process.

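A rough sketch of what running the application scripts as functional tests could look like. It assumes each script is a standalone Python file that exits non-zero when something breaks. The function names and the idea of a single script directory are hypothetical, and it is written for present-day Python 3 rather than taken from anything in the Pygr repository:

```python
import subprocess
import sys
from pathlib import Path


def run_application_scripts(script_dir, timeout=600):
    """Run every *.py file in script_dir and return {script name: passed?}.

    A script "passes" if it exits with status 0 within the timeout.
    (Hypothetical helper, not part of Pygr.)
    """
    results = {}
    for script in sorted(Path(script_dir).glob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,  # keep script output out of the report
                timeout=timeout,
            )
            results[script.name] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            results[script.name] = False
    return results


def failed_scripts(results):
    """Names of scripts that failed: candidates for an "obsolete" folder."""
    return sorted(name for name, ok in results.items() if not ok)
```

Run as part of a release checklist, the list from failed_scripts() would be the scripts to fix, delete, or move aside.
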
If the application code is kept in a completely different git repository, I think we run a greater risk that old code there will gradually "go stale" (i.e. no longer work with the current Pygr version). Whereas if we make it part of the standard Pygr git repo, we could make it part of our release process to run all the application scripts as functional tests (and delete or move to an "obsolete" folder those that no longer work).

One puzzle: where to keep example datasets (needed for testing the application code)? We definitely don't want to put those in the Pygr git repo.

> Hmm, though, perhaps an adapter that lets you take a CSV DictReader with
> particular field names and turn it into annotations, and/or gives you a
> way to specify a callable to turn a row into a annotation information...
> that might be usefully generic.

Yup. We're definitely looking for a general "design pattern" here...

-- Chris
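
A minimal sketch of the adapter idea quoted above: a csv.DictReader with particular field names, plus an optional callable that turns a row into annotation information. The AnnotationRow record, the column names (name, chrom, start, stop, strand) and both function names are assumptions made for illustration, not Pygr classes or APIs. How such records would then be loaded into pygr is left out.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record type for illustration; not a pygr class.
AnnotationRow = namedtuple("AnnotationRow", "name seq_id start stop strand")


def default_row_to_annotation(row):
    """Convert one DictReader row, assuming these column names exist."""
    return AnnotationRow(
        name=row["name"],
        seq_id=row["chrom"],
        start=int(row["start"]),
        stop=int(row["stop"]),
        strand=-1 if row.get("strand") == "-" else 1,
    )


def read_annotations(lines, row_to_annotation=default_row_to_annotation,
                     **reader_kwargs):
    """Yield one annotation record per row of a delimited text source.

    lines: any iterable of text lines (e.g. an open file).
    row_to_annotation: callable mapping a row dict to whatever record
        the caller wants; swap it out for formats with other columns.
    reader_kwargs: passed through to csv.DictReader (e.g. delimiter).
    """
    for row in csv.DictReader(lines, **reader_kwargs):
        yield row_to_annotation(row)


# Usage with a small in-memory example file.
sample = io.StringIO(
    "name,chrom,start,stop,strand\n"
    "exon1,chr1,100,200,+\n"
    "exon2,chr1,300,450,-\n"
)
annotations = list(read_annotations(sample))
```

Keeping the row-to-record step as a plain callable is what would make this generic: each slightly different text format only needs its own small conversion function, while the reading loop stays the same.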
