On Thu, Aug 27, 2009 at 04:44:33PM -0700, Christopher Lee wrote:
-> On Aug 27, 2009, at 3:58 PM, C. Titus Brown wrote:
-> > Generic == good. I worry a bit about adding too many "data loading"
-> > scripts into pygr itself, though; every text format is slightly
-> > different, and a generic solution seems unlikely. I'd rather that we
-> > focus instead on adding useful public data sets to worldbase, AND
-> > providing plenty of examples of how to load stuff into pygr from
-> > various useful text data formats. Both would be great.
->
-> Yes, this is really the way to go. Worldbase should in principle free
-> people from having to replicate all the work that somebody else did to
-> load a given dataset. Once one person puts the dataset in worldbase,
-> ideally everyone else just requests it by name (and sets download=True
-> if they want), end of story. Right away we can do that for genomes,
-> NLMSA; and can add more data types over time. For example, we ought
-> to provide a download=True mode for getting an AnnotationDB, probably
-> as a point release (0.8.2?).

Good idea. Should we put it in the issue tracker?

-> I also agree with you about trying to keep a clean separation between
-> Pygr (i.e. the core code) vs. all sorts of specific application code.
-> But we need a one-stop shop where people can access, use and
-> contribute such "application code". This should be a git repository
-> (to enable all the nice cross-collaboration possibilities). The only
-> question for me is whether it should just be in the Pygr git
-> repository (as a separate directory or directories, not inside the
-> core pygr/pygr directory), or just its own git repository. I can
-> think of one good reason for keeping it in the existing Pygr git
-> repository: these scripts are going to depend on the core-pygr
-> version, so it makes sense for them to be updated along with core
-> pygr. Actually, if this were done in a systematic way, they could be
-> an awesome set of additional functional tests that we should run
-> "early and often" to ensure that we haven't broken Pygr in any way
-> during our development process.

Yep! I'm mildly against putting it in the pygr git repository, but I
don't have a good reason... perhaps it's just a vestige of centralized
VCS view that giving commit access to the main repo is a big deal ;)

-> If the application code is kept in a completely different git
-> repository, I think we run a greater risk that old code there will
-> gradually "go stale" (i.e. no longer work with the current Pygr
-> version). Whereas if we make it part of the standard Pygr git repo,
-> we could make it part of our release process to run all the
-> application scripts as functional tests (and delete or move to an
-> "obsolete" folder those that no longer work).

Why does it need to be in the main repository to be included in the
release process? We could put it in (e.g.) 'pygr-examples' and grant
more general commit access, which would relieve you from being the sole
maintainer, too.

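To make the "separate repo, still part of the release process" idea concrete, here is a minimal sketch of a runner that treats each example script as a functional test. Everything in it is an assumption for illustration: the function names, the flat directory of .py scripts, the timeout, and the convention that a non-zero exit status means "no longer works" are all hypothetical, not anything pygr provides.

```python
import subprocess
import sys
from pathlib import Path


def run_example_scripts(examples_dir, timeout=600):
    """Run every *.py script in examples_dir (hypothetical layout).

    Returns a dict mapping script file name -> True if the script
    exited with status 0, False if it failed or timed out.
    """
    results = {}
    for script in sorted(Path(examples_dir).glob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,  # keep script output out of the test log
                timeout=timeout,
            )
            results[script.name] = (proc.returncode == 0)
        except subprocess.TimeoutExpired:
            # A hung script counts as broken for release purposes.
            results[script.name] = False
    return results


def stale_scripts(results):
    """Names of scripts that failed: candidates for an 'obsolete' folder."""
    return sorted(name for name, ok in results.items() if not ok)
```

Pointed at a checkout of (say) 'pygr-examples', a release script or a daily build could refuse to proceed whenever stale_scripts(...) comes back non-empty, wherever the scripts themselves happen to live.
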
I'd like to run such things as part of the daily continuous integration
runs, too.

-> One puzzle: where to keep example datasets (needed for testing the
-> application code)? We definitely don't want to put those in the Pygr
-> git repo.

How about an FTP site that is rsync'd from some master somewhere (or a
git repo, just not one hosted on github)? As long as it's got a URL and
a nice network connection, it doesn't really matter where we put it ;)

cheers,
--titus
--
C. Titus Brown, [email protected]
