On Aug 27, 2009, at 3:58 PM, C. Titus Brown wrote:

> Generic == good.  I worry a bit about adding too many "data loading"
> scripts into pygr itself, though; every text format is slightly
> different, and a generic solution seems unlikely.  I'd rather that we
> focus instead on adding useful public data sets to worldbase, AND
> providing plenty of examples of how to load stuff into pygr from
> various useful text data formats.  Both would be great.

Yes, this is really the way to go.  Worldbase should in principle free  
people from having to replicate all the work that somebody else did to  
load a given dataset.  Once one person puts the dataset in worldbase,  
ideally everyone else just requests it by name (and sets download=True  
if they want), end of story.  We can do that right away for genomes
and NLMSAs, and can add more data types over time.  For example, we ought
to provide a download=True mode for getting an AnnotationDB, probably  
as a point release (0.8.2?).

I also agree with you about trying to keep a clean separation between  
Pygr (i.e. the core code) vs. all sorts of specific application code.   
But we need a one-stop shop where people can access, use and  
contribute such "application code".  This should be a git repository  
(to enable all the nice cross-collaboration possibilities).  The only  
question for me is whether it should live in the Pygr git
repository (as a separate directory or directories, not inside the
core pygr/pygr directory) or in its own git repository.  I can
think of one good reason for keeping it in the existing Pygr git  
repository: these scripts are going to depend on the core-pygr  
version, so it makes sense for them to be updated along with core  
pygr.  Actually, if this were done in a systematic way, they could be  
an awesome set of additional functional tests that we should run  
"early and often" to ensure that we haven't broken Pygr in any way  
during our development process.

If the application code is kept in a completely different git  
repository, I think we run a greater risk that old code there will  
gradually "go stale" (i.e. no longer work with the current Pygr  
version).  Whereas if we make it part of the standard Pygr git repo,  
we could make it part of our release process to run all the  
application scripts as functional tests (and delete or move to an  
"obsolete" folder those that no longer work).
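To make the release-process idea concrete, here is a minimal sketch of a runner that treats each application script as a functional test.  The function names and the one-script-per-file layout are my assumptions for illustration, not anything that exists in the Pygr repo today, and it is written for a current Python 3 interpreter.  A script "passes" if it exits with status 0.

```python
import subprocess
import sys
from pathlib import Path


def run_application_scripts(script_dir, timeout=600):
    """Run every *.py file in script_dir; return {script name: passed?}.

    Hypothetical helper: assumes each application script is standalone
    and signals failure with a non-zero exit status.
    """
    results = {}
    for script in sorted(Path(script_dir).glob("*.py")):
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
            results[script.name] = proc.returncode == 0
        except subprocess.TimeoutExpired:
            # A script that hangs is counted as a failure.
            results[script.name] = False
    return results


def broken_scripts(results):
    """Names of scripts that failed: candidates for fixing or for 'obsolete'."""
    return sorted(name for name, ok in results.items() if not ok)
```

Something like broken_scripts(run_application_scripts("some/dir")) could then gate a release: an empty list means every application script still works against the current core code.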

One puzzle: where to keep example datasets (needed for testing the  
application code)?  We definitely don't want to put those in the Pygr  
git repo.

>
> Hmm, though, perhaps an adapter that lets you take a CSV DictReader
> with particular field names and turn it into annotations, and/or gives
> you a way to specify a callable to turn a row into annotation
> information... that might be usefully generic.

Yup.  We're definitely looking for a general "design pattern" here...
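As a starting point for discussion, here is a rough sketch of the adapter idea using only the standard library.  Everything named here (AnnotationRow, read_annotations, the field-mapping dict) is hypothetical and chosen for illustration; it only covers turning rows into plain records, and deliberately stops short of building any actual Pygr annotation objects.

```python
import csv
import io
from collections import namedtuple

# Hypothetical record: the minimal fields needed to place an annotation
# on a sequence.  Not a Pygr class.
AnnotationRow = namedtuple("AnnotationRow", "name seq_id start stop orientation")


def default_row_to_annotation(row, fields):
    """Convert one DictReader row using a {record field: column name} mapping."""
    strand = row.get(fields.get("strand", ""), "+")
    return AnnotationRow(
        name=row[fields["name"]],
        seq_id=row[fields["seq_id"]],
        start=int(row[fields["start"]]),
        stop=int(row[fields["stop"]]),
        orientation=-1 if strand == "-" else 1,
    )


def read_annotations(lines, fields=None, row_to_annotation=None, **reader_kwargs):
    """Yield one record per CSV row.

    Pass either `fields` (column-name mapping) or `row_to_annotation`
    (any callable taking a row dict) for formats the mapping cannot handle.
    Extra keyword arguments go straight to csv.DictReader.
    """
    if row_to_annotation is None:
        if fields is None:
            raise ValueError("need either fields or row_to_annotation")
        row_to_annotation = lambda row: default_row_to_annotation(row, fields)
    for row in csv.DictReader(lines, **reader_kwargs):
        yield row_to_annotation(row)


# Usage with a field-name mapping on an in-memory file:
text = "gene,chrom,begin,end,strand\nfoo,chr1,100,250,+\nbar,chr2,40,90,-\n"
mapping = {"name": "gene", "seq_id": "chrom", "start": "begin",
           "stop": "end", "strand": "strand"}
example = list(read_annotations(io.StringIO(text), fields=mapping))
```

The two entry points mirror the two halves of the suggestion: the mapping handles files that differ only in column names, while the callable is the escape hatch for formats that need real per-row logic.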

-- Chris
