I agree that the server-side joins will be the limiting step. I'll check out the 'view' classes you mentioned and also Titus's pygr draw git repo. In any case, the GFF schema consists of separate tables for 'experiment sources' and the actual data (chromosome, start, stop, and other interval attributes -- genes, exons, conservation scores, etc.). It might actually involve joining three tables; I'll have to double-check.
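For illustration only (the table and column names below are my guesses until I double-check the schema), a query over that kind of layout might look roughly like this, using Python's sqlite3 just for concreteness:

import sqlite3

conn = sqlite3.connect('annotations.db')

# Hypothetical layout: one table of experiment sources, one table of
# intervals, and (possibly) one table of extra interval attributes.
query = """
SELECT f.chromosome, f.start, f.stop, f.feature_type, s.source_name, a.value
FROM features AS f
JOIN experiment_sources AS s ON f.source_id = s.id
JOIN feature_attributes AS a ON a.feature_id = f.id
WHERE f.chromosome = ? AND f.start < ? AND f.stop > ?
"""

# e.g. all features overlapping chr1:10000-20000
for row in conn.execute(query, ('chr1', 20000, 10000)):
    print(row)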
Regarding the csv tool I mentioned: the tool set provides very generic (and customizable) creation of pygr annotation db's from csv files, using sqlite, mysql, postgres, or other back-ends supported by SQLAlchemy. SQLAlchemy is just used for its more generic introspection to obtain the table schema (as well as for table schema creation and table instantiation).

The csv files are required to have fieldnames, and the toolset determines each column's value type (integer, string, float) automatically from the first row of data. The schema is generated from the column types, and the database is populated accordingly. The pygr annotation db's/nlmsa's are then created separately. We've kept these two steps (database table creation/population and pygr annotation db creation) separate, but I'm sure they can be integrated to provide a one-click-n-go tool.

There are a couple of caveats we've encountered. Sometimes the first row of data isn't representative of the column's type; for example, a column intended to hold numeric values might start with 'N/A' (or None, or blank). If the user knows the field type beforehand, our tool provides ways of customizing the table creation step (see the rough sketch at the end of this message). In fact, schema creation and table instantiation are separate steps (two different classes/objects to work with) precisely to provide such functionality.

I can add this to the 'baldiapis' repo on my git. I'll have to make sure my PI agrees, that our lab is cited/referenced, and that my colleagues have no problem with disseminating portions of our code.

Thanks,
Paul

On Thu, Aug 27, 2009 at 3:58 PM, C. Titus Brown <[email protected]> wrote:
>
> On Thu, Aug 27, 2009 at 03:35:41PM -0700, Christopher Lee wrote:
> -> > By the way, we have several in-house tools for generating annotation
> -> > databases from csv files: from csv -> sql table -> pygr annotation
> -> > db. Could these tools be incorporated into pygr?
> ->
> -> csv is a standard format, and if bioinformatics people are already
> -> using it widely, it would make sense for Pygr to be able to read those
> -> data conveniently. I guess it depends on whether your tools are
> -> reasonably general, or will only work for one specific application...
> -> Can you describe your tools a bit?
> ->
> -> Titus, what do you think?
>
> Generic == good. I worry a bit about adding too many "data loading"
> scripts into pygr itself, though; every text format is slightly
> different, and a generic solution seems unlikely. I'd rather that we
> focus instead on adding useful public data sets to worldbase, AND
> providing plenty of examples of how to load stuff into pygr from various
> useful text data formats. Both would be great.
>
> Hmm, though, perhaps an adapter that lets you take a CSV DictReader with
> particular field names and turn it into annotations, and/or gives you a
> way to specify a callable to turn a row into annotation information...
> that might be usefully generic.
>
> cheers,
> --titus
> --
> C. Titus Brown, [email protected]

--
Paul Rigor
http://www.paulrigor.net/
http://www.ics.uci.edu/~prigor
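Here's the rough sketch of the type-inference/override step I mentioned above. It's illustrative only; the actual tool uses SQLAlchemy for schema creation and splits the work across separate classes, but the idea is the same:

import csv

def infer_type(value):
    """Guess a column type from one value; 'N/A' or blank falls back to str."""
    for cast in (int, float):
        try:
            cast(value)
            return cast
        except (ValueError, TypeError):
            pass
    return str

def infer_schema(csv_path, overrides=None):
    """Map each fieldname to a Python type using the first data row.

    overrides lets the user pin a column's type when the first row is
    unrepresentative (e.g. a numeric column whose first value is 'N/A').
    """
    overrides = overrides or {}
    with open(csv_path) as handle:
        reader = csv.DictReader(handle)
        first_row = next(reader)
        return dict((name, overrides.get(name, infer_type(first_row[name])))
                    for name in reader.fieldnames)

# e.g. force 'score' to float even though its first value might be 'N/A':
# schema = infer_schema('intervals.csv', overrides={'score': float})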

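Also, regarding the DictReader adapter Titus floated at the end of his message, I'd imagine something roughly like this minimal sketch (SimpleAnnotation and the column names are just placeholders, not pygr API):

import csv
from collections import namedtuple

# Placeholder container; a real adapter would build whatever pygr expects.
SimpleAnnotation = namedtuple('SimpleAnnotation', 'id chromosome start stop info')

def rows_to_annotations(csv_path, row_to_annotation):
    """Yield one annotation per csv row, delegating interpretation to the callable."""
    with open(csv_path) as handle:
        for i, row in enumerate(csv.DictReader(handle)):
            yield row_to_annotation(i, row)

def gff_like_row(i, row):
    # Example callable for a csv with chrom/start/stop/name columns.
    return SimpleAnnotation(i, row['chrom'], int(row['start']),
                            int(row['stop']), row.get('name', ''))

# annotations = dict((a.id, a) for a in
#                    rows_to_annotations('intervals.csv', gff_like_row))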