I agree that the server-side joins will be the limiting step.  I'll check
out the 'view' classes you mentioned and also Titus's pygr draw git repo.
In any case, the GFF schema consists of separate tables for 'experiment
sources' and the actual data (chromosome, start, stop, and other interval
attributes -- genes, exons, conservation scores, etc.).  It might actually
involve joining three tables; I'll have to double-check.

Regarding the csv tool I mentioned, the tool set provides very generic
(and customizable) creation of pygr annotation dbs from csv files, using
sqlite, mysql, postgres, or any other back-end supported by SQLAlchemy.
SQLAlchemy is used only for its generic introspection to obtain the table
schema (as well as for schema creation and table instantiation).
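
For illustration, something along these lines is roughly what I mean by
using SQLAlchemy only for schema creation and introspection (the names are
made up, and the exact reflection call differs across SQLAlchemy versions):

    from sqlalchemy import (create_engine, MetaData, Table, Column,
                            Integer, String, Float)

    engine = create_engine("sqlite:///annotations.db")  # or a mysql/postgres URL
    metadata = MetaData()

    # a table whose columns were inferred from the csv header / first data row
    exons = Table("exons", metadata,
                  Column("id", Integer, primary_key=True),
                  Column("chromosome", String(32)),
                  Column("start", Integer),
                  Column("stop", Integer),
                  Column("score", Float))
    metadata.create_all(engine)

    # later, the same schema can be introspected back from the database
    # (older SQLAlchemy spells this autoload=True, autoload_with=engine)
    reflected = Table("exons", MetaData(), autoload_with=engine)
    print([c.name for c in reflected.columns])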

The csv files are required to have field names, and the toolset determines
each column's value type (integer, string, float) automatically from the
first row of data.  The schema is generated from the column types, and the
database is populated accordingly.  The pygr annotation db/NLMSAs are then
created separately.  We've separated these two steps (database table
creation/population and pygr annotation db creation), but I'm sure they
could be integrated into a one-click-and-go tool.
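
As a toy illustration of the first-row type guessing (the function and file
names here are hypothetical, not our actual code):

    import csv

    def guess_type(value):
        for cast in (int, float):
            try:
                cast(value)
                return cast
            except ValueError:
                pass
        return str

    with open("annotations.csv") as fp:
        reader = csv.DictReader(fp)
        first_row = next(reader)
        column_types = dict((name, guess_type(value))
                            for name, value in first_row.items())
    print(column_types)  # e.g. {'chromosome': str, 'start': int, 'score': float}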

There are a couple of caveats we've encountered.  Sometimes the first row
of data isn't representative of a column's type.  For example, a column
intended to hold numeric values might contain 'N/A' (or None, or a blank)
in that row.  If the user knows the field types beforehand, our tool
provides ways of customizing the table creation step.  In fact, schema
creation and table instantiation are separate steps (two different
classes/objects to work with) precisely to allow this kind of
customization.
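
Conceptually the override hook is just something like this (again with
hypothetical names; guess_type is the first-row inference sketched above):

    # user-supplied field types take precedence over first-row inference
    explicit_types = {"score": float, "start": int, "stop": int}

    def column_type(name, first_value, overrides=explicit_types):
        if name in overrides:            # user knew the type beforehand
            return overrides[name]
        return guess_type(first_value)   # otherwise fall back to the first row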

I can add this to the 'baldiapis' repo on my git.  I'll first have to make
sure my PI agrees, that our lab is cited/referenced, and that my colleagues
have no problem with disseminating portions of our code.

Thanks,
Paul

On Thu, Aug 27, 2009 at 3:58 PM, C. Titus Brown <[email protected]> wrote:

>
> On Thu, Aug 27, 2009 at 03:35:41PM -0700, Christopher Lee wrote:
> -> > By the way, we have several in-house tools for generating annotation
> -> > databases from csv files: from csv -> sql table -> pygr annotation
> -> > db.  Could these tools be incorporated into pygr?
> ->
> -> csv is a standard format, and if bioinformatics people are already
> -> using it widely, it would make sense for Pygr to be able to read those
> -> data conveniently.  I guess it depends on whether your tools are
> -> reasonably general, or will only work for one specific application...
> -> Can you describe your tools a bit?
> ->
> -> Titus, what do you think?
>
> Generic == good.  I worry a bit about adding too many "data loading"
> scripts into pygr itself, though; every text format is slightly
> different, and a generic solution seems unlikely.  I'd rather that we
> focus instead on adding useful public data sets to worldbase, AND
> providing plenty of examples of how to load stuff into pygr from various
> useful text data formats.  Both would be great.
>
> Hmm, though, perhaps an adapter that lets you take a CSV DictReader with
> particular field names and turn it into annotations, and/or gives you a
> way to specify a callable to turn a row into annotation information...
> that might be usefully generic.
>
> cheers,
> --titus
> --
> C. Titus Brown, [email protected]
>


-- 
Paul Rigor
http://www.paulrigor.net/
http://www.ics.uci.edu/~prigor
