Re: pygr.Data - strange behavior

Christopher Lee Wed, 14 Jan 2009 16:30:06 -0800

I thought it might be useful if I outlined...

Basic principles for long-term development (targeted at release 1.0):


- a metabase represents a "zone of access", i.e. a set of resources  
that *can* be accessed together (presumably because they are stored  
"in the same place" as the metabase), and that a user *wants* to  
access together (e.g. one metabase might represent a "sandbox" in  
which a developer prototypes and initially tests a set of resources,  
but doesn't want them visible to any other processes or users).

- a user would typically add resources to a test metabase (i.e. a  
metabase not in his usual PYGRDATAPATH), and later "publish" them to  
his personal metabase, a "group" metabase (for his co-workers), and  
finally to a public metabase (accessible to the internet in general).   
Thus, copying resource info from one metabase to another becomes a  
system for publishing data.
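
To make the "publishing by copying" idea concrete, here is a minimal sketch. It assumes a metabase can be modelled as a plain dict mapping resource IDs to resource info; the publish() helper and the resource name are hypothetical illustrations, not part of the pygr.Data API.

```python
# Hypothetical stand-in: a metabase is modelled as a plain dict that maps
# resource IDs to resource info. This is NOT the pygr.Data API.

def publish(resource_id, source, *targets):
    """Copy one resource's info from `source` into each target metabase."""
    info = source[resource_id]
    for target in targets:
        # Copy the info so that the metabases stay independent of each other.
        target[resource_id] = dict(info)
    return info

# A sandbox metabase that is not on the user's usual search path.
sandbox = {"Bio.Seq.Genome.test1": {"location": "/tmp/test1", "doc": "prototype"}}
personal, group, public = {}, {}, {}

# Publish step by step to wider and wider audiences.
publish("Bio.Seq.Genome.test1", sandbox, personal)
publish("Bio.Seq.Genome.test1", personal, group, public)
```

In this sketch each metabase stays a separate "zone of access": nothing in the sandbox is visible elsewhere until it is explicitly copied.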

- we would aim to make this copying process totally automatic and
transparent, both for remote access (i.e. a server that accepts
queries from remote clients) and for fully transferring data to a user's
local filesystem (in the spirit of the current download=True mechanism).

- there would be a DNS-like system for finding "the nearest available  
instance" of resources in the public namespace.

- just like an organization or group controls its subdomain in the DNS  
address space (i.e. they control what names get added to that  
subdomain, and what each name maps to), they could "own" a piece of  
the pygr.Data namespace (e.g. the Santa Cruz Genome center would  
control the subdomain Bio.MSA.UCSC), and would publish resources into  
that subdomain.  Initially those resources would physically live only  
on the site where they were originally published, but as more people  
requested those resources to be pulled to their own servers for high  
speed access, popular resources would automatically get distributed to  
many sites around the world, which would then serve requests both to
use them (by remote clients) and to download them to users' local
filesystems.
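
The DNS-like lookup and subdomain ownership described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the in-memory tables, the host names, the latency figures, and the function names are hypothetical, and a real system would need an actual distributed registry.

```python
# Hypothetical registry: resource ID -> list of (site, latency_ms) instances.
INSTANCES = {
    "Bio.MSA.UCSC.example": [
        ("origin.example.org", 180.0),   # site where it was first published
        ("mirror.example.net", 25.0),    # copy pulled by another site
    ],
}

# Hypothetical ownership table: namespace prefix -> owning organization.
OWNERS = {"Bio.MSA.UCSC": "ucsc"}

def owner_of(resource_id):
    """Return the owner of the longest registered prefix, DNS-style."""
    parts = resource_id.split(".")
    for n in range(len(parts), 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in OWNERS:
            return OWNERS[prefix]
    return None

def nearest_instance(resource_id):
    """Pick the instance with the lowest (assumed, pre-measured) latency."""
    instances = INSTANCES.get(resource_id)
    if not instances:
        raise KeyError(resource_id)
    return min(instances, key=lambda inst: inst[1])[0]

def add_mirror(resource_id, site, latency_ms):
    """Record that another site now holds a copy of the resource."""
    INSTANCES.setdefault(resource_id, []).append((site, latency_ms))
```

As popular resources get mirrored, add_mirror() grows the instance list and nearest_instance() starts returning closer sites, while owner_of() still reports who controls the name.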

- Obviously all this needs to be secure, using the well-established  
framework of public key signatures and GPG-style networks of trust.   
That should be implemented at the basic level, i.e. pickles should be  
signed and verifiable.  With that infrastructure in place, *code* can  
also be published this way, i.e. there would be a "pygr.Code"  
namespace representing both APIs and implementations.  A given dataset  
(content) would specify an interface required for opening it; that  
interface would have a default implementation (or a user could specify  
they want a different implementation as an "alias" in their  
metabase).  If the user already has that module, it gets used in the  
usual pickle way.  If he doesn't, the code gets pulled from "the  
nearest available instance" in the usual metabase-DNS way, its  
signature verified, and checked against the user's network of trust.
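
As a rough illustration of "pickles should be signed and verifiable", the sketch below checks a signature before unpickling anything. Note the swapped-in technique: it uses a shared-secret HMAC from the Python standard library as a stand-in for real public key signatures (which would need GPG or a crypto library), and a plain list of keys as a stand-in for a network of trust. The function names are hypothetical.

```python
import hashlib
import hmac
import pickle

def sign_pickle(obj, key):
    """Pickle `obj` and return (payload, signature).

    Assumption: HMAC-SHA256 with a shared secret stands in for a real
    public key signature.
    """
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return payload, signature

def load_verified(payload, signature, trusted_keys):
    """Unpickle `payload` only if some trusted key verifies the signature."""
    for key in trusted_keys:
        expected = hmac.new(key, payload, hashlib.sha256).digest()
        # compare_digest does a constant-time comparison.
        if hmac.compare_digest(expected, signature):
            return pickle.loads(payload)
    raise ValueError("signature not verified by any trusted key")

# Usage: the publisher signs, the user verifies against keys he trusts.
payload, signature = sign_pickle({"resource": "example"}, b"publisher-key")
obj = load_verified(payload, signature, [b"publisher-key"])
```

The important design point is the ordering: verification happens before pickle.loads(), since unpickling untrusted data can execute arbitrary code.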


