Re: pygr.Data - strange behavior

Christopher Lee Wed, 14 Jan 2009 20:48:39 -0800


On Jan 14, 2009, at 6:10 PM, Istvan Albert wrote:

>
>
>
> On Jan 14, 7:29 pm, Christopher Lee <[email protected]> wrote:
>
>> mdb = pygr.metabase.path[0] # get the first metabase in PYGRDATAPATH
>>
>> As far as I can see, sys.path provides a clear model for
>> pygr.metabase.path
>
> calling pygr.metabase.path.clear().  What do you think about this?
> Do you see any reasons not to follow the sys.path model?
>
> I agree with almost everything that you wrote. The only thing that I
> don't like is that the metabase.path appears to be a global variable,
> and that seems like a holdover from the current pygr.Data that is also
> a global name. These global variables/names tend to cause lots of
> complications.

In the Python standard library, sys.path is also a global variable.   
Obviously there are only a few cases where a global variable is  
justified, usually because data integrity requires it.  sys.path acts  
as a global control over the import search path (mirroring the global  
environment variable PYTHONPATH), so I guess that is justified.
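
For concreteness, here is a minimal sketch of the sys.path model: an
ordinary list built from an environment-variable-style string, which
can then be indexed or cleared like any other list.  The helper
split_search_path and the example directories are made up for
illustration.  os.pathsep is the separator PYTHONPATH uses; nothing
here says how PYGRDATAPATH itself is parsed.

```python
import os


def split_search_path(value):
    """Split a PYTHONPATH-style string into a list, dropping empty entries."""
    return [entry for entry in value.split(os.pathsep) if entry]


# Hypothetical value, joined with the platform's path separator.
example = os.pathsep.join(["/opt/lib", "/home/me/lib"])

search_path = split_search_path(example)
first = search_path[0]   # the first location on the search path
search_path.clear()      # emptying the list, as with any Python list
```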

I think one aspect of pygr.Data does require a global instance: the  
cache of currently loaded resources.  pygr.Data must guarantee that  
two different references to the same ID within the same interpreter  
session will get the same Python object.  If this basic data integrity  
principle is not guaranteed, all sorts of subtle bugs arise: users can  
tie themselves in knots without realizing it, depending on the exact  
order in which they ask for (or cross-reference) various resources.  
pygr.Data ensures this by keeping  
currently loaded resources in a cache, which should act like a global  
variable (i.e. there must be only one cache).  You can have as many  
different metabases as you want, but they should coordinate what  
objects they load, via this single cache.

>
>
> Why not make this path an attribute of the instance (object) itself?
> That way one can have two Metabase instances that are independent from
> one another. For example inside __init__.py it may look like so:
>
> Data = Metabase()

Not exactly.  This root object would not itself be a metabase, but  
instead an interface to the *list* of metabases (which by default will  
be the list specified by PYGRDATAPATH). I noticed in our emails back  
and forth that we're making different assumptions about what a  
metabase is.  You talk about it as an object that is initialized with  
a *list* of locations, whereas I'm talking about a single database (of  
metadata) that lives in one definite location.  So when I talk about  
pushing a resource from one metabase to another (e.g. from a metabase  
on my local disk to an XMLRPC server metabase accessible to anyone on  
the internet), that unambiguously means copying that resource from the  
first location to the second location.  That is a standard operation  
people will do all the time.  If I try to recast this operation in  
your terms (i.e. to copy a resource from one "list of metabases" to  
another "list of metabases"), it no longer seems like a clear,  
standard operation; in fact, I am not even sure what exactly it should  
be defined to mean.
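
To make the single-location reading concrete, a rough sketch follows.
The class Metabase, its push method, the records dict and both
locations are hypothetical, invented only to show that "push" has one
obvious meaning when each metabase lives in exactly one place.

```python
class Metabase:
    """Hypothetical single-location metabase: one dict of ID -> record."""

    def __init__(self, location):
        self.location = location
        self.records = {}

    def push(self, resource_id, target):
        # Copy exactly one record from this location to the target location.
        target.records[resource_id] = self.records[resource_id]


local = Metabase("/home/me/.pygr_data")         # made-up local path
public = Metabase("http://example.org/xmlrpc")  # made-up server URL

local.records["Bio.Genome.Yeast"] = "yeast genome metadata"
local.push("Bio.Genome.Yeast", public)
```

With two *lists* of metabases in place of local and public, push would
first have to decide which member of each list is the source and which
is the destination.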

Of course, there's no reason you can't have two different "lists of  
metabases" that you can search separately, if you really want to.

>
>
> this creates a Data object initialized from the environment variable,
> that works exactly as before. You can add metadata to it but you can
> also inspect its path to see where it can read the metadata from:
>
> pygr.Data.Bio.Genome.Yeast = FooClass()
> print pygr.Data.path
>
> At the same time one can also write in their own program something
> like:
>
> workdata = Metabase( path=[ 'foo/bar/.localpygrdata' ]  )
> workdata.Mystuff.Chunk1 = BarClass()

Side comment: sure, you can make up whatever names you want for  
resources in a non-public metabase.  But if you later wanted to  
publish them (i.e. push them to a public metabase) you'd first have to  
change the names, presumably to put them into a subdomain (of the  
public namespace) that you (or your publisher) control.  That  
would use the alias mechanism I suggested in a previous email.

>
>
> and the two  stay independent and we don't need to do anything to keep
> them separate. Its path is different from the one above:
>
> print workdata.path
>
> Istvan

In general, I think it's better to separate "code that knows how to  
save a certain kind of resource" from the actual path settings.   
That's what PYGRDATAPATH lets you do: you write code that makes no  
assumptions about what metabase the data should be stored in, and you  
can make it save to any metabase you want by changing the  
PYGRDATAPATH.  If I understand right, you want that same modularity  
but in the form of a variable you can pass around, instead of mucking  
with PYGRDATAPATH.  That's certainly what I was proposing.  At any  
time you could instantiate a new metabase object and pass that to a  
function, to make that function do all its resource reading and  
writing from that metabase instead of the list of metabases specified  
by PYGRDATAPATH.
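
As an illustration of that separation, here is a sketch under stated
assumptions: Metabase.save, save_annotation and default_mdb are
hypothetical names, and default_mdb merely stands in for whatever
PYGRDATAPATH would select.  The saving code never mentions a location;
the caller decides by passing (or not passing) a metabase.

```python
class Metabase:
    """Hypothetical metabase object that can be passed around."""

    def __init__(self, location):
        self.location = location
        self.records = {}

    def save(self, resource_id, obj):
        self.records[resource_id] = obj


# Placeholder for the default chosen via PYGRDATAPATH (assumption).
default_mdb = Metabase("location taken from PYGRDATAPATH")


def save_annotation(obj, resource_id, mdb=None):
    """Save obj under resource_id without knowing where the data lives."""
    if mdb is None:
        mdb = default_mdb  # fall back to the environment-defined default
    mdb.save(resource_id, obj)
    return mdb.location


# Same function, two destinations, chosen entirely by the caller.
scratch = Metabase("foo/bar/.localpygrdata")
save_annotation({"name": "chunk1"}, "Mystuff.Chunk1", mdb=scratch)
save_annotation({"name": "yeast"}, "Bio.Genome.Yeast")
```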

I was planning on making the interface for accessing a namespace  
identical regardless of whether it's from a single metabase or a list  
of metabases, i.e. either object would have an attribute called  
something like Data, which would be the root of that namespace (e.g.  
mdb.Data or path.Data).
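
One way this could look, purely as a sketch: Namespace, Metabase and
MetabaseList are hypothetical classes, and the lookup rule (first
metabase in the list that has the ID wins) is an assumption borrowed
from the sys.path analogy, not a description of existing pygr code.

```python
class Namespace:
    """Attribute-style view of resource IDs, e.g. ns.Bio.Genome.Yeast."""

    def __init__(self, lookup, prefix=""):
        self._lookup = lookup  # callable: full ID -> resource or None
        self._prefix = prefix

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)  # never treat private names as IDs
        full_id = self._prefix + "." + name if self._prefix else name
        found = self._lookup(full_id)
        if found is not None:
            return found
        return Namespace(self._lookup, full_id)  # descend one level


class Metabase:
    """Hypothetical single metabase exposing a Data root."""

    def __init__(self, records):
        self.records = records
        self.Data = Namespace(self.records.get)


class MetabaseList(list):
    """Hypothetical list of metabases exposing the same Data root."""

    @property
    def Data(self):
        return Namespace(self._search)

    def _search(self, resource_id):
        for mdb in self:  # assumed rule: first metabase with the ID wins
            if resource_id in mdb.records:
                return mdb.records[resource_id]
        return None


mdb1 = Metabase({"Bio.Genome.Yeast": "yeast from first metabase"})
mdb2 = Metabase({"Bio.Genome.Yeast": "yeast from second metabase",
                 "Bio.Genome.Fly": "fly from second metabase"})
path = MetabaseList([mdb1, mdb2])

from_list = path.Data.Bio.Genome.Yeast   # "yeast from first metabase"
from_single = mdb2.Data.Bio.Genome.Yeast  # "yeast from second metabase"
```

Here the path stays an ordinary list (path[0] is the first metabase),
and the Data namespace holds resource names only.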

I notice that you and I also seem to construe the relation between  
"Data" and path in opposite ways.  E.g. you write Data.path (which  
presumably means the list of metabases associated with this  
namespace).  Whereas I write path.Data (the root of the namespace  
associated with this list of metabases).  For one thing, I don't want  
to have anything in the namespace other than what it's supposed to  
contain, i.e. data.

