On Jan 14, 2009, at 6:10 PM, Istvan Albert wrote:
>
> On Jan 14, 7:29 pm, Christopher Lee <[email protected]> wrote:
>
>> mdb = pygr.metabase.path[0] # get the first metabase in PYGRDATAPATH
>>
>> As far as I can see, sys.path provides a clear model for
>> pygr.metabase.path
>
>> calling pygr.metabase.path.clear(). What do you think about this? Do
>> you see any reasons not to follow the sys.path model?
>
> I agree with almost everything that you wrote. The only thing that I
> don't like is that the metabase.path appears to be a global variable,
> and that seems like a holdover from the current pygr.Data that is also
> a global name. These global variables/names tend to cause lots of
> complications.

In the Python standard library, sys.path is also a global variable.
Obviously there are only a few cases where a global variable is
justified, usually because data integrity requires it. sys.path acts
as a global control over the import search path (mirroring the global
environment variable PYTHONPATH), so I guess that is justified.

I think one aspect of pygr.Data does require a global instance: the
cache of currently loaded resources. pygr.Data must guarantee that two
different references to the same ID within the same interpreter session
will get the same Python object. All sorts of subtle bugs arise (i.e.
users can tie themselves in all sorts of knots without realizing it,
depending on the exact order in which they ask for (or cross-reference)
various resources), if this basic data integrity principle is not
guaranteed. pygr.Data ensures this by keeping currently loaded
resources in a cache, which should act like a global variable (i.e.
there must be only one cache). You can have as many different
metabases as you want, but they should coordinate what objects they
load, via this single cache.

>
> Why not make this path an attribute of the instance (object) itself.
> That way one can have two Metabase instances that are independent from
> one another.
> For example inside __init__.py it may look like so:
>
> Data = Metabase()

Not exactly. This root object would not itself be a metabase, but
instead an interface to the *list* of metabases (which by default will
be the list specified by PYGRDATAPATH).

I noticed in our emails back and forth that we're making different
assumptions about what a metabase is. You talk about it as an object
that is initialized with a *list* of locations, whereas I'm talking
about a single database (of metadata) that lives in one definite
location. So when I talk about pushing a resource from one metabase to
another (e.g. from a metabase on my local disk to an XMLRPC server
metabase accessible to anyone on the internet), that unambiguously
means copying that resource from the first location to the second
location. That is a standard operation people will do all the time.
If I try to recast this operation in your terms (i.e. to copy a
resource from one "list of metabases" to another "list of metabases"),
it no longer seems like a clear, standard operation; in fact, I am not
even sure what exactly it should be defined to mean.

Of course, there's no reason you can't have two different "lists of
metabases" that you can search separately, if you really want to.

>
> this creates a Data object initialized from the environment variable,
> that works exactly as before. You can add metadata to it but you can
> also inspect its path to see where it can read the metadata from:
>
> pygr.Data.Bio.Genome.Yeast = FooClass()
> print pygr.Data.path
>
> At the same time one can also write in their own program something
> like:
>
> workdata = Metabase( path=[ 'foo/bar/.localpygrdata' ] )
> workdata.Mystuff.Chunk1 = BarClass()

side comment: sure, you can make up whatever names you want for
resources in a non-public metabase. But if you later wanted to publish
them (i.e.
push them to a public metabase) you'd first have to change the names,
presumably to put them into a subdomain (of the public namespace) that
you (or your publisher) control. That would use the alias mechanism I
suggested in a previous email.

>
> and the two stay independent and we don't need to do anything to keep
> them separate. Its path is different from the one above:
>
> print workdata.path
>
> Istvan

In general, I think it's better to separate "code that knows how to
save a certain kind of resource" from the actual path settings. That's
what PYGRDATAPATH lets you do: you write code that makes no assumptions
about what metabase the data should be stored in, and you can make it
save to any metabase you want by changing the PYGRDATAPATH.

If I understand right, you want that same modularity but in the form of
a variable you can pass around, instead of mucking with PYGRDATAPATH.
That's certainly what I was proposing. At any time you could
instantiate a new metabase object and pass that to a function, to make
that function do all its resource reading and writing from that
metabase instead of the list of metabases specified by PYGRDATAPATH.

I was planning on making the interface for accessing a namespace
identical regardless of whether it's from a single metabase or a list
of metabases (i.e. either object would have an attribute called
something like Data, which would be the root of that namespace), e.g.
mdb.Data; path.Data.

I notice that you and I also seem to construe the relation between
"Data" and path in opposite ways. E.g. you write Data.path (which
presumably means the list of metabases associated with this namespace),
whereas I write path.Data (the root of the namespace associated with
this list of metabases). For one thing, I don't want to have anything
in the namespace other than what it's supposed to contain, i.e. data.
