On Jan 14, 2009, at 6:01 AM, Istvan Albert wrote:

> I am a little uncertain what a root.object meant in your post. I
> assume it is a pygr.Data() class instance created the pygr module
> namespace, that may have various attributes and methods. This implies
> that we may ourselves create such instances that may be customized and
> more importantly exist in a scope. It is probably best if the class
> had a close() method that could be called explicitly and would close
> all resources. This method should also be called from the destructor,
> but that is not guaranteed to be called.

By "root object" I meant an object that would serve as the root of the namespace for requesting (reading) resources from pygr.Data. I see this namespace issue as separate from the issue of being able to create / open specific resource databases independent of the PYGRDATAPATH env variable, which you raised above.

Whereas current pygr.Data has no __getattr__ (because it's a module), the root object would just be a regular Python object and can therefore respond dynamically to top-level name requests, allowing me to get rid of the annoyances associated with connecting to resource databases during module import. This root object would be recommended as the standard mechanism for read requests (i.e. if the root object were called "root", then "root.some.foo.bar()" would mean "get me the resource named some.foo.bar, from whichever resource DB can provide it"). For example, we could put this root object in the pygr/__init__.py module and call the object Data, so it would look the same as current usage:

    import pygr
    hg18 = pygr.Data.Bio.Seq.Genome.HUMAN.hg18()

If we did this, I guess we'd rename the pygr.Data *module* something else; this would cleanly separate access to the code from access to the namespace, which right now are mixed up with each other.

> In general I think it would be best if we had the option of passing
> the pygrdatapath as a parameter to the constructor and be able to
> configure the instance upon construction. The default behavior would
> be the same as now, reading out the 'PYGRDATAPATH' environment
> variable.

I think we need some absolutely unambiguous terminology to avoid potential confusion. Since pygr.Data is intended only to store metadata about other databases (rather than to *actually* store all their data itself), it is a "metadata database" or "metabase" for short. I propose that we replace the ambiguous term "resource database" with "metabase".
If people think this name is ok, perhaps we should rename the pygr.Data module pygr.metabase?

Using that terminology, what I proposed as pygr.metabase.path would be the list of metabases specified in PYGRDATAPATH. I propose that pygr.metabase.path will act like a Python list, containing "metabase objects" which each provide an interface to one metabase. E.g.

    mdb = pygr.metabase.path[0] # get the first metabase in PYGRDATAPATH

As far as I can see, sys.path provides a clear model for pygr.metabase.path; both read a list of paths from an environment variable, and provide a pythonic interface to that list, which can be modified by the user in all the usual python ways. Just like the user doesn't need to "do anything" to initialize sys.path to read PYTHONPATH, pygr.metabase.path would be initialized automatically from PYGRDATAPATH. But if you wanted to, you could clear it by simply calling pygr.metabase.path.clear().

What do you think about this? Do you see any reasons not to follow the sys.path model?

You would be able to open or create metabases yourself, append them to pygr.metabase.path etc. Thus, you could write code to automate management of metabases. Long term, we should provide a full set of management functions, for easily grouping or deleting resources, copying resources from one metabase to another etc.

> The same with the namespaces and/or layers. Have a number of sane
> names added by default, if someone wants different namespaces, they
> can configure them either in the constructor or (if that gets too
> overloaded) via some helper methods (of course that means turning off
> the defaults as well).
>
>> resource database from the list in pygr.Data.path, or create / open a
>> new resource database by calling an appropriate constructor. The
>> interface for saving to a specific resource database object will be
>> the same as the generic interface.
>
> I would need to see an example to understand and be able to comment
> more meaningfully but it sounds good.

A metabase object would also provide an interface to the namespace, but this would of course ONLY access resources whose metadata is stored in this metabase. Is that what you wanted? This namespace would work both for reading and writing resources to this metabase. E.g. using the metabase object obtained in the example above:

    mdb.Data.Bio.Annotation.ASAP.humanJan06 = mydata # save a resource to this metabase

A long-range comment on namespace conventions: pygr.Data names function as identifiers, i.e. they unify different datasets by allowing them to refer to each other using these identifier strings. An identifier only works if it is stable. I think it's important that pygr.Data uphold this important data integrity principle.

You can rename things if you provide a mechanism for "aliases" that redirect the old name to the new name. Such aliases could be specific to one metabase (in which the renaming event occurred). Thus you could initially create a resource as one name, within your "local zone" metabase. When you decide to publish it to a public zone, you could first rename it within your local metabase, before copying it to the public metabase. This alias would be kept in your local metabase, so that using the old name in code and data would still work within that zone. In the public zone only the new name would be visible.

> PS. forgot to add this to the public post. I think having the same
> data is very important. That is pygr.Data.Bio.foo is always the same
> object (right now, and this what the original post was about, you can
> have two different objects depending on whether the resource has been
> reloaded or not)

Sure.
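
To make the proposal above a bit more concrete, here is a minimal sketch of how a root namespace object, a sys.path-style list of metabases, and per-metabase aliases might fit together. It is only an illustration under stated assumptions, not pygr code: the class names Metabase and Namespace, the save/rename/load methods, the helper path_from_string, the comma separator, and the names Bio.Test.draft / Bio.Test.final are all hypothetical, and storage is a plain in-memory dict. Saving goes through an explicit save() call here rather than the attribute-assignment form shown earlier, purely to keep the sketch short.

```python
class Metabase:
    """Hypothetical in-memory stand-in for a single metabase."""

    def __init__(self, location):
        self.location = location
        self._resources = {}  # resource name -> stored object
        self._aliases = {}    # old name -> new name, local to this metabase

    def save(self, name, obj):
        self._resources[name] = obj

    def rename(self, old, new):
        """Rename a resource and keep the old name as a local alias."""
        self._resources[new] = self._resources.pop(old)
        self._aliases[old] = new

    def load(self, name):
        # Follow a local alias if one exists; raises KeyError if absent.
        return self._resources[self._aliases.get(name, name)]


def path_from_string(value, sep=","):
    """Build the metabase list from a PYGRDATAPATH-style string.

    The separator is an assumption of this sketch; real code would pass in
    something like os.environ.get("PYGRDATAPATH", "").
    """
    return [Metabase(loc) for loc in value.split(sep) if loc]


class Namespace:
    """Root object: attribute access builds a dotted name, calling looks it up."""

    def __init__(self, metabases, name=""):
        self._metabases = metabases  # shared list, so later edits are seen
        self._name = name

    def __getattr__(self, attr):
        # Only reached for attributes that are not set in __init__.
        if attr.startswith("_"):
            raise AttributeError(attr)
        full = attr if not self._name else self._name + "." + attr
        return Namespace(self._metabases, full)

    def __call__(self):
        # Ask each metabase in order, the way imports walk sys.path.
        for mdb in self._metabases:
            try:
                return mdb.load(self._name)
            except KeyError:
                continue
        raise KeyError(self._name)


path = path_from_string("local,public")  # stands in for PYGRDATAPATH
local, public = path
Data = Namespace(path)

local.save("Bio.Seq.Genome.HUMAN.hg18", "placeholder for hg18")
hg18 = Data.Bio.Seq.Genome.HUMAN.hg18()  # same spelling as current usage

# Rename within the local metabase, then publish only the new name.
local.save("Bio.Test.draft", "my data")
local.rename("Bio.Test.draft", "Bio.Test.final")
public.save("Bio.Test.final", local.load("Bio.Test.final"))
old_name_result = Data.Bio.Test.draft()  # resolved through the local alias
```

Because Namespace keeps a reference to the list itself rather than a copy, appending to or clearing the list changes what later lookups can find, which is the sys.path-like behaviour described above. In this sketch no connection is opened until a name is actually called.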
