Re: pygr.Data - strange behavior

Christopher Lee Thu, 18 Dec 2008 13:51:22 -0800

On Dec 18, 2008, at 12:05 PM, Istvan Albert wrote:

> On the other hand, saving an object, then retrieving the same object
> later seems a common thing to do. It is really strange when you start
> getting back different things just because another module may have
> reloaded pygr.Data. Imagine a threaded webserver that reloads a
> changed module, or a failed data attempt that now wants to obtain a
> fresh copy of the data.
>
> More to the point, in this particular case I don't even know what else
> one should be doing (other than reload) to actually get the file
> itself.

Sure. Back in July I floated a proposal to eliminate this reload()
behavior.
http://groups.google.com/group/pygr-dev/browse_thread/thread/d309166f7ca0ee36/31e771979f92504e#31e771979f92504e

No one seemed particularly interested, so I haven't yet followed that
up. I'll briefly summarize the issue:

- for users to access names from the pygr.Data module namespace, those
names have to be loaded into that namespace during the module import,
since Python provides no dynamic attribute lookup (__getattr__)
mechanism for modules. Names like pygr.Data.Bio or pygr.Data.Physics
have to be added during import (based on reading top-level names like
Bio and Physics from the resource databases), so that users can access
them. This limitation has annoying consequences:

- e.g. pygr.Data must connect to resource databases listed by
PYGRDATAPATH *during the import*, and creates an object
(pygr.Data.getResource) that keeps a cache of all access activity to
those resource databases.

- if you reload the module, you of course get a new getResource object
with an empty cache. When you reload a module you expect a clean
reload with no possible "side-effects" persisting from past usage of
the module before the reload...
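To make the problem concrete, here is a minimal Python 3 sketch (pygr itself targeted Python 2, where the equivalent call was the built-in reload()). It writes a hypothetical stand-in module, fake_pygr_data, that copies top-level names like Bio into its namespace and creates a getResource cache at import time, then shows that importlib.reload replaces that cache with an empty one. This is not pygr's actual code; the module name, ResourceCache, and the placeholder objects are assumptions for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

# Hypothetical stand-in for pygr.Data (not its real code): on import it reads
# the top-level names from its "resource databases" (a hard-coded dict here),
# copies them into the module namespace, and creates a module-level cache.
MODULE_SOURCE = '''
_RESOURCE_DB = {"Bio": "Bio namespace", "Physics": "Physics namespace"}


class ResourceCache:
    """Hands out one object per resource name and remembers it."""

    def __init__(self):
        self.cache = {}

    def __call__(self, name):
        if name not in self.cache:
            self.cache[name] = object()  # pretend this was unpickled from a DB
        return self.cache[name]


getResource = ResourceCache()

# Top-level names must be injected now: plain attribute access on a module
# cannot be intercepted later, so fake_pygr_data.Bio would otherwise fail.
globals().update(_RESOURCE_DB)
'''

MODULE_NAME = "fake_pygr_data"


def load_fake_module(directory):
    """Write the fake module into `directory` and import it from scratch."""
    (pathlib.Path(directory) / f"{MODULE_NAME}.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, str(directory))
    importlib.invalidate_caches()
    sys.modules.pop(MODULE_NAME, None)  # force a genuine first import
    return importlib.import_module(MODULE_NAME)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = load_fake_module(tmp)
        print(data.Bio)  # name copied into the namespace at import time
        first = data.getResource("Bio.Seq.Genome.HUMAN.hg17")
        print(data.getResource("Bio.Seq.Genome.HUMAN.hg17") is first)  # True: cache hit
        data = importlib.reload(data)  # re-runs the whole module body
        print(data.getResource("Bio.Seq.Genome.HUMAN.hg17") is first)  # False: new, empty cache
```

Because reload re-executes the module body in place, every module-level object, including the cache, is rebuilt, which is exactly the "clean reload" a user normally expects.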

Possible Solutions:
- require that all pygr.Data access go through a "root" name, e.g.
pygr.Data.root.Bio.Seq.Genome.HUMAN.hg17. This requires users to type
a few more characters, but eliminates most of these issues. The root
object's __getattr__ will run whenever the user requests a new name,
so pygr.Data would no longer need to connect to resource databases
during module import. That could wait until the user actually
requests some data.

- don't cache objects that undergo unpickling transformations, since
retrieving such an object from the cache skips the transformation the
user expects.
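The two proposals can be sketched together in Python. This is a minimal illustration under stated assumptions, not pygr's API: Root, ResourcePath, ResourceDB, the call-to-retrieve convention, and the transform_on_unpickle marker are all hypothetical. Attribute lookups on Root build a dotted name lazily, the databases are opened only on the first actual request, and objects flagged as transforming on unpickle are re-unpickled on every request instead of being served from the cache. (Python 3.7 later added module-level __getattr__ via PEP 562, which lifts the limitation described above, but the sketch follows the root-object proposal.)

```python
import pickle


class ResourceDB:
    """Hypothetical resource database: dotted resource name -> pickled bytes."""

    def __init__(self, entries):
        self._entries = entries

    def load(self, name):
        return self._entries[name]


class ResourcePath:
    """Builds a dotted name one attribute at a time (Bio -> Bio.Seq -> ...)."""

    def __init__(self, root, path):
        self._root = root
        self._path = path

    def __getattr__(self, attr):
        if attr.startswith("_"):  # don't turn private/dunder lookups into paths
            raise AttributeError(attr)
        return ResourcePath(self._root, f"{self._path}.{attr}")

    def __call__(self):
        # Assumption of this sketch: calling the finished path retrieves it.
        return self._root._get(self._path)


class Root:
    """Single entry point; each new top-level name goes through __getattr__."""

    def __init__(self, open_db):
        self._open_db = open_db  # callable that connects to the databases
        self._db = None          # no connection until data is first requested
        self._cache = {}

    def __getattr__(self, attr):
        if attr.startswith("_"):
            raise AttributeError(attr)
        return ResourcePath(self, attr)

    def _get(self, name):
        if name in self._cache:
            return self._cache[name]
        if self._db is None:
            self._db = self._open_db()  # lazy connection on first request
        obj = pickle.loads(self._db.load(name))
        # Second proposal: objects whose unpickling transforms them are not
        # cached, so every request re-runs their __setstate__.
        # `transform_on_unpickle` is a hypothetical marker attribute.
        if not getattr(obj, "transform_on_unpickle", False):
            self._cache[name] = obj
        return obj


class Reopened:
    """Example object whose unpickling does work (e.g. reopening a file)."""

    transform_on_unpickle = True
    unpickle_count = 0

    def __init__(self, path):
        self.path = path

    def __getstate__(self):
        return {"path": self.path}

    def __setstate__(self, state):
        self.path = state["path"]
        Reopened.unpickle_count += 1  # stands in for reopening self.path


if __name__ == "__main__":
    def open_db():
        print("connecting to resource databases")
        return ResourceDB({
            "Bio.Seq.Genome.HUMAN.hg17": pickle.dumps({"genome": "hg17"}),
            "Bio.Annotation.file": pickle.dumps(Reopened("/data/annot.db")),
        })

    root = Root(open_db)  # nothing connected yet
    genome = root.Bio.Seq.Genome.HUMAN.hg17()  # connects here, on first request
    print(root.Bio.Seq.Genome.HUMAN.hg17() is genome)  # True: plain objects are cached
    f1 = root.Bio.Annotation.file()
    print(root.Bio.Annotation.file() is f1)  # False: re-unpickled on each request
```

Keeping the cache on the Root instance rather than in module-level state means creating a new Root gives a clean cache without reloading any module.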

If you or others think this needs to be addressed in the 0.8 release,
we could include it. In the past, no one expressed much interest, so
it was deferred, presumably to the 1.0 "pygr.Data improvements" release.

Thanks for raising this!

-- Chris

You received this message because you are subscribed to the Google Groups
"pygr-dev" group.
For more options, visit this group at
http://groups.google.com/group/pygr-dev?hl=en