On Dec 18, 2008, at 12:05 PM, Istvan Albert wrote:

> On the other hand, saving an object, then retrieving the same object
> later seems a common thing to do. It is really strange when you start
> getting back different things just because another module may have
> reloaded pygr.Data. Imagine a threaded webserver that reloads a
> changed module, or a failed data attempt that now wants to obtain a
> fresh copy of the data.
>
> More to the point, in this particular case I don't even know what else
> one should be doing (other than reload) to actually get the file
> itself.

Sure.  Back in July I floated a proposal to eliminate this reload()  
behavior.
http://groups.google.com/group/pygr-dev/browse_thread/thread/d309166f7ca0ee36/31e771979f92504e#31e771979f92504e

No one seemed particularly interested, so I haven't yet followed that  
up.  I'll briefly summarize the issue:

- for users to access names from the pygr.Data module namespace, those
names have to be loaded into that namespace during the module import,
since Python provides no dynamic attribute lookup (__getattr__)
mechanism for modules.  Names like pygr.Data.Bio or pygr.Data.Physics
have to be added during import (based on reading top-level names like
Bio and Physics from the resource databases), so that users can access
them.  This has some annoying consequences:

- e.g. pygr.Data must connect to resource databases listed by  
PYGRDATAPATH *during the import*, and creates an object  
(pygr.Data.getResource) that keeps a cache of all access activity to  
those resource databases.

- if you reload the module, you of course get a new getResource object  
with an empty cache.  When you reload a module you expect a clean  
reload with no possible "side-effects" persisting from past usage of  
the module before the reload...
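
To make the cache issue concrete, here is a minimal sketch of what
happens.  It does not use pygr itself: ResourceCache is a hypothetical
stand-in for the getResource object, and rebinding the name stands in
for what reload() effectively does to the module-level state.

```python
# Hypothetical stand-in for pygr.Data's module-level state (not pygr's real code).
class ResourceCache(object):
    def __init__(self):
        self.cache = {}   # resource name -> object already handed out
        self.loads = 0    # how many times we had to build an object

    def get(self, name):
        if name not in self.cache:
            self.loads += 1
            self.cache[name] = object()  # placeholder for an unpickled resource
        return self.cache[name]


# Created once, at "import" time.
getResource = ResourceCache()
first = getResource.get("Bio.Seq.Genome.HUMAN.hg17")
again = getResource.get("Bio.Seq.Genome.HUMAN.hg17")
same_before_reload = first is again   # served from the cache

# A reload re-runs the module body, so the name is rebound to a new,
# empty cache and the old one is forgotten.
getResource = ResourceCache()
after = getResource.get("Bio.Seq.Genome.HUMAN.hg17")
same_after_reload = after is first    # a different object this time
```

Before the rebinding, both requests return the identical object; after
it, the same request builds a new one, which is the "getting back
different things" effect Istvan describes.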

Possible Solutions:
- require that all pygr.Data access go through a "root" name, i.e.  
pygr.Data.root.Bio.Seq.Genome.HUMAN.hg17.  This requires users to type  
a few more characters, but eliminates most of these issues.  The root  
object __getattr__ will be run whenever the user requests a new name,  
so pygr.Data would no longer need to connect to resource databases  
during module import.  That could wait until the user actually  
requests some data.

- don't cache objects that undergo unpickling transformations, since  
the current behavior of retrieving the object from cache will not give  
the expected transformation.
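
As a rough illustration of the first option, here is a sketch of a
"root" object whose __getattr__ defers the database connection until
the first name is requested.  This is only a sketch under assumptions:
FakeResourceDB, ResourcePath and the stored value are made up for the
example and are not pygr's actual classes or data.

```python
class FakeResourceDB(object):
    """Hypothetical stand-in for a resource database on PYGRDATAPATH."""
    connections = 0  # counts how many times we "connected"

    def __init__(self):
        FakeResourceDB.connections += 1
        self.names = {"Bio.Seq.Genome.HUMAN.hg17": "hg17 data"}

    def lookup(self, path):
        return self.names.get(path)


class ResourcePath(object):
    """Builds up a dotted name and connects only when first needed."""

    def __init__(self, path="", state=None):
        self._path = path
        # Shared between all nodes so the connection is made only once.
        self._state = state if state is not None else {"db": None}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._state["db"] is None:
            self._state["db"] = FakeResourceDB()  # lazy connection
        path = name if not self._path else self._path + "." + name
        found = self._state["db"].lookup(path)
        if found is not None:
            return found
        return ResourcePath(path, self._state)  # keep descending


root = ResourcePath()                        # "import" does no database work
connections_at_import = FakeResourceDB.connections
genome = root.Bio.Seq.Genome.HUMAN.hg17      # first access triggers the connection
connections_after_access = FakeResourceDB.connections
```

Nothing touches the database until root.Bio is evaluated, so importing
(or reloading) the module would no longer need to read the resource
databases up front.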

If you or others think this needs to be addressed in the 0.8 release,  
we could include it.  In the past, no one expressed much interest, so  
it was deferred, presumably to the 1.0 "pygr.Data improvements" release.

Thanks for raising this!

-- Chris
