Sorry to bug you here, but:
1) Thread safety: sure, I need to handle that.
2) Threads and processes will be killed by web2py? How? My libraries are
creating threads (for example, to track the _changes API of couchdb). This
is not under web2py's control at all. Of course, initially a request to
web2py triggers the instantiation of an object which creates the thread
(via the "threading" python library). But is web2py in control of this
background thread? I do not think so, since I am not using any web2py
facilities to create it.
3) couchdb and REST: sure, couchdb follows REST, which is very lightweight
in terms of the objects instantiated to "connect" to the database. But it
is still not zero cost, and since I could have lots of requests, I would
like to avoid repeatedly creating these objects. Besides, there are some
threads which must always be running, like the client of the _changes API
(which is basically a long-poller to couchdb).
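To make point 1 concrete, this is roughly the direction I am considering: wrap the pool in a lock and evict idle entries after a timeout. A sketch only; `SafeSubscriberPool` and the `factory` argument are made-up names standing in for my `Subscriber` machinery:

```python
import threading
import time

class SafeSubscriberPool:
    """Thread-safe pool with idle-timeout eviction.

    `factory` stands in for my Subscriber class: it is called as
    factory(myorg, subscriber_id) to build a missing entry.
    """

    def __init__(self, factory, timeout=30.0):
        self._factory = factory
        self._timeout = timeout
        self._lock = threading.Lock()
        self._pool = {}  # subscriber_id -> [subscriber, last_used_timestamp]

    def get_subscriber(self, subscriber_id, myorg):
        now = time.time()
        with self._lock:
            self._evict(now)
            entry = self._pool.get(subscriber_id)
            if entry is None:
                entry = [self._factory(myorg, subscriber_id), now]
                self._pool[subscriber_id] = entry
            entry[1] = now  # refresh the idle timer
            return entry[0]

    def _evict(self, now):
        # The lock is already held here; drop entries idle past the timeout.
        stale = [sid for sid, (_, last) in self._pool.items()
                 if now - last > self._timeout]
        for sid in stale:
            subscriber, _ = self._pool.pop(sid)
            if hasattr(subscriber, 'unsubscribe'):
                subscriber.unsubscribe()  # same cleanup as unsubscribe_all()
```

This keeps the trade-off I describe below: entries used within the timeout stay alive, idle ones get destroyed and are rebuilt on the next request.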
I think most of my problems come from the fact that I want to *reuse* my
existing libraries with web2py, which I think should somehow be feasible.
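And if keeping everything in-process turns out to be fragile, the background-process idea you suggest could look roughly like this: a standalone daemon owns the pool (and the _changes long-poll threads) and serves localhost-only XML-RPC, with the web2py controller acting as a thin proxy. A sketch only; the module name is the Python 3 spelling (`SimpleXMLRPCServer` on Python 2), and the port, file name, and pool contents are placeholders:

```python
# subscriber_daemon.py -- a standalone process that owns the pool (and any
# _changes long-poll threads) and talks to web2py over localhost XML-RPC.
from xmlrpc.server import SimpleXMLRPCServer

POOL = {}  # (subscriber_id, myorg) -> state; stands in for SubscriberPool

def get_call_details(subscriber_id, myorg, cdr_doc_id):
    # Placeholder: the real version would look up a pooled Subscriber and
    # return subscriber.get_call_details(cdr_doc_id).
    key = (subscriber_id, myorg)
    state = POOL.setdefault(key, {'hits': 0})
    state['hits'] += 1
    return {'cdr_doc_id': cdr_doc_id, 'hits': state['hits']}

def main(port=8700):
    # Bind to 127.0.0.1 only, so the daemon is reachable just from web2py.
    server = SimpleXMLRPCServer(('127.0.0.1', port), allow_none=True)
    server.register_function(get_call_details)
    server.serve_forever()
```

The controller in default.py would then shrink to a proxy call, something like `xmlrpc.client.ServerProxy('http://127.0.0.1:8700').get_call_details(...)`.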
Daniel
On Tuesday, July 10, 2012 12:23:58 AM UTC+2, Massimo Di Pierro wrote:
>
> As long as you do not use DAL to connect to couchdb then you do not need
> to worry about web2py closing the connections.
>
> You still have two problems:
> 1) thread safety, you may need to mutex lock the objects
> 2) threads and processes will be killed by the web server at will and
> therefore you have no guarantee they will persist across requests.
>
> If you have a fixed number of connections and the number is not too high, you
> can create a background process that communicates only with localhost via -
> for example - xmlrpc. Then your web app would basically act as a proxy of
> that background process.
>
> I cannot help more, since there are many details involved. Moreover,
> couchdb talks over REST, therefore I am not sure I understand the meaning
> of a persistent connection here. Perhaps web2py's cache.ram may be sufficient.
>
> Massimo
>
> On Monday, 9 July 2012 09:45:36 UTC-5, Daniel Gonzalez wrote:
>>
>> I can make subscriber_pool threadsafe, can't I?
>>
>> The point is that I would like to have some objects to be persistent
>> between requests, to avoid the cost of re-creating them each time. This is
>> nothing that can be managed by the DAL. You could think of it as a
>> long-to-set-up object that I need in order to obtain the results expected
>> by the JSONRPC requests hitting web2py.
>>
>> To give you more detail: I have a pool of subscribers, belonging to
>> different "organizations". They are requesting data related to their
>> organization, via JSONRPC requests to web2py. The data is not controlled by
>> web2py, but is in external couchdb instances. I already have libraries to
>> access and manipulate this data, so I do not want to create new models for
>> it. *But* I need to connect to those couchdb instances, and create my
>> library objects which know how to process this external data. Those are the
>> objects that I want to be persistent, because:
>>
>> 1. A user can send requests very fast (1 s period)
>> 2. Several users can belong to the same organization. Thus, they can
>> reuse the same subscriber object.
>>
>> For how long must these objects be in the pool? This is something that I
>> have not yet decided myself. Probably for as long as they are being needed,
>> which means that I will implement a timeout (let's say 30s). If they are
>> used within that timeframe, they get to stay alive. If they timeout, they
>> get destroyed and will be recreated in the next request, thus incurring
>> a penalty. This is a trade-off between speed and cache size (or memory
>> leak, if you want to see it that way).
>>
>> I could use a message queue (beanstalkd, celery?) to communicate with
>> these "workers", but in my first implementation I would like to keep it as
>> simple as possible. I am directly using the libraries from within web2py.
>> So far, this is working fine, but I must confess that I have not yet
>> performed concurrent access tests.
>>
>> On Monday, July 9, 2012 4:03:36 PM UTC+2, Massimo Di Pierro wrote:
>>>
>>> I am not sure what class Subscriber does but, if it uses the DAL, then
>>> your code is problematic.
>>> The DAL is "smart" in the sense that it keeps track of all connections
>>> opened in a certain thread and closes them all (or pools them) when the
>>> thread ends. By using a module, you store references to those connections
>>> (which may be closed or not, depending on the thread) in a persistent
>>> object, subscriber_pool, thus making your code not thread safe.
>>>
>>> Can you please explain in more detail what you are trying to accomplish?
>>> How long should the connections be cached? What is their scope? I am sure
>>> there is a better way. :-)
>>>
>>>
>>> On Monday, 9 July 2012 02:32:28 UTC-5, Daniel Gonzalez wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using the following pattern to use my libraries with web2py: some
>>>> of my utilities are creating connections to databases where I have data
>>>> that I need to serve with web2py. This data is not modeled with web2py
>>>> models. In order to avoid re-creating these connections with every single
>>>> request (I have a site which is performing requests with a 1s period), I
>>>> have discovered that imported modules keep their module-level global
>>>> variables across requests. So now what I am doing is creating a cache for the
>>>> objects that I want to be persistent across requests. This is my code, in
>>>> file subscribers.py:
>>>>
class SubscriberPoolCls:
    def __init__(self):
        self.subscriber_pool = {}

    def get_subscriber(self, subscriber_id, myorg):
        # log and Subscriber come from elsewhere in my libraries
        log.info('get_subscriber > Requested subscriber_id=%s myorg=%d'
                 % (subscriber_id, myorg))
        if subscriber_id not in self.subscriber_pool:
            self.subscriber_pool[subscriber_id] = Subscriber(myorg, subscriber_id)
        return self.subscriber_pool[subscriber_id]

    def unsubscribe_all(self):
        for subscriber_id in self.subscriber_pool:
            self.subscriber_pool[subscriber_id].unsubscribe()


_subscriber_pool = None


def SubscriberPool():
    global _subscriber_pool
    if _subscriber_pool is None:
        _subscriber_pool = SubscriberPoolCls()
    return _subscriber_pool
>>>>
>>>> And then in a request I do the following (default.py):
>>>>
from subscribers import SubscriberPool

subscriber_pool = SubscriberPool()

...

def init_session():
    if not session.my_session_id:
        session.my_session_id = get_uuid()

...

@auth.requires_login()
@service.jsonrpc
def get_call_details(pars_json):
    init_session()
    myorg = session.auth.user.org_id
    pars = simplejson.loads(pars_json)
    subscriber = subscriber_pool.get_subscriber(session.my_session_id, myorg)
    activity_cdr = subscriber.get_call_details(pars['cdr_doc_id'])
    response = {
        'cdr_details': activity_cdr,
    }
    return simplejson.dumps(response)
>>>>
>>>> By doing this I can create a subscriber object associated to the
>>>> session and the organization, and I get to reuse this object in subsequent
>>>> requests.
>>>>
>>>> Now I have the following questions:
>>>>
   1. Why are the imported modules keeping their global variables?
   default.py is not, as far as I can tell; I would say it is reparsed with
   each request.
   2. My object cache (SubscriberPoolCls.subscriber_pool) can grow
   indefinitely, and I do not know how or when to delete entries from it.
   3. Do you think this pattern is dangerous? Do you have an alternative?
>>>>
>>>> Thanks,
>>>>
>>>> Daniel
>>>>
>>>