2) Sorry, not by web2py. Threads and processes are managed (started and 
killed) by the web server. Web applications (in any framework) should never 
start their own threads.
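To make that concrete: work like the couchdb _changes long-poller is better hosted in a standalone process that the web server never manages. A minimal, hypothetical sketch (the `poll` callable stands in for the real long-poll; all names are illustrative):

```python
# Hypothetical standalone worker: run it outside the web server
# (e.g. "python changes_worker.py" under a supervisor), so the web
# server can never kill the loop mid-request.
import time

def follow_changes(poll, iterations=3):
    """Repeatedly call poll() -- a stand-in for a couchdb _changes
    long-poll -- and collect whatever it returns.  A real worker
    would loop forever instead of a fixed number of iterations."""
    seen = []
    for _ in range(iterations):
        seen.append(poll())
        time.sleep(0)  # placeholder for the long-poll wait
    return seen

if __name__ == '__main__':
    counter = iter(range(100))
    print(follow_changes(lambda: next(counter)))
```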


On Monday, 9 July 2012 18:02:37 UTC-5, Daniel Gonzalez wrote:
>
> Sorry to bug you here, but:
>
> 1) Thread safety: sure, i need to handle that.
> 2) Threads and processes will be killed by web2py? How? My libraries are 
> creating threads (for example, to track the _changes API of couchdb). This 
> is not at all under web2py's control. I mean, of course, initially a request 
> to web2py triggers the instantiation of an object which creates the 
> thread (via the "threading" python library). Is web2py in control of this 
> background thread? I do not think so, since I am not using any web2py 
> facilities to create it.
> 3) couchdb and REST: sure, couchdb follows REST, which is very lightweight 
> in terms of object instantiation to "connect" to the database. But it is 
> still not zero cost, and since I could have lots of requests, I would like 
> to avoid repeatedly creating these objects. Besides, there are some threads 
> which must be always running, like the client of the _changes api (which is 
> basically a long-poller to couchdb).
>
> I think most of my problems are coming from the fact that I want to 
> *reuse* my libraries with web2py. Which I think should be somehow feasible.
>
> Daniel
>
> On Tuesday, July 10, 2012 12:23:58 AM UTC+2, Massimo Di Pierro wrote:
>>
>> As long as you do not use DAL to connect to couchdb then you do not need 
>> to worry about web2py closing the connections.
>>
>> You still have two problems:
>> 1) thread safety, you may need to mutex lock the objects
>> 2) threads and processes will be killed by the web server at will and 
>> therefore you have no guarantee they will persist across requests.
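A minimal sketch of the mutex locking in point 1, assuming a pool like the one in the original post (class and method names here are illustrative, not web2py API):

```python
import threading

class LockedPool:
    """Object pool guarded by a mutex, so the web server's worker
    threads cannot race on the shared dictionary."""
    def __init__(self):
        self._lock = threading.Lock()
        self._pool = {}

    def get(self, key, factory):
        # Serialize access: without the lock, two threads could both
        # miss the cache and create (and leak) an object for the same key.
        with self._lock:
            if key not in self._pool:
                self._pool[key] = factory()
            return self._pool[key]
```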
>>
>> If you have a fixed number of connections and the number is not too high, 
>> you can create a background process that communicates only with localhost 
>> via - for example - xmlrpc. Then your web app would basically act as a 
>> proxy for that background process.
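The localhost background-process idea might be sketched like this, using the standard library's XML-RPC modules (Python 3 names shown; in the Python 2 of this thread they were `SimpleXMLRPCServer` and `xmlrpclib`; the handler is a stub, not real couchdb code):

```python
# The worker process owns the couchdb connections and threads; the
# web2py action merely proxies calls to it over localhost XML-RPC.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def get_call_details(cdr_doc_id):
    # Stub standing in for the real couchdb lookup held by the worker.
    return {'cdr_doc_id': cdr_doc_id, 'details': 'stub'}

# Bind to an ephemeral localhost port and serve in the background.
server = SimpleXMLRPCServer(('127.0.0.1', 0), logRequests=False)
server.register_function(get_call_details)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Inside the web app you would keep only this thin client:
proxy = ServerProxy('http://127.0.0.1:%d' % port)
result = proxy.get_call_details('doc-42')
server.shutdown()
```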
>>
>> I cannot help more, since there are many details involved. Moreover, 
>> couchdb talks over REST, therefore I am not sure I understand the meaning 
>> of a persistent connection. Perhaps web2py's cache.ram may be sufficient.
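For reference, cache.ram takes a key, a callable, and a time_expire. The stand-in below only mimics that call pattern so it can run outside web2py; it is an illustration, not web2py's implementation:

```python
import time

class RamCache:
    """Tiny stand-in for the cache.ram call signature:
    cache(key, f, time_expire) returns the cached value, calling f()
    only when the entry is missing or older than time_expire seconds."""
    def __init__(self):
        self.storage = {}

    def __call__(self, key, f, time_expire=300):
        now = time.time()
        item = self.storage.get(key)
        if item is None or now - item[0] > time_expire:
            item = (now, f())
            self.storage[key] = item
        return item[1]

cache_ram = RamCache()

# In a web2py controller the equivalent call would read roughly:
#   subscriber = cache.ram('sub:%s' % session.my_session_id,
#                          lambda: Subscriber(myorg, subscriber_id),
#                          time_expire=30)
```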
>>
>> Massimo
>>
>> On Monday, 9 July 2012 09:45:36 UTC-5, Daniel Gonzalez wrote:
>>>
>>> I can make subscriber_pool threadsafe, can't I?
>>>
>>> The point is that I would like some objects to be persistent 
>>> between requests, to avoid the cost of re-creating them each time. This is 
>>> nothing that can be managed by the DAL. You could think of it as a 
>>> long-to-set-up object that I need in order to obtain the results expected 
>>> by the JSONRPC requests hitting web2py.
>>>
>>> To give you more detail: I have a pool of subscribers, belonging to 
>>> different "organizations". They are requesting data related to their 
>>> organization, via JSONRPC requests to web2py. The data is not controlled by 
>>> web2py, but lives in external couchdb instances. I already have libraries to 
>>> access and manipulate this data, so I do not want to create new models for 
>>> it. *But* I need to connect to those couchdb instances, and create my 
>>> library objects which know how to process this external data. Those are the 
>>> objects that I want to be persistent, because:
>>>
>>>    1. A user can send requests very fast (1 s period)
>>>    2. Several users can belong to the same organization. Thus, they can 
>>>    reuse the same subscriber object.
>>>    
>>> For how long must these objects stay in the pool? This is something that I 
>>> have not yet decided myself. Probably for as long as they are being needed, 
>>> which means that I will implement a timeout (let's say 30s). If they are 
>>> used within that timeframe, they get to stay alive. If they time out, they 
>>> get destroyed and will be recreated in the next request, thus incurring a 
>>> penalty. This is a trade-off between speed and cache size (or memory 
>>> leak, if you want to see it that way).
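The 30s-timeout policy described above might be sketched like this (all names are illustrative; the clock is injectable so the eviction logic can be exercised without actually waiting):

```python
import time

class ExpiringPool:
    """Pool whose entries survive only while they keep being used:
    an entry untouched for longer than `timeout` seconds is evicted
    and will be rebuilt by `factory` on the next request."""
    def __init__(self, timeout=30.0, clock=time.time):
        self.timeout = timeout
        self.clock = clock           # injectable for testing
        self._pool = {}              # key -> (last_used, obj)

    def get(self, key, factory):
        now = self.clock()
        # Evict everything not used within the timeout window.
        stale = [k for k, (t, _) in self._pool.items()
                 if now - t > self.timeout]
        for k in stale:
            del self._pool[k]
        if key in self._pool:
            obj = self._pool[key][1]
        else:
            obj = factory()
        self._pool[key] = (now, obj)  # refresh the timestamp
        return obj
```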
>>>
>>> I could use a message queue (beanstalkd, celery?) to communicate with 
>>> these "workers", but in my first implementation I would like to keep it as 
>>> simple as possible. I am directly using the libraries from within web2py. 
>>> So far, this is working fine, but I must confess that I have not yet 
>>> performed concurrent access tests.
>>>
>>> On Monday, July 9, 2012 4:03:36 PM UTC+2, Massimo Di Pierro wrote:
>>>>
>>>> I am not sure what class Subscriber does but, if it uses the DAL, then 
>>>> your code is problematic.
>>>> The DAL is "smart" in the sense that it keeps track of all connections 
>>>> opened in a certain thread and closes them all (or pools them) when the 
>>>> thread ends. By using a module, you store references to those connections 
>>>> (which may be closed or not, depending on the thread) in a persistent 
>>>> object, subscriber_pool, thus making your code not thread safe.
>>>>
>>>> Can you please explain in more detail what you are trying to 
>>>> accomplish? How long should the connections be cached? What is their 
>>>> scope? 
>>>> I am sure there is a better way. :-)
>>>>
>>>>
>>>> On Monday, 9 July 2012 02:32:28 UTC-5, Daniel Gonzalez wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am using the following pattern to use my libraries with web2py: some 
>>>>> of my utilities create connections to databases where I have data 
>>>>> that I need to serve with web2py. This data is not modeled with web2py 
>>>>> models. In order to avoid re-creating these connections with every single 
>>>>> request (I have a site which performs requests with a 1s period), I 
>>>>> have discovered that importing a module does not clear the module's 
>>>>> global variables. So now what I am doing is creating a cache for the 
>>>>> objects that I want to be persistent across requests. This is my code, in 
>>>>> file subscribers.py:
>>>>>
>>>>> import logging
>>>>> log = logging.getLogger(__name__)
>>>>>
>>>>> class SubscriberPoolCls:
>>>>>     def __init__(self):
>>>>>         self.subscriber_pool = {}
>>>>>
>>>>>     def get_subscriber(self, subscriber_id, myorg):
>>>>>         log.info('get_subscriber > Requested subscriber_id=%s myorg=%d'
>>>>>                  % (subscriber_id, myorg))
>>>>>         if subscriber_id not in self.subscriber_pool:
>>>>>             # Subscriber comes from my own library
>>>>>             self.subscriber_pool[subscriber_id] = Subscriber(myorg, subscriber_id)
>>>>>         return self.subscriber_pool[subscriber_id]
>>>>>
>>>>>     def unsubscribe_all(self):
>>>>>         for subscriber_id in self.subscriber_pool:
>>>>>             self.subscriber_pool[subscriber_id].unsubscribe()
>>>>>
>>>>>
>>>>> _subscriber_pool = None
>>>>>
>>>>>
>>>>> def SubscriberPool():
>>>>>     global _subscriber_pool
>>>>>     if _subscriber_pool is None:
>>>>>         _subscriber_pool = SubscriberPoolCls()
>>>>>     return _subscriber_pool
>>>>>
>>>>> And then in a request I do the following (default.py):
>>>>>
>>>>> from subscribers import SubscriberPool
>>>>>
>>>>> subscriber_pool = SubscriberPool()
>>>>>
>>>>> ...
>>>>>
>>>>> def init_session():
>>>>>     if not session.my_session_id:
>>>>>         session.my_session_id = get_uuid()
>>>>>
>>>>> ...
>>>>>
>>>>> @auth.requires_login()
>>>>> @service.jsonrpc
>>>>> def get_call_details(pars_json):
>>>>>     init_session()
>>>>>     myorg = session.auth.user.org_id
>>>>>     pars = simplejson.loads(pars_json)
>>>>>     subscriber = subscriber_pool.get_subscriber(session.my_session_id, myorg)
>>>>>     activity_cdr = subscriber.get_call_details(pars['cdr_doc_id'])
>>>>>     response = {
>>>>>         'cdr_details': activity_cdr,
>>>>>     }
>>>>>     return simplejson.dumps(response)
>>>>>
>>>>> By doing this I can create a subscriber object associated to the 
>>>>> session and the organization, and I get to reuse this object in 
>>>>> subsequent 
>>>>> requests.
>>>>>
>>>>> Now I have the following questions:
>>>>>
>>>>>    1. Why are the imported modules keeping their global variables? 
>>>>>    default.py is not, as far as I can tell; I would say it is reparsed 
>>>>>    with each request.
>>>>>    2. I have the problem that my object cache 
>>>>>    (SubscriberPoolCls.subscriber_pool) can grow indefinitely. I do not 
>>>>>    know how or when to delete entries from this cache.
>>>>>    3. Do you think this pattern is dangerous? Do you have an 
>>>>>    alternative?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Daniel
>>>>>
>>>>
