Sorry to bug you here, but:
1) Thread safety: sure, I need to handle that.
2) Threads and processes will be killed by web2py? How? My libraries are
creating threads (for example, to track the _changes API of couchdb). This
is not under web2py's control at all. Of course, initially a request to
web2py triggers the instantiation of an object which creates the thread
(via the "threading" python library). But is web2py in control of this
background thread? I do not think so, since I am not using any web2py
facilities to create it.
3) couchdb and REST: sure, couchdb follows REST, which is very lightweight
in terms of the objects instantiated to "connect" to the database. But it
is still not zero cost, and since I could have lots of requests, I would
like to avoid repeatedly creating these objects. Besides, there are some
threads which must always be running, like the client of the _changes API
(which is basically a long-poller to couchdb).
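To make point 1 concrete, this is roughly the direction I am considering: wrap the pool in a lock and evict idle entries after a timeout. A sketch only; `SafeSubscriberPool` and the `factory` argument are made-up names standing in for my `Subscriber` machinery:

```python
import threading
import time

class SafeSubscriberPool:
    """Thread-safe pool with idle-timeout eviction.

    `factory` stands in for my Subscriber class: it is called as
    factory(myorg, subscriber_id) to build a missing entry.
    """

    def __init__(self, factory, timeout=30.0):
        self._factory = factory
        self._timeout = timeout
        self._lock = threading.Lock()
        self._pool = {}  # subscriber_id -> [subscriber, last_used_timestamp]

    def get_subscriber(self, subscriber_id, myorg):
        now = time.time()
        with self._lock:
            self._evict(now)
            entry = self._pool.get(subscriber_id)
            if entry is None:
                entry = [self._factory(myorg, subscriber_id), now]
                self._pool[subscriber_id] = entry
            entry[1] = now  # refresh the idle timer
            return entry[0]

    def _evict(self, now):
        # The lock is already held here; drop entries idle past the timeout.
        stale = [sid for sid, (_, last) in self._pool.items()
                 if now - last > self._timeout]
        for sid in stale:
            subscriber, _ = self._pool.pop(sid)
            if hasattr(subscriber, 'unsubscribe'):
                subscriber.unsubscribe()  # same cleanup as unsubscribe_all()
```

This keeps the trade-off I describe below: entries used within the timeout stay alive, idle ones get destroyed and are rebuilt on the next request.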
I think most of my problems come from the fact that I want to *reuse* my
existing libraries with web2py, which I think should somehow be feasible.
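And if keeping everything in-process turns out to be fragile, the background-process idea you suggest could look roughly like this: a standalone daemon owns the pool (and the _changes long-poll threads) and serves localhost-only XML-RPC, with the web2py controller acting as a thin proxy. A sketch only; the module name is the Python 3 spelling (`SimpleXMLRPCServer` on Python 2), and the port, file name, and pool contents are placeholders:

```python
# subscriber_daemon.py -- a standalone process that owns the pool (and any
# _changes long-poll threads) and talks to web2py over localhost XML-RPC.
from xmlrpc.server import SimpleXMLRPCServer

POOL = {}  # (subscriber_id, myorg) -> state; stands in for SubscriberPool

def get_call_details(subscriber_id, myorg, cdr_doc_id):
    # Placeholder: the real version would look up a pooled Subscriber and
    # return subscriber.get_call_details(cdr_doc_id).
    key = (subscriber_id, myorg)
    state = POOL.setdefault(key, {'hits': 0})
    state['hits'] += 1
    return {'cdr_doc_id': cdr_doc_id, 'hits': state['hits']}

def main(port=8700):
    # Bind to 127.0.0.1 only, so the daemon is reachable just from web2py.
    server = SimpleXMLRPCServer(('127.0.0.1', port), allow_none=True)
    server.register_function(get_call_details)
    server.serve_forever()
```

The controller in default.py would then shrink to a proxy call, something like `xmlrpc.client.ServerProxy('http://127.0.0.1:8700').get_call_details(...)`.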
Daniel
On Tuesday, July 10, 2012 12:23:58 AM UTC+2, Massimo Di Pierro wrote:
>
> As long as you do not use DAL to connect to couchdb then you do not need
> to worry about web2py closing the connections.
>
> You still have two problems:
> 1) thread safety, you may need to mutex lock the objects
> 2) threads and processes will be killed by the web server at will and
> therefore you have no guarantee they will persist across requests.
>
> If you have a fixed number of connections and the number is not too high, you
> can create a background process that communicates only with localhost via -
> for example - xmlrpc. Then your web app would basically act as a proxy of
> that background process.
>
> I cannot help more, since there are many details involved. Moreover,
> couchdb talks over REST, therefore I am not sure I understand the meaning
> of a persistent connection here. Perhaps web2py's cache.ram may be sufficient.
>
> Massimo
>
> On Monday, 9 July 2012 09:45:36 UTC-5, Daniel Gonzalez wrote:
>>
>> I can make subscriber_pool threadsafe, can't I?
>>
>> The point is that I would like to have some objects to be persistent
>> between requests, to avoid the cost of re-creating them each time. This is
>> nothing that can be managed by the DAL. You could think of it as a
>> long-to-set-up object that I need in order to obtain the results expected
>> by the JSONRPC requests hitting web2py.
>>
>> To give you more detail: I have a pool of subscribers, belonging to
>> different "organizations". They are requesting data related to their
>> organization, via JSONRPC requests to web2py. The data is not controlled by
>> web2py, but is in external couchdb instances. I already have libraries to
>> access and manipulate this data, so I do not want to create new models for
>> it. *But* I need to connect to those couchdb instances, and create my
>> library objects which know how to process this external data. Those are the
>> objects that I want to be persistent, because:
>>
>> 1. A user can send requests very fast (1 s period)
>> 2. Several users can belong to the same organization. Thus, they can
>> reuse the same subscriber object.
>>
>> For how long must these objects be in the pool? This is something that I
>> have not yet decided myself. Probably for as long as they are being needed,
>> which means that I will implement a timeout (let's say 30s). If they are
>> used within that timeframe, they get to stay alive. If they timeout, they
>> get destroyed and will be recreated in the next request, thus incurring
>> a penalty. This is a trade-off between speed and cache size (or memory
>> leak, if you want to see it that way).
>>
>> I could use a message queue (beanstalkd, celery?) to communicate with
>> these "workers", but in my first implementation I would like to keep it as
>> simple as possible. I am directly using the libraries from within web2py.
>> So far, this is working fine, but I must confess that I have not yet
>> performed concurrent access tests.
>>
>> On Monday, July 9, 2012 4:03:36 PM UTC+2, Massimo Di Pierro wrote:
>>>
>>> I am not sure what class Subscriber does but, if it uses the DAL, then
>>> your code is problematic.
>>> The DAL is "smart" in the sense that it keeps track of all connections
>>> opened in a certain thread and closes them all (or pools them) when the
>>> thread ends. By using a module, you store references to those connections
>>> (which may be closed or not, depending on the thread) in a persistent
>>> object, subscriber_pool, thus making your code not thread safe.
>>>
>>> Can you please explain in more detail what you are trying to accomplish?
>>> How long should the connections be cached? What is their scope? I am sure
>>> there is a better way. :-)
>>>
>>>
>>> On Monday, 9 July 2012 02:32:28 UTC-5, Daniel Gonzalez wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am using the following pattern to use my libraries with web2py: some
>>>> of my utilities are creating connections to databases where I have data
>>>> that I need to serve with web2py. This data is not modeled with web2py
>>>> models. In order to avoid re-creating these connections with every single
>>>> request (I have a site which is performing requests with a 1s period), I
>>>> have discovered that imported modules keep their module-level global
>>>> variables across requests. So now what I am doing is creating a cache for the
>>>> objects that I want to be persistent across requests. This is my code, in
>>>> file subscribers.py:
>>>>
class SubscriberPoolCls:
    def __init__(self):
        self.subscriber_pool = {}

    def get_subscriber(self, subscriber_id, myorg):
        # log and Subscriber come from elsewhere in my libraries
        log.info('get_subscriber > Requested subscriber_id=%s myorg=%d'
                 % (subscriber_id, myorg))
        if subscriber_id not in self.subscriber_pool:
            self.subscriber_pool[subscriber_id] = Subscriber(myorg, subscriber_id)
        return self.subscriber_pool[subscriber_id]

    def unsubscribe_all(self):
        for subscriber_id in self.subscriber_pool:
            self.subscriber_pool[subscriber_id].unsubscribe()


_subscriber_pool = None


def SubscriberPool():
    global _subscriber_pool
    if _subscriber_pool is None:
        _subscriber_pool = SubscriberPoolCls()
    return _subscriber_pool
>>>>
>>>> And then in a request I do the following (default.py):
>>>>
from subscribers import SubscriberPool

subscriber_pool = SubscriberPool()

...

def init_session():
    if not session.my_session_id:
        session.my_session_id = get_uuid()

...

@auth.requires_login()
@service.jsonrpc
def get_call_details(pars_json):
    init_session()
    myorg = session.auth.user.org_id
    pars = simplejson.loads(pars_json)
    subscriber = subscriber_pool.get_subscriber(session.my_session_id, myorg)
    activity_cdr = subscriber.get_call_details(pars['cdr_doc_id'])
    response = {
        'cdr_details': activity_cdr,
    }
    return simplejson.dumps(response)
>>>>
>>>> By doing this I can create a subscriber object associated to the
>>>> session and the organization, and I get to reuse this object in subsequent
>>>> requests.
>>>>
>>>> Now I have the following questions:
>>>>
   1. Why are the imported modules keeping their global variables?
   default.py is not, as far as I can tell; I would say it is reparsed with
   each request.
   2. My object cache (SubscriberPoolCls.subscriber_pool) can grow
   indefinitely, and I do not know how or when to delete entries from it.
   3. Do you think this pattern is dangerous? Do you have an alternative?
>>>>
>>>> Thanks,
>>>>
>>>> Daniel
>>>>
>>>