> For "Read Committed" and "Read Uncommitted" my patch is not safe, because
> these levels should not have "non-repeatable reads". But for existent
> objects Storm also cannot provide "repeatable reads". So it does not
> matter whether "non-repeatable reads" are applied for an existent object
> or for a nonexistent one.
-- sorry, correction: for "Read Committed" and "Read Uncommitted" my patch is
not safe in the strict sense, but these levels are allowed to have
"non-repeatable reads" anyway. And for existent objects Storm cannot provide
repeatable reads at these levels either. So it does not matter whether
"non-repeatable reads" are prevented for an existent object or for a
nonexistent one.

> 2015-01-17 1:12 GMT+02:00 Ivan Zakrevskyi <[email protected]>:
>> Thanks for the answer.
>>
>> If you add an object to the DB through this store, you will be able to
>> get that object back, because store.get() looks for the object first in
>> store._alive and then in store.nonexistent_cache.
>>
>> If the object is created by another store (for example a "master" store,
>> or by a concurrent thread), then you can get this object after the
>> transaction has started (but before the first read), if your DB
>> isolation level is Read Committed. So it is not exactly the Serializable
>> isolation level, or Repeatable Read.
>>
>> But if you have already tried to get this object in the current
>> transaction and try to get it again, you will get None even if the
>> object has meanwhile been created by a concurrent thread. Of course, you
>> can call store.get() with the exists=True argument to bypass
>> nonexistent_cache, or you can even reset nonexistent_cache (it is a
>> public attribute).
>>
>> On the other hand, suppose an object exists and you have already fetched
>> it in the current transaction. Suppose the object is then changed in the
>> DB by a concurrent thread. Those changes will not affect your object. I
>> think in this case it does not matter whether the cached value is None
>> or a Model instance: once the object has been read, it will not change
>> even if it has been modified by a parallel process.
>>
>> My patch does not affect store.find(), and hence does not affect
>> selections. I am not sure phantom reads are possible here, except
>> through store.get_multi(). This is rather a case of "non-repeatable
>> reads" than "phantom reads".
>> Because it can hide changes to a certain row (with a specified primary
>> key), but not to a selection.
>>
>> So, for "Repeatable Read" and "Serializable" my patch is safe (it only
>> needs a reset of store.nonexistent_cache on commit).
>>
>> For "Read Committed" and "Read Uncommitted" my patch is not safe in the
>> strict sense, because these levels are allowed to have "non-repeatable
>> reads". But for existent objects Storm also cannot provide "repeatable
>> reads" at these levels. So it does not matter whether "non-repeatable
>> reads" occur for an existent object or for a nonexistent one.
>>
>> Of course, my patch is a temporary solution. There may be more elegant
>> solutions at the library level. But it really does save many DB queries
>> for nonexistent primary keys.
>>
>> 2015-01-16 23:20 GMT+02:00 Free Ekanayaka <[email protected]>:
>>>
>>> See:
>>>
>>> http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
>>>
>>> for reference.
>>>
>>> On Fri, Jan 16, 2015 at 10:19 PM, Free Ekanayaka <[email protected]> wrote:
>>>>
>>>> Hi Ivan,
>>>>
>>>> it feels like what you suggest would only work safely for transactions
>>>> set to the Serializable isolation level, not for Repeatable Read down
>>>> to Read Uncommitted (since phantom reads could occur there, and the
>>>> nonexistent cache would hide new results).
>>>>
>>>> Cheers
>>>>
>>>> On Fri, Jan 16, 2015 at 5:55 PM, Ivan Zakrevskyi <[email protected]> wrote:
>>>>>
>>>>> Hi, all. Thanks for the answer. I'll try to explain.
>>>>>
>>>>> Try to get an existent object:
>>>>>
>>>>> In [2]: store.get(StTwitterProfile, (1,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 3)'
>>>>> Out[2]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> In [3]: store.get(StTwitterProfile, (1,3))
>>>>> Out[3]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> In [4]: store.get(StTwitterProfile, (1,3))
>>>>> Out[4]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> You can see that Storm made only one query.
>>>>>
>>>>> OK, now try to get a nonexistent twitter profile for a given context:
>>>>>
>>>>> In [5]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> In [6]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> In [7]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> Storm sends a query to the database every time.
>>>>>
>>>>> For example, suppose we have some utility:
>>>>>
>>>>> def myutil(user_id, *args, **kwargs):
>>>>>     context_id = get_context_from_mongodb_redis_memcache_environment_etc(user_id, *args, **kwargs)
>>>>>     twitter_profile = store.get(TwitterProfile, (context_id, user_id))
>>>>>     return twitter_profile.some_attr
>>>>>
>>>>> In this case, Storm will send a query to the database on every call.
>>>>>
>>>>> The situation is similar for a nonexistent relation:
>>>>>
>>>>> In [20]: u = store.get(StUser, 10)
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM user WHERE user.id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [22]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [23]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [24]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> I've created a temporary patch to reduce the number of DB queries
>>>>> (see below). But I am sure a solution could be more elegant (at the
>>>>> library level).
>>>>>
>>>>> class NonexistentCache(list):
>>>>>
>>>>>     _size = 1000
>>>>>
>>>>>     def add(self, val):
>>>>>         if val in self:
>>>>>             self.remove(val)
>>>>>         self.insert(0, val)
>>>>>         if len(self) > self._size:
>>>>>             self.pop()
>>>>>
>>>>>
>>>>> class Store(StoreOrig):
>>>>>
>>>>>     def __init__(self, database, cache=None):
>>>>>         StoreOrig.__init__(self, database, cache)
>>>>>         self.nonexistent_cache = NonexistentCache()
>>>>>
>>>>>     def get(self, cls, key, exists=False):
>>>>>         """Get object of type cls with the given primary key from the database.
>>>>>
>>>>>         This method is patched to cache nonexistent keys, to reduce
>>>>>         the number of DB queries.
>>>>>         If the object is alive, the database won't be touched.
>>>>>
>>>>>         @param cls: Class of the object to be retrieved.
>>>>>         @param key: Primary key of object. May be a tuple for composed keys.
>>>>>         @param exists: If True, bypass the nonexistent cache.
>>>>>
>>>>>         @return: The object found with the given primary key, or None
>>>>>             if no object is found.
>>>>>         """
>>>>>         if self._implicit_flush_block_count == 0:
>>>>>             self.flush()
>>>>>
>>>>>         if type(key) != tuple:
>>>>>             key = (key,)
>>>>>
>>>>>         cls_info = get_cls_info(cls)
>>>>>
>>>>>         assert len(key) == len(cls_info.primary_key)
>>>>>
>>>>>         primary_vars = []
>>>>>         for column, variable in zip(cls_info.primary_key, key):
>>>>>             if not isinstance(variable, Variable):
>>>>>                 variable = column.variable_factory(value=variable)
>>>>>             primary_vars.append(variable)
>>>>>
>>>>>         primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>>
>>>>>         # Patched
>>>>>         alive_key = (cls_info.cls, primary_values)
>>>>>         obj_info = self._alive.get(alive_key)
>>>>>         if obj_info is not None and not obj_info.get("invalidated"):
>>>>>             return self._get_object(obj_info)
>>>>>
>>>>>         if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>>             return None
>>>>>         # End of patch
>>>>>
>>>>>         where = compare_columns(cls_info.primary_key, primary_vars)
>>>>>
>>>>>         select = Select(cls_info.columns, where,
>>>>>                         default_tables=cls_info.table, limit=1)
>>>>>
>>>>>         result = self._connection.execute(select)
>>>>>         values = result.get_one()
>>>>>         if values is None:
>>>>>             # Patched
>>>>>             self.nonexistent_cache.add(alive_key)
>>>>>             # End of patch
>>>>>             return None
>>>>>         return self._load_object(cls_info, result, values)
>>>>>
>>>>>     def get_multi(self, cls, keys, exists=False):
>>>>>         """Get objects of type cls with the given primary keys from the database.
>>>>>
>>>>>         If an object is alive, the database won't be touched.
>>>>>
>>>>>         @param cls: Class of the objects to be retrieved.
>>>>>         @param keys: Collection of primary keys of objects (each may
>>>>>             be a tuple for composed keys).
>>>>>         @param exists: If True, bypass the nonexistent cache.
>>>>>
>>>>>         @return: A dict mapping each given key to the object found
>>>>>             with that primary key, or to None if no object is found.
>>>>>         """
>>>>>         result = {}
>>>>>         missing = {}
>>>>>         if self._implicit_flush_block_count == 0:
>>>>>             self.flush()
>>>>>
>>>>>         for key in keys:
>>>>>             key_orig = key
>>>>>             if type(key) != tuple:
>>>>>                 key = (key,)
>>>>>
>>>>>             cls_info = get_cls_info(cls)
>>>>>
>>>>>             assert len(key) == len(cls_info.primary_key)
>>>>>
>>>>>             primary_vars = []
>>>>>             for column, variable in zip(cls_info.primary_key, key):
>>>>>                 if not isinstance(variable, Variable):
>>>>>                     variable = column.variable_factory(value=variable)
>>>>>                 primary_vars.append(variable)
>>>>>
>>>>>             primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>>
>>>>>             alive_key = (cls_info.cls, primary_values)
>>>>>             obj_info = self._alive.get(alive_key)
>>>>>             if obj_info is not None and not obj_info.get("invalidated"):
>>>>>                 result[key_orig] = self._get_object(obj_info)
>>>>>                 continue
>>>>>
>>>>>             if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>>                 result[key_orig] = None
>>>>>                 continue
>>>>>
>>>>>             missing[primary_values] = key_orig
>>>>>
>>>>>         if not missing:
>>>>>             return result
>>>>>
>>>>>         wheres = []
>>>>>         for i, column in enumerate(cls_info.primary_key):
>>>>>             wheres.append(In(column, tuple(v[i] for v in missing)))
>>>>>         where = And(*wheres) if len(wheres) > 1 else wheres[0]
>>>>>
>>>>>         for obj in self.find(cls, where):
>>>>>             key_orig = missing.pop(tuple(
>>>>>                 var.get(to_db=True)
>>>>>                 for var in get_obj_info(obj).get("primary_vars")))
>>>>>             result[key_orig] = obj
>>>>>
>>>>>         for primary_values, key_orig in missing.items():
>>>>>             # Use cls_info.cls so the key matches the alive_key lookup above.
>>>>>             self.nonexistent_cache.add((cls_info.cls, primary_values))
>>>>>             result[key_orig] = None
>>>>>
>>>>>         return result
>>>>>
>>>>>     def reset(self):
>>>>>         StoreOrig.reset(self)
>>>>>         del self.nonexistent_cache[:]
>>>>>
>>>>> 2015-01-16 9:03 GMT+02:00 Free Ekanayaka <[email protected]>:
>>>>>>
>>>>>> Hi Ivan
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 10:23 PM, Ivan Zakrevskyi <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all.
>>>>>>>
>>>>>>> Storm has excellent caching behavior, but it stores only existent
>>>>>>> objects in Store._alive. If no object exists for some key, Storm
>>>>>>> makes the DB query again and again.
>>>>>>>
>>>>>>> Are you planning to add caching of the keys of nonexistent objects
>>>>>>> to prevent these DB queries?
>>>>>>
>>>>>> If an object doesn't exist in the cache, it means that either it was
>>>>>> not yet loaded at all, or it was loaded but is now marked as
>>>>>> "invalidated" (for example, the transaction in which it was loaded
>>>>>> fresh has terminated).
>>>>>>
>>>>>> So I'm not sure what you mean in your question, but I don't think
>>>>>> there is anything more that could be cached (in terms of key->object
>>>>>> values).
>>>>>>
>>>>>> Cheers

--
storm mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/storm
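The get() behavior discussed in the thread (check store._alive, then the nonexistent cache, then query the database) and the non-repeatable read it trades for fewer queries can be sketched with a toy store. MiniStore and every name in it are illustrative stand-ins, not Storm's actual API:

```python
# Toy simulation of a negative cache in front of a database lookup.
# MiniStore is an illustrative stand-in, not Storm's real Store class.

class MiniStore:
    def __init__(self, db):
        self.db = db                     # a dict standing in for the database
        self.alive = {}                  # cache of objects already loaded
        self.nonexistent_cache = set()   # keys already known to be missing

    def get(self, key, exists=False):
        if key in self.alive:
            return self.alive[key]       # alive object: DB is not touched
        if not exists and key in self.nonexistent_cache:
            return None                  # served from the negative cache
        value = self.db.get(key)         # the actual "DB query"
        if value is None:
            self.nonexistent_cache.add(key)
            return None
        self.alive[key] = value
        return value

    def reset(self):                     # analogous to Store.reset()
        self.alive.clear()
        self.nonexistent_cache.clear()


db = {}
store = MiniStore(db)

print(store.get("k"))               # None: one query, and the miss is cached
db["k"] = "row"                     # a concurrent writer inserts the row
print(store.get("k"))               # still None: the non-repeatable read
print(store.get("k", exists=True))  # "row": exists=True bypasses the cache
```

Under Repeatable Read or Serializable the database itself would keep returning the old answer, so the negative cache changes nothing observable; under Read Committed it is the cache rather than the database that makes the second read repeat, which is exactly the trade-off debated above.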

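A side note on the NonexistentCache class in the patch: because it subclasses list, both the `val in self` check and remove() scan up to _size entries on every miss. The same keep-the-newest behavior can be sketched with collections.OrderedDict, giving O(1) membership tests and eviction (this shape is a sketch, not something tested against Storm):

```python
from collections import OrderedDict

class NonexistentCache:
    """LRU-ish set of keys known to be absent, with O(1) operations."""

    _size = 1000

    def __init__(self):
        self._keys = OrderedDict()

    def add(self, val):
        # Re-adding an existing key moves it to the newest position.
        self._keys.pop(val, None)
        self._keys[val] = True
        if len(self._keys) > self._size:
            self._keys.popitem(last=False)   # evict the oldest key

    def __contains__(self, val):
        return val in self._keys

    def clear(self):
        # Store.reset() would call this instead of `del cache[:]`.
        self._keys.clear()
```

The `alive_key in self.nonexistent_cache` checks in the patch work unchanged against this shape; only the `del self.nonexistent_cache[:]` in reset() would become a clear() call.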