> For "Read Committed" and "Read Uncommitted" my patch is not safe, because > this levels should not to have "Non-repeatable reads". But for existent > object storm also can not provide "repeatable reads". So, it's not mater, > will "Non-repeatable" be applied reads for existent object or for nonexistent > object.
Sorry, let me correct myself: for "Read Committed" and "Read Uncommitted" my
patch is not safe, because these levels are allowed to have "Non-repeatable
reads". But for existing objects Storm also cannot prevent "Non-repeatable
reads". So it does not matter whether "Non-repeatable reads" are prevented
for an existing object or for a nonexistent one.

2015-01-17 1:12 GMT+02:00 Ivan Zakrevskyi <[email protected]>:
> Thanks for the answer.
>
> If you add an object to the DB through this store, you will be able to
> get that object back, because store.get() looks for the object first in
> store._alive and then in store.nonexistent_cache.
>
> If the object is created by another store (for example the "master"
> store, or by a concurrent thread), then you can get this object after
> the transaction has started (but before the first read), if your DB
> isolation level is Read Committed. So it is not exactly the Serializable
> isolation level, nor Repeatable Read.
>
> But if you have already tried to get this object in the current
> transaction and try to get it again, you will get None even if the
> object has since been created by a concurrent thread. Of course, you can
> call store.get() with the exists=True argument to bypass
> nonexistent_cache, or you can even reset nonexistent_cache (it is a
> public attribute).
>
> On the other hand, suppose an object exists and you have already fetched
> it in the current transaction. Suppose the object is then changed in the
> DB by a concurrent thread. Those changes will not affect your object. I
> think in this case it does not matter whether the object is None or a
> Model instance: once the object has been read, it will not change even
> if it has been modified by a parallel process.
>
> My patch does not affect store.find(), and hence does not affect
> selections. I am not sure that phantom reads are possible here, except
> perhaps through store.get_multi(). It is rather a case of
> "Non-repeatable reads" than of "Phantom reads", because it can hide
> changes to a particular row (with a specified primary key), but not
> changes to a selection.
> So, for "Repeatable Read" and "Serializable" my patch is safe (it only
> needs a reset of store.nonexistent_cache on commit).
>
> For "Read Committed" and "Read Uncommitted" my patch is not safe,
> because these levels should not have "Non-repeatable reads". But for
> existing objects Storm also cannot provide "repeatable reads". So it
> does not matter whether "Non-repeatable reads" apply to an existing
> object or to a nonexistent one.
>
> Of course, my patch is a temporary solution. There may be more elegant
> solutions at the library level. But it really does save many DB queries
> for nonexistent primary keys.
>
> 2015-01-16 23:20 GMT+02:00 Free Ekanayaka <[email protected]>:
>>
>> See:
>>
>> http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
>>
>> for reference.
>>
>> On Fri, Jan 16, 2015 at 10:19 PM, Free Ekanayaka <[email protected]> wrote:
>>>
>>> Hi Ivan,
>>>
>>> it feels like what you suggest would only work safely for transactions
>>> set to the Serializable isolation level, not for Repeatable Read down
>>> to Read Uncommitted (since phantom reads could occur there, and the
>>> nonexistent cache would hide new results).
>>>
>>> Cheers
>>>
>>> On Fri, Jan 16, 2015 at 5:55 PM, Ivan Zakrevskyi
>>> <[email protected]> wrote:
>>>>
>>>> Hi, all. Thanks for the answer. I'll try to explain.
>>>>
>>>> Try to get an existing object:
>>>>
>>>> In [2]: store.get(StTwitterProfile, (1,3))
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
>>>> %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 3)'
>>>> Out[2]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>
>>>> In [3]: store.get(StTwitterProfile, (1,3))
>>>> Out[3]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>
>>>> In [4]: store.get(StTwitterProfile, (1,3))
>>>> Out[4]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>
>>>> You can see that Storm made only one query.
>>>>
>>>> Ok, now try to get a nonexistent twitter profile for the given context:
>>>>
>>>> In [5]: store.get(StTwitterProfile, (10,3))
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
>>>> %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'
>>>>
>>>> In [6]: store.get(StTwitterProfile, (10,3))
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
>>>> %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'
>>>>
>>>> In [7]: store.get(StTwitterProfile, (10,3))
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id =
>>>> %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 10)'
>>>>
>>>> Storm sends a query to the database each time.
>>>>
>>>> For example, suppose we have some utility function:
>>>>
>>>> def myutil(user_id, *args, **kwargs):
>>>>     context_id = get_context_from_mongodb_redis_memcache_environment_etc(
>>>>         user_id, *args, **kwargs)
>>>>     twitter_profile = store.get(TwitterProfile, (context_id, user_id))
>>>>     return twitter_profile.some_attr
>>>>
>>>> In this case, Storm will send a query to the database every time.
>>>>
>>>> The situation is similar for a nonexistent relation:
>>>>
>>>> In [20]: u = store.get(StUser, 10)
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM user WHERE user.id = %s LIMIT 1; args=(10,)'
>>>>
>>>> In [22]: u.profile
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
>>>> 1; args=(10,)'
>>>>
>>>> In [23]: u.profile
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
>>>> 1; args=(10,)'
>>>>
>>>> In [24]: u.profile
>>>> base.py:50 =>
>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT
>>>> 1; args=(10,)'
>>>>
>>>> I've created a temporary patch to reduce the number of DB queries (see
>>>> below). But I am sure a solution could be more elegant (at the library
>>>> level).
>>>>
>>>> class NonexistentCache(list):
>>>>
>>>>     _size = 1000
>>>>
>>>>     def add(self, val):
>>>>         if val in self:
>>>>             self.remove(val)
>>>>         self.insert(0, val)
>>>>         if len(self) > self._size:
>>>>             self.pop()
>>>>
>>>>
>>>> class Store(StoreOrig):
>>>>
>>>>     def __init__(self, database, cache=None):
>>>>         StoreOrig.__init__(self, database, cache)
>>>>         self.nonexistent_cache = NonexistentCache()
>>>>
>>>>     def get(self, cls, key, exists=False):
>>>>         """Get object of type cls with the given primary key from the
>>>>         database.
>>>>
>>>>         This method is patched to cache nonexistent keys, to reduce
>>>>         the number of DB queries.
>>>>         If the object is alive, the database won't be touched.
>>>>
>>>>         @param cls: Class of the object to be retrieved.
>>>>         @param key: Primary key of object. May be a tuple for composed
>>>>             keys.
>>>>         @param exists: If True, bypass the nonexistent cache.
>>>>
>>>>         @return: The object found with the given primary key, or None
>>>>             if no object is found.
>>>>         """
>>>>         if self._implicit_flush_block_count == 0:
>>>>             self.flush()
>>>>
>>>>         if type(key) != tuple:
>>>>             key = (key,)
>>>>
>>>>         cls_info = get_cls_info(cls)
>>>>
>>>>         assert len(key) == len(cls_info.primary_key)
>>>>
>>>>         primary_vars = []
>>>>         for column, variable in zip(cls_info.primary_key, key):
>>>>             if not isinstance(variable, Variable):
>>>>                 variable = column.variable_factory(value=variable)
>>>>             primary_vars.append(variable)
>>>>
>>>>         primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>
>>>>         # Patched
>>>>         alive_key = (cls_info.cls, primary_values)
>>>>         obj_info = self._alive.get(alive_key)
>>>>         if obj_info is not None and not obj_info.get("invalidated"):
>>>>             return self._get_object(obj_info)
>>>>
>>>>         if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>             return None
>>>>         # End of patch
>>>>
>>>>         where = compare_columns(cls_info.primary_key, primary_vars)
>>>>
>>>>         select = Select(cls_info.columns, where,
>>>>                         default_tables=cls_info.table, limit=1)
>>>>
>>>>         result = self._connection.execute(select)
>>>>         values = result.get_one()
>>>>         if values is None:
>>>>             # Patched
>>>>             self.nonexistent_cache.add(alive_key)
>>>>             # End of patch
>>>>             return None
>>>>         return self._load_object(cls_info, result, values)
>>>>
>>>>     def get_multi(self, cls, keys, exists=False):
>>>>         """Get objects of type cls with the given primary keys from the
>>>>         database.
>>>>
>>>>         If an object is alive, the database won't be touched.
>>>>
>>>>         @param cls: Class of the objects to be retrieved.
>>>>         @param keys: Collection of primary keys of objects (each may be
>>>>             a tuple for composed keys).
>>>>
>>>>         @return: Mapping of the given keys to the objects found, or to
>>>>             None where no object is found.
>>>>         """
>>>>         result = {}
>>>>         missing = {}
>>>>         if self._implicit_flush_block_count == 0:
>>>>             self.flush()
>>>>
>>>>         for key in keys:
>>>>             key_orig = key
>>>>             if type(key) != tuple:
>>>>                 key = (key,)
>>>>
>>>>             cls_info = get_cls_info(cls)
>>>>
>>>>             assert len(key) == len(cls_info.primary_key)
>>>>
>>>>             primary_vars = []
>>>>             for column, variable in zip(cls_info.primary_key, key):
>>>>                 if not isinstance(variable, Variable):
>>>>                     variable = column.variable_factory(value=variable)
>>>>                 primary_vars.append(variable)
>>>>
>>>>             primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>
>>>>             alive_key = (cls_info.cls, primary_values)
>>>>             obj_info = self._alive.get(alive_key)
>>>>             if obj_info is not None and not obj_info.get("invalidated"):
>>>>                 result[key_orig] = self._get_object(obj_info)
>>>>                 continue
>>>>
>>>>             if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>                 result[key_orig] = None
>>>>                 continue
>>>>
>>>>             missing[primary_values] = key_orig
>>>>
>>>>         if not missing:
>>>>             return result
>>>>
>>>>         wheres = []
>>>>         for i, column in enumerate(cls_info.primary_key):
>>>>             wheres.append(In(cls_info.primary_key[i],
>>>>                              tuple(v[i] for v in missing)))
>>>>         where = And(*wheres) if len(wheres) > 1 else wheres[0]
>>>>
>>>>         for obj in self.find(cls, where):
>>>>             key_orig = missing.pop(tuple(
>>>>                 var.get(to_db=True) for var in
>>>>                 get_obj_info(obj).get("primary_vars")))
>>>>             result[key_orig] = obj
>>>>
>>>>         for primary_values, key_orig in missing.items():
>>>>             self.nonexistent_cache.add((cls, primary_values))
>>>>             result[key_orig] = None
>>>>
>>>>         return result
>>>>
>>>>     def reset(self):
>>>>         StoreOrig.reset(self)
>>>>         del self.nonexistent_cache[:]
>>>>
>>>>
>>>> 2015-01-16 9:03 GMT+02:00 Free Ekanayaka <[email protected]>:
>>>>>
>>>>> Hi Ivan
>>>>>
>>>>> On Thu, Jan 15, 2015 at 10:23 PM, Ivan Zakrevskyi
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hi all.
>>>>>>
>>>>>> Storm has excellent caching behavior, but stores only existent
>>>>>> objects in Store._alive. If an object does not exist for some key,
>>>>>> Storm makes the DB query again and again.
>>>>>>
>>>>>> Are you planning to add caching of the keys of nonexistent objects,
>>>>>> to prevent these DB queries?
>>>>>
>>>>> If an object doesn't exist in the cache, it means that either it was
>>>>> not yet loaded at all, or it was loaded but is now marked as
>>>>> "invalidated" (for example, the transaction in which it was loaded
>>>>> fresh has terminated).
>>>>>
>>>>> So I'm not sure what you mean in your question, but I don't think
>>>>> there is anything more that could be cached (in terms of key->object
>>>>> values).
>>>>>
>>>>> Cheers
>>>>
>>>> --
>>>> storm mailing list
>>>> [email protected]
>>>> Modify settings or unsubscribe at:
>>>> https://lists.ubuntu.com/mailman/listinfo/storm
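
The negative-cache idea in the patch above can be demonstrated in isolation. Below is a minimal, self-contained sketch: `FakeStore` and its dict-backed "database" are hypothetical stand-ins (not Storm's real API), but `NonexistentCache` and the `exists=True` bypass follow the patch as quoted.

```python
class NonexistentCache(list):
    """Bounded MRU list of keys known not to exist (as in the patch)."""
    _size = 1000

    def add(self, val):
        if val in self:
            self.remove(val)
        self.insert(0, val)          # most recently missed key goes first
        if len(self) > self._size:
            self.pop()               # evict the oldest entry


class FakeStore:
    """Hypothetical stand-in for a store; a dict plays the database."""

    def __init__(self, rows):
        self._rows = rows            # simulated table: key -> row
        self.queries = 0             # counts simulated DB round trips
        self.nonexistent_cache = NonexistentCache()

    def get(self, key, exists=False):
        if not exists and key in self.nonexistent_cache:
            return None              # negative-cache hit: no DB query
        self.queries += 1            # simulated DB query
        row = self._rows.get(key)
        if row is None:
            self.nonexistent_cache.add(key)
        return row


store = FakeStore({(1, 3): "profile-1-3"})
for _ in range(3):
    store.get((10, 3))               # missing key is queried only once
print(store.queries)                 # -> 1
print(store.get((10, 3), exists=True) is None)  # True: bypasses the cache
print(store.queries)                 # -> 2
```

Repeated lookups of the missing key `(10, 3)` cost a single simulated query, mirroring how the patch collapses the repeated `SELECT`s shown in the IPython transcripts; `exists=True` forces a fresh query, as described in the thread.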
