> For "Read Committed" and "Read Uncommitted" my patch is not safe, because
> these levels should not have "non-repeatable reads". But for existent
> objects Storm also cannot provide "repeatable reads". So it does not
> matter whether "non-repeatable reads" are applied for an existent object
> or for a nonexistent one.
-- sorry, correction: for "Read Committed" and "Read Uncommitted" my patch is
not safe in the strict sense, but these levels are allowed to have
"non-repeatable reads" anyway. And for existent objects Storm cannot provide
repeatable reads at these levels either. So it does not matter whether
"non-repeatable reads" are prevented for an existent object or for a
nonexistent one.

> 2015-01-17 1:12 GMT+02:00 Ivan Zakrevskyi <[email protected]>:
>> Thanks for the answer.
>>
>> If you add an object to the DB through this store, you will be able to
>> get that object back, because store.get() looks for the object first in
>> store._alive and then in store.nonexistent_cache.
>>
>> If the object is created by another store (for example a "master" store,
>> or by a concurrent thread), then you can get this object after the
>> transaction has started (but before the first read), if your DB
>> isolation level is Read Committed. So it is not exactly the Serializable
>> isolation level, or Repeatable Read.
>>
>> But if you have already tried to get this object in the current
>> transaction and try to get it again, you will get None even if the
>> object has meanwhile been created by a concurrent thread. Of course, you
>> can call store.get() with the exists=True argument to bypass
>> nonexistent_cache, or you can even reset nonexistent_cache (it is a
>> public attribute).
>>
>> On the other hand, suppose an object exists and you have already fetched
>> it in the current transaction. Suppose the object is then changed in the
>> DB by a concurrent thread. Those changes will not affect your object. I
>> think in this case it does not matter whether the cached value is None
>> or a Model instance: once the object has been read, it will not change
>> even if it has been modified by a parallel process.
>>
>> My patch does not affect store.find(), and hence does not affect
>> selections. I am not sure phantom reads are possible here, except
>> through store.get_multi(). This is rather a case of "non-repeatable
>> reads" than "phantom reads".
>> Because it can hide changes to a certain row (with a specified primary
>> key), but not to a selection.
>>
>> So, for "Repeatable Read" and "Serializable" my patch is safe (it only
>> needs a reset of store.nonexistent_cache on commit).
>>
>> For "Read Committed" and "Read Uncommitted" my patch is not safe in the
>> strict sense, because these levels are allowed to have "non-repeatable
>> reads". But for existent objects Storm also cannot provide "repeatable
>> reads" at these levels. So it does not matter whether "non-repeatable
>> reads" occur for an existent object or for a nonexistent one.
>>
>> Of course, my patch is a temporary solution. There may be more elegant
>> solutions at the library level. But it really does save many DB queries
>> for nonexistent primary keys.
>>
>> 2015-01-16 23:20 GMT+02:00 Free Ekanayaka <[email protected]>:
>>>
>>> See:
>>>
>>> http://en.wikipedia.org/wiki/Isolation_%28database_systems%29
>>>
>>> for reference.
>>>
>>> On Fri, Jan 16, 2015 at 10:19 PM, Free Ekanayaka <[email protected]> wrote:
>>>>
>>>> Hi Ivan,
>>>>
>>>> it feels like what you suggest would only work safely for transactions
>>>> set to the Serializable isolation level, not for Repeatable Read down
>>>> to Read Uncommitted (since phantom reads could occur there, and the
>>>> nonexistent cache would hide new results).
>>>>
>>>> Cheers
>>>>
>>>> On Fri, Jan 16, 2015 at 5:55 PM, Ivan Zakrevskyi <[email protected]> wrote:
>>>>>
>>>>> Hi, all. Thanks for the answer. I'll try to explain.
>>>>>
>>>>> Try to get an existent object:
>>>>>
>>>>> In [2]: store.get(StTwitterProfile, (1,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(1, 3)'
>>>>> Out[2]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> In [3]: store.get(StTwitterProfile, (1,3))
>>>>> Out[3]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> In [4]: store.get(StTwitterProfile, (1,3))
>>>>> Out[4]: <users.orm.TwitterProfile at 0x7f1e93b6d450>
>>>>>
>>>>> You can see that Storm made only one query.
>>>>>
>>>>> OK, now try to get a nonexistent twitter profile for a given context:
>>>>>
>>>>> In [5]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> In [6]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> In [7]: store.get(StTwitterProfile, (10,3))
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM twitterprofile WHERE twitterprofile.context_id = %s AND twitterprofile.user_id = %s LIMIT 1; args=(10, 3)'
>>>>>
>>>>> Storm sends a query to the database every time.
>>>>>
>>>>> For example, suppose we have some utility:
>>>>>
>>>>> def myutil(user_id, *args, **kwargs):
>>>>>     context_id = get_context_from_mongodb_redis_memcache_environment_etc(user_id, *args, **kwargs)
>>>>>     twitter_profile = store.get(TwitterProfile, (context_id, user_id))
>>>>>     return twitter_profile.some_attr
>>>>>
>>>>> In this case, Storm will send a query to the database on every call.
>>>>>
>>>>> The situation is similar for a nonexistent relation:
>>>>>
>>>>> In [20]: u = store.get(StUser, 10)
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM user WHERE user.id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [22]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [23]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> In [24]: u.profile
>>>>> base.py:50 =>
>>>>> u'(0.001) SELECT ... FROM userprofile WHERE userprofile.user_id = %s LIMIT 1; args=(10,)'
>>>>>
>>>>> I've created a temporary patch to reduce the number of DB queries
>>>>> (see below). But I am sure a solution could be more elegant (at the
>>>>> library level).
>>>>>
>>>>> class NonexistentCache(list):
>>>>>
>>>>>     _size = 1000
>>>>>
>>>>>     def add(self, val):
>>>>>         if val in self:
>>>>>             self.remove(val)
>>>>>         self.insert(0, val)
>>>>>         if len(self) > self._size:
>>>>>             self.pop()
>>>>>
>>>>>
>>>>> class Store(StoreOrig):
>>>>>
>>>>>     def __init__(self, database, cache=None):
>>>>>         StoreOrig.__init__(self, database, cache)
>>>>>         self.nonexistent_cache = NonexistentCache()
>>>>>
>>>>>     def get(self, cls, key, exists=False):
>>>>>         """Get object of type cls with the given primary key from the database.
>>>>>
>>>>>         This method is patched to cache nonexistent keys, to reduce
>>>>>         the number of DB queries.
>>>>>         If the object is alive, the database won't be touched.
>>>>>
>>>>>         @param cls: Class of the object to be retrieved.
>>>>>         @param key: Primary key of object. May be a tuple for composed keys.
>>>>>         @param exists: If True, bypass the nonexistent cache.
>>>>>
>>>>>         @return: The object found with the given primary key, or None
>>>>>             if no object is found.
>>>>>         """
>>>>>         if self._implicit_flush_block_count == 0:
>>>>>             self.flush()
>>>>>
>>>>>         if type(key) != tuple:
>>>>>             key = (key,)
>>>>>
>>>>>         cls_info = get_cls_info(cls)
>>>>>
>>>>>         assert len(key) == len(cls_info.primary_key)
>>>>>
>>>>>         primary_vars = []
>>>>>         for column, variable in zip(cls_info.primary_key, key):
>>>>>             if not isinstance(variable, Variable):
>>>>>                 variable = column.variable_factory(value=variable)
>>>>>             primary_vars.append(variable)
>>>>>
>>>>>         primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>>
>>>>>         # Patched
>>>>>         alive_key = (cls_info.cls, primary_values)
>>>>>         obj_info = self._alive.get(alive_key)
>>>>>         if obj_info is not None and not obj_info.get("invalidated"):
>>>>>             return self._get_object(obj_info)
>>>>>
>>>>>         if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>>             return None
>>>>>         # End of patch
>>>>>
>>>>>         where = compare_columns(cls_info.primary_key, primary_vars)
>>>>>
>>>>>         select = Select(cls_info.columns, where,
>>>>>                         default_tables=cls_info.table, limit=1)
>>>>>
>>>>>         result = self._connection.execute(select)
>>>>>         values = result.get_one()
>>>>>         if values is None:
>>>>>             # Patched
>>>>>             self.nonexistent_cache.add(alive_key)
>>>>>             # End of patch
>>>>>             return None
>>>>>         return self._load_object(cls_info, result, values)
>>>>>
>>>>>     def get_multi(self, cls, keys, exists=False):
>>>>>         """Get objects of type cls with the given primary keys from the database.
>>>>>
>>>>>         If an object is alive, the database won't be touched.
>>>>>
>>>>>         @param cls: Class of the objects to be retrieved.
>>>>>         @param keys: Collection of primary keys of objects (each may
>>>>>             be a tuple for composed keys).
>>>>>         @param exists: If True, bypass the nonexistent cache.
>>>>>
>>>>>         @return: A dict mapping each given key to the object found
>>>>>             with that primary key, or to None if no object is found.
>>>>>         """
>>>>>         result = {}
>>>>>         missing = {}
>>>>>         if self._implicit_flush_block_count == 0:
>>>>>             self.flush()
>>>>>
>>>>>         for key in keys:
>>>>>             key_orig = key
>>>>>             if type(key) != tuple:
>>>>>                 key = (key,)
>>>>>
>>>>>             cls_info = get_cls_info(cls)
>>>>>
>>>>>             assert len(key) == len(cls_info.primary_key)
>>>>>
>>>>>             primary_vars = []
>>>>>             for column, variable in zip(cls_info.primary_key, key):
>>>>>                 if not isinstance(variable, Variable):
>>>>>                     variable = column.variable_factory(value=variable)
>>>>>                 primary_vars.append(variable)
>>>>>
>>>>>             primary_values = tuple(var.get(to_db=True) for var in primary_vars)
>>>>>
>>>>>             alive_key = (cls_info.cls, primary_values)
>>>>>             obj_info = self._alive.get(alive_key)
>>>>>             if obj_info is not None and not obj_info.get("invalidated"):
>>>>>                 result[key_orig] = self._get_object(obj_info)
>>>>>                 continue
>>>>>
>>>>>             if obj_info is None and not exists and alive_key in self.nonexistent_cache:
>>>>>                 result[key_orig] = None
>>>>>                 continue
>>>>>
>>>>>             missing[primary_values] = key_orig
>>>>>
>>>>>         if not missing:
>>>>>             return result
>>>>>
>>>>>         wheres = []
>>>>>         for i, column in enumerate(cls_info.primary_key):
>>>>>             wheres.append(In(column, tuple(v[i] for v in missing)))
>>>>>         where = And(*wheres) if len(wheres) > 1 else wheres[0]
>>>>>
>>>>>         for obj in self.find(cls, where):
>>>>>             key_orig = missing.pop(tuple(
>>>>>                 var.get(to_db=True)
>>>>>                 for var in get_obj_info(obj).get("primary_vars")))
>>>>>             result[key_orig] = obj
>>>>>
>>>>>         for primary_values, key_orig in missing.items():
>>>>>             # Use cls_info.cls so the key matches the alive_key lookup above.
>>>>>             self.nonexistent_cache.add((cls_info.cls, primary_values))
>>>>>             result[key_orig] = None
>>>>>
>>>>>         return result
>>>>>
>>>>>     def reset(self):
>>>>>         StoreOrig.reset(self)
>>>>>         del self.nonexistent_cache[:]
>>>>>
>>>>> 2015-01-16 9:03 GMT+02:00 Free Ekanayaka <[email protected]>:
>>>>>>
>>>>>> Hi Ivan
>>>>>>
>>>>>> On Thu, Jan 15, 2015 at 10:23 PM, Ivan Zakrevskyi <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all.
>>>>>>>
>>>>>>> Storm has excellent caching behavior, but it stores only existent
>>>>>>> objects in Store._alive. If no object exists for some key, Storm
>>>>>>> makes the DB query again and again.
>>>>>>>
>>>>>>> Are you planning to add caching of the keys of nonexistent objects
>>>>>>> to prevent these DB queries?
>>>>>>
>>>>>> If an object doesn't exist in the cache, it means that either it was
>>>>>> not yet loaded at all, or it was loaded but is now marked as
>>>>>> "invalidated" (for example, the transaction in which it was loaded
>>>>>> fresh has terminated).
>>>>>>
>>>>>> So I'm not sure what you mean in your question, but I don't think
>>>>>> there is anything more that could be cached (in terms of key->object
>>>>>> values).
>>>>>>
>>>>>> Cheers

--
storm mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/storm
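The get() behavior discussed in the thread (check store._alive, then the nonexistent cache, then query the database) and the non-repeatable read it trades for fewer queries can be sketched with a toy store. MiniStore and every name in it are illustrative stand-ins, not Storm's actual API:

```python
# Toy simulation of a negative cache in front of a database lookup.
# MiniStore is an illustrative stand-in, not Storm's real Store class.

class MiniStore:
    def __init__(self, db):
        self.db = db                     # a dict standing in for the database
        self.alive = {}                  # cache of objects already loaded
        self.nonexistent_cache = set()   # keys already known to be missing

    def get(self, key, exists=False):
        if key in self.alive:
            return self.alive[key]       # alive object: DB is not touched
        if not exists and key in self.nonexistent_cache:
            return None                  # served from the negative cache
        value = self.db.get(key)         # the actual "DB query"
        if value is None:
            self.nonexistent_cache.add(key)
            return None
        self.alive[key] = value
        return value

    def reset(self):                     # analogous to Store.reset()
        self.alive.clear()
        self.nonexistent_cache.clear()


db = {}
store = MiniStore(db)

print(store.get("k"))               # None: one query, and the miss is cached
db["k"] = "row"                     # a concurrent writer inserts the row
print(store.get("k"))               # still None: the non-repeatable read
print(store.get("k", exists=True))  # "row": exists=True bypasses the cache
```

Under Repeatable Read or Serializable the database itself would keep returning the old answer, so the negative cache changes nothing observable; under Read Committed it is the cache rather than the database that makes the second read repeat, which is exactly the trade-off debated above.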

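A side note on the NonexistentCache class in the patch: because it subclasses list, both the `val in self` check and remove() scan up to _size entries on every miss. The same keep-the-newest behavior can be sketched with collections.OrderedDict, giving O(1) membership tests and eviction (this shape is a sketch, not something tested against Storm):

```python
from collections import OrderedDict

class NonexistentCache:
    """LRU-ish set of keys known to be absent, with O(1) operations."""

    _size = 1000

    def __init__(self):
        self._keys = OrderedDict()

    def add(self, val):
        # Re-adding an existing key moves it to the newest position.
        self._keys.pop(val, None)
        self._keys[val] = True
        if len(self._keys) > self._size:
            self._keys.popitem(last=False)   # evict the oldest key

    def __contains__(self, val):
        return val in self._keys

    def clear(self):
        # Store.reset() would call this instead of `del cache[:]`.
        self._keys.clear()
```

The `alive_key in self.nonexistent_cache` checks in the patch work unchanged against this shape; only the `del self.nonexistent_cache[:]` in reset() would become a clear() call.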