[Wikidata-bugs] [Maniphest] [Commented On] T197252: Can't correctly unserialize cached EntityRevision on WikibaseClient (Wikipedia)

2018-07-02 Thread WMDE-leszek
WMDE-leszek added a comment.
> Well, I vaguely remember loading the PHP serialization from cache being about 30% faster than deserializing JSON, but I can't quite remember where I benchmarked this (probably while working on dumps). When considering this, we should really benchmark it thoroughly.

@hoo, any chance you'd be able to dig up that benchmark? Not necessarily the exact results, but maybe when it was done and in which context it was measured?
I'm asking because I'd be a bit reluctant to give up on something that seems like "the right thing" without being sure what the performance implications are now.
TASK DETAIL
https://phabricator.wikimedia.org/T197252
EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/
To: WMDE-leszek
Cc: Aklapper, aaron, Tarrow, Ladsgroup, ArielGlenn, daniel, aude, Addshore, Aleksey_WMDE, Pablo-WMDE, hoo, WMDE-leszek, Mringgaard, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Wikidata-bugs, Darkdadaah, Mbch331
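Since the original numbers could not be located, a rerunnable harness may help settle the question. The following is a minimal Python sketch, assuming a made-up entity-shaped payload (`make_sample_entity` and `benchmark` are hypothetical names, not Wikibase code): it times decoding the same data from pickle, standing in for PHP's native serialization, and from JSON. Its numbers describe Python only and say nothing about PHP's unserialize() versus json_decode(); a real answer would need the same kind of harness written in PHP against actual cached EntityRevisions.

```python
import json
import pickle
import timeit


def make_sample_entity(n_claims: int = 200) -> dict:
    """Hypothetical entity-shaped payload; not real Wikibase data."""
    return {
        "id": "Q42",
        "revision": 123456,
        "labels": {"en": {"language": "en", "value": "Douglas Adams"}},
        "claims": {
            f"P{i}": [{"mainsnak": {"property": f"P{i}", "datavalue": {"value": i}}}]
            for i in range(n_claims)
        },
    }


def benchmark(payload: dict, repeat: int = 5, number: int = 200) -> dict:
    """Best-of-`repeat` seconds for `number` decodes in each format."""
    encoded = {
        "pickle": pickle.dumps(payload, protocol=pickle.HIGHEST_PROTOCOL),
        "json": json.dumps(payload).encode("utf-8"),
    }
    decoders = {"pickle": pickle.loads, "json": json.loads}
    results = {}
    for name, blob in encoded.items():
        decode = decoders[name]
        # Both formats must round-trip to identical data, or the comparison is moot.
        assert decode(blob) == payload
        timings = timeit.repeat(lambda: decode(blob), repeat=repeat, number=number)
        results[name] = min(timings)
    return results


if __name__ == "__main__":
    for fmt, seconds in benchmark(make_sample_entity()).items():
        print(f"{fmt}: {seconds:.4f}s")
```

Taking the minimum over several repeats reduces noise from other processes; memory consumption would need to be measured separately.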
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs



2018-06-15 Thread hoo
hoo added a comment.
> There might be some difference in memory consumption or time to deserialize to consider, but probably nothing too bad.

Well, I vaguely remember loading the PHP serialization from cache being about 30% faster than deserializing JSON, but I can't quite remember where I benchmarked this (probably while working on dumps). When considering this, we should really benchmark it thoroughly.

My preferred option would be to make fetching from the cache more resilient against instances of unknown classes in there… maybe using at-ease?
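The idea of making the cache read itself resilient, so that an entry referencing an unknown class becomes a cache miss rather than a broken object, can be sketched outside PHP. This hedged Python sketch does not model at-ease or PHP's unserialize(); it uses pickle, whose `Unpickler.find_class` hook lets a single reader restrict which classes it will rebuild. `AllowlistUnpickler`, `get_from_cache` and the allowlist are hypothetical, and a dict stands in for the cache backend.

```python
import io
import pickle


class UnknownClassError(Exception):
    """Raised when a cached blob references a class this reader won't load."""


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that only resolves (module, name) pairs on an allowlist."""

    def __init__(self, data: bytes, allowed: set):
        super().__init__(io.BytesIO(data))
        self._allowed = allowed

    def find_class(self, module, name):
        if (module, name) not in self._allowed:
            raise UnknownClassError(f"{module}.{name}")
        return super().find_class(module, name)


def get_from_cache(cache: dict, key: str, allowed: set):
    """Return the cached object, or None (a miss) if it cannot be rebuilt."""
    blob = cache.get(key)
    if blob is None:
        return None
    try:
        return AllowlistUnpickler(blob, allowed).load()
    except (UnknownClassError, AttributeError, ImportError, pickle.UnpicklingError):
        # Only this read is affected: an unreadable entry is treated as a
        # cache miss, so the caller falls back to loading the entity itself.
        return None
```

Unlike a process-wide setting, the restriction lives in one reader, so other code that unserializes data is unaffected.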



2018-06-15 Thread Addshore
Addshore added a comment.

In T197252#4289410, @WMDE-leszek wrote:
>> What line in Wikibase actually does this unserialization? Perhaps we could change that to something that wouldn't have such a global effect.

> That's where the fun lies. Technically speaking, Wikibase itself does not call serialize/unserialize. PHP serialization happens in EntityRevisionCache::set/get, which basically call set/get on BagOStuff, putting the EntityRevision instance into the cache.

> That actually brings up another option we briefly mentioned in the exploration group yesterday (as with the others, I am not suggesting it is a good pick):


> Do not store EntityRevision objects in the MW cache, but use our own serialization of those objects or similar. The problem of handling unknown EntityIds etc. would not really be solved, though Wikibase would be more in control of such situations, which might or might not be a good thing.



Yup, we could just put our JSON there instead of serializing a PHP object (which sucks 99% of the time anyway). Then we could handle deserialization issues locally, rather than with the global unserialize_callback_func approach.

There might be some difference in memory consumption or time to deserialize to consider, but probably nothing too bad.
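The JSON proposal can be sketched as follows. This is a minimal Python illustration, not Wikibase code: `ItemRevision`, `DESERIALIZERS`, `cache_set` and `cache_get` are hypothetical names, and a plain dict stands in for BagOStuff. The cache holds JSON tagged with an entity type, and the reader looks up a deserializer it knows about; an unknown type is handled locally as a cache miss, with no process-wide hook.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ItemRevision:
    """Hypothetical stand-in for an EntityRevision holding an Item."""
    entity_id: str
    revision_id: int
    labels: dict


# Hypothetical registry of entity types this wiki knows how to rebuild.
DESERIALIZERS = {
    "item": lambda d: ItemRevision(d["entity_id"], d["revision_id"], d["labels"]),
}


def cache_set(cache: dict, key: str, entity_type: str, revision) -> None:
    """Store plain JSON tagged with the entity type; no class names inside."""
    cache[key] = json.dumps({"type": entity_type, "data": asdict(revision)})


def cache_get(cache: dict, key: str, deserializers=DESERIALIZERS):
    """Return the rebuilt revision, or None if absent or unreadable here."""
    raw = cache.get(key)
    if raw is None:
        return None
    try:
        payload = json.loads(raw)
        return deserializers[payload["type"]](payload["data"])
    except (ValueError, KeyError, TypeError):
        # Unknown entity type (e.g. one written by a repo with more
        # extensions enabled) or malformed data: treat it as a miss.
        return None
```

Because the stored JSON carries no class names, each reader decides what counts as "unknown". The open question from the thread, decode speed and memory versus the native format, would still need measuring.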



2018-06-15 Thread WMDE-leszek
WMDE-leszek added a comment.
> What line in Wikibase actually does this unserialization? Perhaps we could change that to something that wouldn't have such a global effect.

That's where the fun lies. Technically speaking, Wikibase itself does not call serialize/unserialize. PHP serialization happens in EntityRevisionCache::set/get, which basically call set/get on BagOStuff, putting the EntityRevision instance into the cache.
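The failure mode can be reproduced outside PHP. In this hedged Python analogy, pickle plays the role of PHP's serialize(), a dict plays the shared BagOStuff, and `fake_lexeme_ext` is a hypothetical module standing in for an extension that is loaded on the repo but not on the client. Native object serialization records the class name in the cached bytes, so a reader lacking that class cannot rebuild the object. Python raises AttributeError here, whereas PHP's unserialize() yields a __PHP_Incomplete_Class object; the root cause is the same.

```python
import pickle
import sys
import types

# Hypothetical module standing in for an extension (e.g. WikibaseLexeme)
# that is loaded on the repo but not on the client.
ext = types.ModuleType("fake_lexeme_ext")
sys.modules["fake_lexeme_ext"] = ext


class LexemeRevision:
    """Hypothetical stand-in for a class the client doesn't have."""

    def __init__(self, entity_id: str):
        self.entity_id = entity_id


# Pretend the class lives in the extension module.
LexemeRevision.__module__ = "fake_lexeme_ext"
ext.LexemeRevision = LexemeRevision

shared_cache: dict = {}  # stands in for the shared BagOStuff

# Repo side: the class exists, so the object serializes fine, and the
# resulting bytes name "fake_lexeme_ext.LexemeRevision".
shared_cache["L1"] = pickle.dumps(LexemeRevision("L1"))

# Client side: the extension (and thus the class) is absent.
del ext.LexemeRevision


def client_load(key: str):
    """Unserialize as the client would; raises if the class is unknown."""
    return pickle.loads(shared_cache[key])
```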

That actually brings up another option we briefly mentioned in the exploration group yesterday (as with the others, I am not suggesting it is a good pick):


Do not store EntityRevision objects in the MW cache, but use our own serialization of those objects or similar.

The problem of handling unknown EntityIds etc. would not really be solved, though Wikibase would be more in control of such situations, which might or might not be a good thing.



2018-06-15 Thread Addshore
Addshore added a comment.
https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/Wikibase.php#L37-L48


In T197252#4283504, @WMDE-leszek wrote:
> Some options to overcome the issue (the list is neither complete nor sorted in any way):


> Always run the same version of the software on both systems. Tricky, maybe even not wanted: it would not only mean enabling WikibaseLexeme, WikibaseMediaInfo etc. on Wikipedias but, to get things 100% right, always having the same version of Wikibase running on both systems. Not doable with the current deployment strategy, which deploys Wikibase to different wikis on different days.



I do not like this idea.
It would mean that when rolling out new extensions we'd have to flip one almighty switch and load them everywhere at once. That sounds scary, especially as we will have a rollout like Lexeme's, but for MediaInfo, in the not too distant future.


> Do not share cache prefixes between Repo and Client unless they're the same wiki. Sharing the cache between repo and client makes sense in the single-wiki case (e.g. wikidata.org), but is it really needed otherwise? A bit of random archaeology turned up https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/Wikibase/+/203811/, which does not really answer why the shared cache prefix was introduced.


So the cache key for production is created here: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/Wikibase.php#L37-L48
It currently varies only on the deployed branch of MediaWiki & Wikibase.
Right now we have some combinations that this doesn't account for:


Wikibase & No Lexeme
Wikibase & Lexeme


During the MediaInfo rollout this will increase again. I'm not sure what the final list of clients for which repos etc. will look like, or how it will roll out at all.

One important thing to find out here: how much cache space do we actually use?
Is having more keys a viable possibility?
Having different keys based on the deployed Wikibase extensions seems feasible to me.
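Varying the key on deployed extensions could look roughly like the following hypothetical Python sketch. The prefix varies on the MediaWiki and Wikibase branches, as the production key is described to do, and additionally on the set of loaded Wikibase extensions. The function name, the `wikibase_shared/` string and the hash truncation are all assumptions, not the actual wmf-config code.

```python
import hashlib


def shared_cache_key_prefix(mw_branch: str, wikibase_branch: str, extensions) -> str:
    """Hypothetical prefix that varies on branches *and* loaded extensions."""
    # Sort and de-duplicate so load order doesn't create spurious prefixes.
    ext_list = ",".join(sorted(set(extensions)))
    ext_hash = hashlib.sha1(ext_list.encode("utf-8")).hexdigest()[:8]
    return f"wikibase_shared/{mw_branch}-{wikibase_branch}-{ext_hash}"
```

Each distinct extension combination gets its own key space, so clients with and without Lexeme would no longer read each other's entries; the price is one cache partition per combination, which is why the cache-space question matters.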


> Handle unserialization of unknown classes more "gracefully", using the unserialize_callback_func setting (http://php.net/manual/en/var.configuration.php#ini.unserialize-callback-func). We could make said callback throw an exception that Wikibase could handle (e.g. skip reading from the cache in such a case). The problem is that the setting is global and could lead to all sorts of unforeseen issues.

> Pinging @hoo @Amir1 @aude @daniel @ArielGlenn, as they might have some clue here or know some background.


What line in Wikibase actually does this unserialization? Perhaps we could change that to something that wouldn't have such a global effect.