Addshore added a comment.
In T217897#5026499 <https://phabricator.wikimedia.org/T217897#5026499>, @Smalyshev wrote:

>> I guess the wdqs internal machines would have comparable response times?
>
> You can see response times for RDF loading in the dashboard: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-24h&to=now&panelId=26&fullscreen

So, p95 is around 80ms, which lines up pretty well with the data in T217897#5020225 <https://phabricator.wikimedia.org/T217897#5020225>, where a short page took 60ms. I guess most entities are toward the smaller end of the scale, as the mean seems to be closer to ~55-60ms. Still, a cache hit for something that takes 80ms in PHP would likely only take ~25ms when hitting varnish.

>> but 11 hours in a 24 hour period is still pretty significant
>
> I'm not sure I understand how this figure was obtained but there's absolutely no way Updater spends half time in waiting for RDF loading. In reality, it spends most of its time in SPARQL Update.

I'm not sure the figure is totally accurate either; it is based on multiple estimations. Let me try to refine it slightly. Again this ignores batching, but the actual edit count on the day was ~1.1 million, which resulted in between 826,878 and 926,848 requests to load entity data, depending on which wdqs host you look at, so 75-84% of edits end up triggering an entity data load with the current batching methods.

But the fact stands that varnish will always respond faster than PHP, and even for the smallest entities a varnish hit shaves around 50% off the request time.

If we say a single wdqs host is making 800k requests to Special:EntityData right now with an average load time of 55ms, that's 44,000,000ms, or roughly 12 hours spent loading data. If we instead loaded every single revision (so 1.1 million) and actually hit the varnish cache (well, sometimes not, if we are the first server to ask), then we have ((1,100,000 / 12 * 11) * 25ms) + ((1,100,000 / 12 * 1) * 55ms) = 30,250,000ms, or roughly 8.5 hours. So probably a saving of closer to 4 hours of loading time each day per instance. (I've put a rough sketch of this arithmetic just below.)

If the updater were to actually write to blazegraph for each of the retrieved revisions, that would of course be ~300k more updates, but IMO the wdqs updater can still request revisions like this and choose not to actually write every single revision.

This is all generally meant to highlight that hitting varnish is obviously going to be faster, even if the updater itself thinks that entity data retrieval is fast enough.

>> writing to blazegraph while getting the next data ready?
>
> That could be possible but doesn't happen now. May be a good idea to try. However, since SPARQL Update dominates the timings pretty heavily it's unlikely we'd save too much. And since we need to validate IDs against database (to ensure we don't already have the revision we're about to fetch) we can not fetch RDF before previous update has finished, thus reducing the parallelizeable part to essentially only Kafka data loading, which doesn't seem to be worth it.

I'm still a bit confused about this logic inside the updater, especially the ID validation that checks whether we already have the revision.
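For what it's worth, here is the arithmetic above as a runnable back-of-envelope sketch. The ~12 host count and the 25ms/55ms timings are the same rough figures used above, not measurements:

    public class LoadingTimeEstimate {
        public static void main(String[] args) {
            final double MS_PER_HOUR = 3_600_000d;

            // Current behaviour: ~800k uncacheable requests per host per day,
            // at ~55ms average PHP response time.
            double currentHours = 800_000 * 55 / MS_PER_HOUR;

            // Proposed: fetch all ~1.1M revisions with cacheable URLs. With
            // ~12 wdqs hosts (a rough assumption), 11 of every 12 requests
            // find the revision already in varnish (~25ms); the first host
            // to ask still pays the full PHP cost (~55ms).
            double hitMs  = 1_100_000 / 12.0 * 11 * 25;
            double missMs = 1_100_000 / 12.0 * 1 * 55;
            double proposedHours = (hitMs + missMs) / MS_PER_HOUR;

            System.out.printf("current:  %.1f hours/day%n", currentHours);  // ~12.2
            System.out.printf("proposed: %.1f hours/day%n", proposedHours); // ~8.4
        }
    }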
The fastest way for this to work, in the distributed fashion it is currently laid out in, is to just retrieve every revision of entity data using a varnish-cacheable query string, hold the latest revision of each entity in some internal queue in the updater for a few seconds while waiting for more updates, and then commit that to blazegraph for storage. This reduces the PHP calls dramatically, increases varnish hits, decreases the overall time spent waiting for Special:EntityData responses, and still allows for batching. (Sketches of the queue and of the revision-based URLs are at the end of this comment.)

A few other comments that we might want to think about.

PHP is being hit with very roughly 12.5 million requests to turn some PHP object into RDF output for Special:EntityData. We might want to consider caching that output in its own memcached key inside wikibase, so we only have to do the conversion once per revision, reducing this logic from running 12.5 million times to just around 1.1 million times. This hasn't been considered before because Special:EntityData is cacheable and these should all be varnish cache hits anyway, but if the updater behaviour does not change then maybe we should add this?

Also, regarding 3rd parties using the updater: perhaps the revid-based approach needs to be developed anyway, to reduce the load that is likely to continue to increase. These requests should be hitting the cache, but they should also not be getting out-of-date data, and the revid is the solution to that.

https://grafana.wikimedia.org/d/000000188/wikidata-special-entitydata shows the issue pretty well with the number of requests for uncached ttl data to Special:EntityData. In the last year the number of requests seems to have doubled, and the request rate doesn't look like it is slowing down. Are we ready to double the requests using this uncachable method again in the next 12 months?
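Here is a minimal sketch of the queueing idea from the start of this comment. It is not the actual Updater code, just an illustration, and the 5 second hold time is an arbitrary assumption:

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class LatestRevisionQueue {

        // "A few seconds" of hold time; 5s is an assumption, not a measured value.
        private static final long HOLD_MS = 5_000;

        private static final class Pending {
            long revision;
            long lastSeenMs;

            Pending(long revision, long lastSeenMs) {
                this.revision = revision;
                this.lastSeenMs = lastSeenMs;
            }
        }

        private final Map<String, Pending> pending = new LinkedHashMap<>();

        // Called for every incoming change event (e.g. from the Kafka poller).
        // Only the newest seen revision of each entity is kept.
        public synchronized void offer(String entityId, long revision) {
            Pending p = pending.get(entityId);
            long now = System.currentTimeMillis();
            if (p == null) {
                pending.put(entityId, new Pending(revision, now));
            } else if (revision > p.revision) {
                p.revision = revision; // newer revision supersedes the queued one
                p.lastSeenMs = now;
            }
        }

        // Called periodically: returns the entity -> revision pairs that have
        // been quiet for HOLD_MS, ready to be fetched (via cacheable revision
        // URLs) and committed to blazegraph in one batch.
        public synchronized Map<String, Long> drainReady() {
            long now = System.currentTimeMillis();
            Map<String, Long> ready = new HashMap<>();
            Iterator<Map.Entry<String, Pending>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Pending> e = it.next();
                if (now - e.getValue().lastSeenMs >= HOLD_MS) {
                    ready.put(e.getKey(), e.getValue().revision);
                    it.remove();
                }
            }
            return ready;
        }
    }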
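And a sketch of the revision-pinned, cacheable Special:EntityData URL mentioned above. The helper itself is hypothetical, but Special:EntityData does accept revision and flavor query parameters:

    // Builds a Special:EntityData URL pinned to one revision. Because the
    // query string is deterministic, every wdqs host (and any 3rd-party
    // updater) asking for the same revision shares one varnish-cached copy,
    // and pinning the revid means the cached copy can never be out of date.
    static String entityDataUrl(String entityId, long revision) {
        return "https://www.wikidata.org/wiki/Special:EntityData/"
                + entityId + ".ttl?flavor=dump&revision=" + revision;
    }

    // entityDataUrl("Q42", 123456789) ->
    // https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?flavor=dump&revision=123456789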
