Addshore added a comment.

  In T217897#5026499 <https://phabricator.wikimedia.org/T217897#5026499>, 
@Smalyshev wrote:
  
  > > I guess the wdqs internal machines would have comparable response times?
  >
  > You can see response times for RDF loading in the dashboard: 
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=now-24h&to=now&panelId=26&fullscreen
  
  
  So, p95 is around 80 ms, which lines up pretty well with the data in T217897#5020225 <https://phabricator.wikimedia.org/T217897#5020225>, where a short page took 60 ms.
  I guess most entities are toward the smaller end of the scale, as the mean seems to be closer to ~55-60 ms.
  Still, something that takes 80 ms in PHP would likely only take ~25 ms on a varnish cache hit.
  
  >> but 11 hours in a 24 hour period is still pretty significant
  > 
  > I'm not sure I understand how this figure was obtained but there's 
absolutely no way Updater spends half time in waiting for RDF loading. In 
reality, it spends most of its time in SPARQL Update.
  
  I'm not sure the figure is totally accurate; it is based on several estimates. Let me try to refine it slightly.
  
  Again, this ignores batching, but the actual edit count on the day was ~1.1 million, which resulted in between 826,878 and 926,848 requests to load entity data, depending on which wdqs host you look at. So roughly 75-84% of edits end up triggering an entity data load with the current batching methods; a quick check of those percentages follows below.
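  
  Just to show where those percentages come from (the per-host request counts divided by the day's edit count, in Python):
  
    edits = 1_100_000  # edits on the day in question
    low = 826_878      # fewest entity data loads seen on a wdqs host
    high = 926_848     # most entity data loads seen on a wdqs host
  
    print(f"{low / edits:.0%}")   # 75%
    print(f"{high / edits:.0%}")  # 84%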
  
  But the fact stands that varnish will always respond faster than PHP, and even for the smallest entity, a varnish hit shaves around 50% off the request time.
  If we say a single wdqs host is currently making 800k requests to Special:EntityData with an average load time of 55 ms, that's 44,000,000 ms, or roughly 12 hours, spent loading data per day.
  If we instead pretend we load every single revision (so 1.1 million) and actually hit the varnish cache (sometimes not, if we are the first of the ~12 wdqs hosts to ask, hence an 11-in-12 hit ratio), then we have ((1,100,000 × 11/12) × 25 ms) + ((1,100,000 × 1/12) × 55 ms) = 30,250,000 ms, or ~8.4 hours.
  So probably a saving of closer to 4 hours of loading time each day per instance. If the updater were then to actually write to blazegraph for each of the retrieved revisions, that would of course be ~300k more updates, but IMO the wdqs updater can still request revisions like this and choose not to actually write every single revision.
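  
  For anyone who wants to check the arithmetic, the same estimate as a quick Python sketch (the 25 ms / 55 ms figures and the 11-in-12 cache-hit ratio are the rough assumptions from above, not measurements):
  
    MS_PER_HOUR = 3_600_000
  
    # Current behaviour: ~800k uncached Special:EntityData loads per host
    # per day, at ~55 ms each.
    current_ms = 800_000 * 55
    print(current_ms / MS_PER_HOUR)  # ~12.2 hours
  
    # Hypothetical behaviour: load all ~1.1M revisions via a cacheable URL,
    # with 11 in 12 requests being ~25 ms varnish hits and the remainder
    # being ~55 ms PHP misses.
    revisions = 1_100_000
    cached_ms = revisions * 11 / 12 * 25 + revisions * 1 / 12 * 55
    print(cached_ms / MS_PER_HOUR)  # ~8.4 hours
  
    print((current_ms - cached_ms) / MS_PER_HOUR)  # ~3.8 hours saved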
  
  This is all generally meant to highlight that hitting varnish is obviously going to be faster, even if the updater itself thinks that entity data retrieval is fast enough.
  
  >> writing to blazegraph while getting the next data ready?
  > 
  > That could be possible but doesn't happen now. May be a good idea to try. 
However, since SPARQL Update dominates the timings pretty heavily it's unlikely 
we'd save too much. And since we need to validate IDs against database (to 
ensure we don't already have the revision we're about to fetch) we can not 
fetch RDF before previous update has finished, thus reducing the 
parallelizeable part to essentially only Kafka data loading, which doesn't seem 
to be worth it.
  
  I'm still a bit confused about this logic inside the updater, especially the ID validation that checks whether we already have the revision.
  The fastest way for this to work, in the distributed fashion it is currently laid out in, is to just retrieve entity data for every revision using a varnish-cacheable query string, hold the latest revision of an entity in some internal queue in the updater for a few seconds while waiting for more updates, and then commit that to blazegraph for storage (see the sketch below).
  This would reduce the PHP calls dramatically, increase varnish hits, decrease the overall time spent waiting for Special:EntityData responses, and still allow for batching.
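  
  A minimal sketch of that queueing idea, with hypothetical fetch/write helpers (the real updater is Java; this is illustration, not the actual code):
  
    def fetch_entity_data(entity_id, rev_id):
        # Stand-in for an HTTP GET of Special:EntityData pinned to a
        # revision, which should be safely cacheable in varnish.
        return f"<rdf for {entity_id} r{rev_id}>"
  
    def write_to_blazegraph(batch):
        # Stand-in for one batched SPARQL Update.
        print(f"writing {len(batch)} entities to blazegraph")
  
    FLUSH_INTERVAL = 5  # seconds to wait for further edits to an entity
    pending = {}        # entity id -> (latest revision id, fetched RDF)
  
    def on_edit(entity_id, rev_id):
        # Fetch every revision via the cacheable URL (the varnish win)...
        rdf = fetch_entity_data(entity_id, rev_id)
        # ...but only keep the newest revision per entity, so superseded
        # revisions are fetched cheaply yet never written to blazegraph.
        known = pending.get(entity_id)
        if known is None or rev_id > known[0]:
            pending[entity_id] = (rev_id, rdf)
  
    def flush():
        # Run every FLUSH_INTERVAL seconds: commit the deduplicated batch.
        if pending:
            batch = dict(pending)
            pending.clear()
            write_to_blazegraph(batch)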
  
  A few other comments that we might want to think about.
  
  PHP is being hit with very roughly 12.5 million requests to turn a PHP object into RDF output for Special:EntityData; we might want to consider caching that output in its own memcached key inside Wikibase, so that we only have to do the conversion once per revision, reducing this logic from running 12.5 million times to just around 1.1 million times.
  This hasn't been considered before because Special:EntityData is cacheable and these should all be varnish cache hits anyway, but if the updater behaviour does not change then maybe we should add this (see the sketch below)?
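  
  Something like the following pattern, sketched in Python for brevity (Wikibase itself is PHP; the key shape and helper names here are made up). Since an (entity, revision, format) triple is immutable, the cached value never needs invalidating:
  
    rdf_cache = {}  # stand-in for memcached
  
    def php_object_to_rdf(entity_id, rev_id, fmt):
        # Stand-in for the conversion that currently runs ~12.5M times/day.
        return f"<{fmt} for {entity_id} r{rev_id}>"
  
    def get_entity_rdf(entity_id, rev_id, fmt="ttl"):
        # Hypothetical key: one cache entry per revision per format.
        key = f"wikibase:entitydata:rdf:{entity_id}:{rev_id}:{fmt}"
        rdf = rdf_cache.get(key)
        if rdf is None:
            rdf = php_object_to_rdf(entity_id, rev_id, fmt)
            rdf_cache[key] = rdf  # real memcached would set a long TTL
        return rdf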
  
  Also, regarding 3rd parties using the updater: perhaps the revid-based approach needs to be developed anyway, to reduce the load that is likely to continue to increase. Those requests should be hitting the cache, but they should also not be getting out-of-date data; revid-based requests are the solution to that (example below).
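  
  To be concrete, by revid-based requests I mean the revision-pinned form of Special:EntityData, which identifies an immutable revision and should therefore be safe to cache for a long time, e.g.:
  
    https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl?revision=<revid>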
  
  https://grafana.wikimedia.org/d/000000188/wikidata-special-entitydata shows the issue pretty well with the number of requests for uncached ttl data to Special:EntityData.
  In the last year the number of requests seems to have doubled, and the request rate doesn't look like it is slowing down. Are we ready to double the requests using this uncachable method again in the next 12 months?

TASK DETAIL
  https://phabricator.wikimedia.org/T217897
