[Wikidata-bugs] [Maniphest] [Changed Project Column] T233520: page_props wikibase_item is sometimes not added to client pages when a sitelink is added on a repo

2020-02-25 Thread Ladsgroup
Ladsgroup moved this task from Stalled/Waiting to Peer Review on the 
Wikidata-Campsite (Wikidata-Campsite-Iteration-∞) board.
Ladsgroup added a comment.


  In T233520#5917452 , 
@daniel wrote:
  
  > In T233520#5897446 , 
@Ladsgroup wrote:
  >
  >> This already happens (changing page_touched) through `HTMLCacheUpdateJob` 
and that's one of the reasons the problem exists, because the 
try_to_get_cached_parser_output method (the aforementioned method I pasted) 
checks if the page_touched is newer than the rootTimeStamp which by definition 
is (root job is the change notification one that spawns HTMLCacheUpdateJob)
  >
  > So the solution would be for the LinksUpdate job to not inherit the root 
timestamp?
  > If that's not possible for some reason, LinksUpdate could have a flag for 
suppressing the opportunistic behavior, or for providing a different timestamp.
  
  That's a rather easy way to fix it, yes.

TASK DETAIL
  https://phabricator.wikimedia.org/T233520

WORKBOARD
  https://phabricator.wikimedia.org/project/board/3539/

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ladsgroup
Cc: CCicalese_WMF, Anomie, Petar.petkovic, Zoranzoki21, Tacsipacsi, 
matej_suchanek, Bencemac, Urbanecm, RolandUnger, alaa_wmde, Ladsgroup, 
Lea_Lacroix_WMDE, LibrErli, Krinkle, daniel, tstarling, aaron, WMDE-leszek, 
Addshore, Lydia_Pintscher, Lucas_Werkmeister_WMDE, Jheald, Aklapper, Mike_Peel, 
Hazizibinmahdi, Beast1978, CBogen, Un1tY, Hook696, Daryl-TTMG, RomaAmorRoma, 
E.S.A-Sheild, Iflorez, darthmon_wmde, WDoranWMF, holger.knust, EvanProdromou, 
Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, Gaboe420, Versusxo, 
Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, Gq86, Af420, 
Darkminds3113, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Th3d3v1ls, 
Ramalepe, Liugev6, QZanden, LawExplorer, WSH1906, Lewizho99, Maathavan, 
Poyekhali, _jensen, rosalieper, Agabi10, Taiwania_Justo, Scott_WUaS, Pchelolo, 
Jonas, Ixocactus, Wong128hk, Wikidata-bugs, aude, El_Grafo, Dinoguy1000, 
Steinsplitter, Mbch331, Keegan
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Changed Project Column] T233520: page_props wikibase_item is sometimes not added to client pages when a sitelink is added on a repo

2020-02-17 Thread Ladsgroup
Ladsgroup moved this task from To Do to Peer Review on the Wikidata-Campsite 
(Wikidata-Campsite-Iteration-∞) board.
Ladsgroup added a comment.


  Okay, I spent several hours debugging and trying to figure out what's going 
on and how I0cd13c424 
 broke it.
  
  Spoiler alert: It's not logic in the hook or issues with virtual usages.
  
  Let's break down thesteps to reproduce it:
  
  - Create a page on wikipedia that includes some data data from a statement on 
wikidata item
- This creates a ParserCache entry without the wikidata data (stuff that 
goes into page_props)
  - Add a sitelink from the item to the page
- This happens in Wikidata, it means it triggers a row in wb_changes table 
that later gets dispatched to the client wiki
- The dispatched patch, sees a need to refreshLinks on that page and thus 
enqueues `ChangeNotification` job
- The `ChangeNotification` job enqueues a `refreshLinks` job
- The great `refreshLinks` job needs a ParserOutput object, it tries to 
load it but with the change merged it tries to load PO object from PC:
  - Oh there's one PC object already for when the page was made originally 
and my rootJobTimestamp + 10 seconds is still younger than the original PC
  - Look, the revision id hasn't changed since then, same as page_touched
  - Cool, let's just use the cached PO instead
  
  As the result:
  
  - The page_props will not be updated for the page
  
  The code at fault is this:
  
/**
 * Get the parser output from cache if it reflects the change that 
triggered this job
 *
 * @param ParserCache $parserCache
 * @param WikiPage $page
 * @param RevisionRecord $currentRevision
 * @param StatsdDataFactoryInterface $stats
 * @return ParserOutput|null
 */
private function getParserOutputFromCache(
ParserCache $parserCache,
WikiPage $page,
RevisionRecord $currentRevision,
StatsdDataFactoryInterface $stats
) {
$cachedOutput = null;
// If page_touched changed after this root job, then it is 
likely that
// any views of the pages already resulted in re-parses which 
are now in
// cache. The cache can be reused to avoid expensive parsing in 
some cases.
$rootTimestamp = $this->params['rootJobTimestamp'] ?? null;
if ( $rootTimestamp !== null ) {
$opportunistic = !empty( 
$this->params['isOpportunistic'] );
if ( $opportunistic ) {
// Neither clock skew nor DB snapshot/replica 
DB lag matter much for
// such updates; focus on reusing the (often 
recently updated) cache
$lagAwareTimestamp = $rootTimestamp;
} else {
// For transclusion updates, the template 
changes must be reflected
$lagAwareTimestamp = wfTimestamp(
TS_MW,
wfTimestamp( TS_UNIX, $rootTimestamp ) 
+ self::NORMAL_MAX_LAG
);
}

if ( $page->getTouched() >= $rootTimestamp || 
$opportunistic ) {
// Cache is suspected to be up-to-date so it's 
worth the I/O of checking.
// As long as the cache rev ID matches the 
current rev ID and it reflects
// the job's triggering change, then it is 
usable.
$parserOptions = $page->makeParserOptions( 
'canonical' );
$output = $parserCache->getDirty( $page, 
$parserOptions );
if (
$output &&
$output->getCacheRevisionId() == 
$currentRevision->getId() &&
$output->getCacheTime() >= 
$lagAwareTimestamp
) {
$cachedOutput = $output;
}
}
}

if ( $cachedOutput ) {
$stats->increment( 'refreshlinks.parser_cached' );
} else {
$stats->increment( 'refreshlinks.parser_uncached' );
}

return $cachedOutput;
}
  
  I was able to confirm that this patch causes it by making the cache function 
return null all the time and saw the records appearing in page_props.
  
  Options:
  
  - Introduce an opposite of `isOpportunistic` option and make the refreshlinks 
jobs enqueued by wikibase to set that.
- Side note: