| phuedx added subscribers: Imarlier, dr0ptp4kt, phuedx. phuedx added a comment. |
Quoting from @dr0ptp4kt's notes of the meeting on Wednesday, 29th August:
- We believe that the MediaWiki PHP Parser-based desktop HTML pages are indeed being used as the basis for a number of parts of Google Search Console and crawler behavior, as they are after all the canonical URLs for content.
- We also know that Parsoid and mdot are accessed by various Google automata (and other automata) for a number of pipelines.
- Our hunch is that application of the sameAs field to the MediaWiki PHP Parser-based HTML (which would show up in both desktop and mobile web) should therefore be tried first. It may turn out, however, that we need to update Parsoid for this to have an actual effect. As a general rule for tickets involving general markup, we may find out we need to implement changes in both the MediaWiki PHP Parser / extension hooks and Parsoid to ensure the broadest applicability for different automata, especially Google.
IIRC the consensus between @dr0ptp4kt, @Imarlier, and @Deskana was that Google is unlikely to provide us with any insight into their Wikipedia-specific pipelines.
There was an open question as to where the behaviour described in the third bullet above should live. My recommendation was to house this in a new MediaWiki extension that would go on to house all Wikipedia-specific markup that's added to the page – though there'll undoubtedly be exceptions. I did also recommend that some related code from MobileFrontend be relocated there but that was a red herring.
The bulk of the code in this extension could be implemented as follows:
    public static function onBeforePageDisplay( OutputPage $out ) {
        $title = $out->getTitle();

        // The Wikibase extension sets this property here:
        // https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/master/client/includes/ParserOutput/ClientParserOutputDataUpdater.php#79
        $itemId = $out->getProperty( 'wikibase_item' );

        if ( !$title || !$itemId ) {
            return;
        }

        // NOTE: AFAICT addInlineScript() emits a plain <script> element. For
        // Google to treat the payload as JSON-LD, the element will need
        // type="application/ld+json".
        $out->addInlineScript( json_encode( [
            '@context' => 'http://schema.org',
            'name' => $title->getText(),
            'url' => $title->getFullURL(),
            'sameAs' => [
                "https://www.wikidata.org/wiki/{$itemId}",
            ],
        ] ) );
    }
N.B. scripts added to the output via OutputPage#addInlineScript are put at the end of body. AFAICT Google's crawler shouldn't have a problem with this as it's expecting to encounter structured data in both head and body (see https://developers.google.com/search/docs/guides/intro-structured-data#structured-data-format). However, the script can be added at the end of head if necessary.
There is a way to test this change at less cost than creating and deploying a small extension, however. Quoting Google's Intro to Structured Data:
Also, Google can read JSON-LD data when it is dynamically injected into the page's contents, such as by JavaScript code or embedded widgets in your content management system.
The Wikibase extension also sends the Wikibase item's ID to the client, so the code above could be written as a gadget, which can be deployed on a per-wiki basis as quickly as we can write it:
    const itemId = mw.config.get( 'wgWikibaseItemId' );

    if ( !itemId ) {
        return;
    }

    const script = document.createElement( 'script' );

    // Mark the element as JSON-LD so that it's treated as structured data
    // rather than executed as JavaScript.
    script.type = 'application/ld+json';
    script.textContent = JSON.stringify( {
        '@context': 'http://schema.org',
        name: mw.config.get( 'wgTitle' ),
        // document.location is a Location object, so use its href string.
        url: document.location.href,
        sameAs: [
            'https://www.wikidata.org/wiki/' + itemId
        ]
    } );

    document.body.appendChild( script );
This approach maximises speed of deployment at the potential cost of adding another variable to the "How does Google crawl the Wikipedias?" question, i.e. we have to rely on Google's Wikipedia-specific pipeline operating as described in their general documentation.
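Whichever route we take, the payload in both snippets above is the same, so it could be built by a small pure function that's testable outside the browser. A minimal sketch follows. The names buildSameAsPayload and renderJsonLdScript are hypothetical and aren't part of MediaWiki or Wikibase, and the escaping step is my assumption about what we'd want if the markup were ever assembled as a string:

```javascript
// Hypothetical helpers for illustration; not an existing MediaWiki or Wikibase API.
const WIKIDATA_ENTITY_BASE = 'https://www.wikidata.org/wiki/';

// Builds the schema.org payload, or returns null if any input is missing.
function buildSameAsPayload( title, url, itemId ) {
	if ( !title || !url || !itemId ) {
		return null;
	}

	return {
		'@context': 'http://schema.org',
		name: title,
		url: String( url ),
		sameAs: [ WIKIDATA_ENTITY_BASE + itemId ]
	};
}

// Serialises the payload into a JSON-LD script element as a string.
function renderJsonLdScript( payload ) {
	// Escape "<" so that a title containing "</script>" can't close the
	// element early. The escaped form is still valid JSON.
	const json = JSON.stringify( payload ).replace( /</g, '\\u003c' );

	return '<script type="application/ld+json">' + json + '</script>';
}
```

In the gadget, buildSameAsPayload( mw.config.get( 'wgTitle' ), document.location.href, mw.config.get( 'wgWikibaseItemId' ) ) would replace the inline object literal, and a null result would mean "do nothing".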
Cc: phuedx, dr0ptp4kt, Imarlier, cscott, Krenair, Aklapper, Deskana, Lahi, Gq86, GoranSMilovanovic, Chicocvenancio, QZanden, LawExplorer, Wikidata-bugs, aude, Mbch331
