Reedy added a comment.
Looking at the code...
    foreach ( $this->getCategoryIterator( $dbr ) as $batch ) {
        $pages = [];
        foreach ( $batch as $row ) {
            $this->categoriesRdf->writeCategoryData(
                $row->page_title,
                $row->pp_propname === 'hiddencat',
                (int)$row->cat_pages -
                    (int)$row->cat_subcats - (int)$row->cat_files,
                (int)$row->cat_subcats
            );
            $pages[$row->page_id] = $row->page_title;
        }

        foreach ( $this->getCategoryLinksIterator( $dbr,
            array_keys( $pages ) ) as $row ) {
            $this->categoriesRdf->writeCategoryLinkData(
                $pages[$row->cl_from], $row->cl_to );
        }
        fwrite( $output, $this->rdfWriter->drain() );
    }
The answer would seemingly be `array_keys( $pages )`... But how would a
page_id end up not being a number if it is pulled from the DB?
Even more so as `getCategoryIterator` uses `page` as the main table and joins
the others onto it, i.e. the page table isn't the one being LEFT JOINed in, so
page_id shouldn't come back as NULL.
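For what it's worth, here is a minimal standalone sketch of one way a non-numeric key could come out of `array_keys( $pages )`. It assumes, purely for illustration, that some row came back with a NULL `page_id`; the rows are made up, and `buildPagesMap()` and `nonIntegerKeys()` are hypothetical helpers, not MediaWiki functions. PHP casts numeric-string keys to integers, but a null key becomes the empty string.

```php
<?php
// Hypothetical helper: builds the same page_id => page_title map as the
// inner loop above, from plain row objects.
function buildPagesMap( array $rows ): array {
    $pages = [];
    foreach ( $rows as $row ) {
        // '42' is stored under the int key 42; null is stored under ''.
        $pages[$row->page_id] = $row->page_title;
    }
    return $pages;
}

// Hypothetical helper: list the keys that are not integers.
function nonIntegerKeys( array $pages ): array {
    return array_values( array_filter(
        array_keys( $pages ),
        static function ( $key ) {
            return !is_int( $key );
        }
    ) );
}

// Made-up rows: one normal, one with an assumed NULL page_id.
$rows = [
    (object)[ 'page_id' => '42', 'page_title' => 'Foo' ],
    (object)[ 'page_id' => null, 'page_title' => 'Bar' ],
];
$pages = buildPagesMap( $rows );
var_dump( array_keys( $pages ) );
```

If something like this is what's happening, the empty-string key would be passed straight through to `getCategoryLinksIterator`. That is only a guess until the real result set has been inspected.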
I guess it'd be useful to get the actual SQL query being generated by this
(i.e. the query the code really produces, not what a human would write from
reading it):
    /**
     * Produce row iterator for categories.
     * @param IDatabase $dbr Database connection
     * @return RecursiveIterator
     */
    public function getCategoryIterator( IDatabase $dbr ) {
        $it = new BatchRowIterator(
            $dbr,
            [ 'page', 'page_props', 'category' ],
            [ 'page_title' ],
            $this->getBatchSize()
        );
        $it->addConditions( [
            'page_namespace' => NS_CATEGORY,
        ] );
        $it->setFetchColumns( [
            'page_title',
            'page_id',
            'pp_propname',
            'cat_pages',
            'cat_subcats',
            'cat_files'
        ] );
        $it->addJoinConditions(
            [
                'page_props' => [
                    'LEFT JOIN', [ 'pp_propname' =>
                        'hiddencat', 'pp_page = page_id' ]
                ],
                'category' => [
                    'LEFT JOIN', [ 'cat_title = page_title'
                    ]
                ]
            ]
        );
        return $it;
    }
And then see what the result set looks like.
The two pieces of code in question don't seem to have changed in any related
way recently, so it could just be some bad data in the actual DB.
The question would be how far through (i.e. after how many batches) this
error occurs.
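A rough sketch of how the batch number could be pinned down, assuming the batches can be iterated as arrays of row objects the way the loop above does. `findFirstBadBatch()` is a hypothetical helper written for this comment, not existing MediaWiki code, and the sample batches are invented.

```php
<?php
/**
 * Hypothetical helper: return the 1-based number of the first batch that
 * contains a row whose page_id is neither an int nor a string of digits,
 * or null if every row looks fine.
 */
function findFirstBadBatch( iterable $batches ): ?int {
    $batchNo = 0;
    foreach ( $batches as $batch ) {
        $batchNo++;
        foreach ( $batch as $row ) {
            $id = $row->page_id ?? null;
            $looksNumeric = is_int( $id )
                || ( is_string( $id ) && ctype_digit( $id ) );
            if ( !$looksNumeric ) {
                return $batchNo;
            }
        }
    }
    return null;
}

// Invented data: the second batch holds a row with a NULL page_id.
$batches = [
    [ (object)[ 'page_id' => '1' ], (object)[ 'page_id' => '2' ] ],
    [ (object)[ 'page_id' => '3' ], (object)[ 'page_id' => null ] ],
];
echo findFirstBadBatch( $batches ), "\n";
```

In the real script the same check could sit inside the outer `foreach` with a counter, logging the batch number and the offending row before the links query runs.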
TASK DETAIL
https://phabricator.wikimedia.org/T260232