Ash20001 created this task.
Ash20001 added projects: Wikidata, Dumps-Generation.
Restricted Application added a project: wdwb-tech-focus.
TASK DESCRIPTION
I have been ingesting Wikidata JSON dump files into MongoDB using
mongoimport. This has worked for about a year, but the last two weekly
dumps have failed with this error:
2021-03-05T15:35:17.320-0800 Failed: error reading separator after
document #11554732: bad JSON array format - found '{' outside JSON object/array
in input source
2021-03-05T15:35:17.320-0800 11553900 document(s) imported successfully. 0
document(s) failed to import.
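One way to narrow this down is to scan the dump for lines that break the
array layout. The sketch below is only a diagnostic aid, not a confirmed
explanation of the failure. It assumes the dump is a single JSON array with
'[' on the first line, one entity per line terminated by ',', and ']' on the
last line. The names find_separator_problems and check_dump are hypothetical
helpers, not part of any Wikidata or MongoDB tooling.

```python
import bz2
import json


def find_separator_problems(lines, limit=5):
    """Return up to `limit` (line_number, reason) pairs for lines that break
    the assumed layout: '[', one entity per line ending in ',', then ']'."""
    problems = []
    pending = None   # line number of an entity that had no trailing comma
    closed = False   # True once the closing ']' has been seen
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line in ("", "["):
            continue
        if line == "]":
            closed = True
            pending = None  # the last entity legitimately has no comma
            continue
        if closed:
            problems.append((number, "content after closing ']'"))
        elif pending is not None:
            problems.append((pending, "no ',' before the next entity"))
        pending = None if line.endswith(",") else number
        try:
            json.loads(line.rstrip(","))
        except ValueError as exc:
            problems.append((number, "invalid JSON: %s" % exc))
        if len(problems) >= limit:
            break
    return problems[:limit]


def check_dump(path, limit=5):
    """Scan a .bz2 dump (path is an assumption, e.g. the file used above)."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return find_separator_problems(dump, limit)
```

Calling check_dump("./wiki_job/latest-all.json.bz2") streams the file without
unpacking it to disk. A full scan of a dump this size will be slow, and the
line numbers reported are dump lines, which are offset by one from the
document numbers in the mongoimport message because of the opening '['.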
The command I run is:
bunzip2 -dc ./wiki_job/latest-all.json.bz2 |
  mongoimport --host 127.0.0.1:27017 --db wikiData --collection wiki \
  --type json --drop --numInsertionWorkers 4 --jsonArray
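A possible workaround is to avoid array parsing altogether: mongoimport reads
newline-delimited JSON when --jsonArray is omitted, so the array brackets and
per-line commas can be stripped before the import. The following is a minimal
sketch, assuming the dump still has one entity per line. It has not been
verified against the affected dumps, and the script name to_ndjson.py is
hypothetical.

```python
import io
import sys


def iter_entities(lines):
    """Yield one JSON document per entity line, dropping the array
    brackets and the trailing comma that separates array elements."""
    for raw in lines:
        line = raw.strip()
        if line in ("", "[", "]"):
            continue
        yield line[:-1] if line.endswith(",") else line


# Run as a filter only when invoked with a single "-" argument.
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    source = io.TextIOWrapper(sys.stdin.buffer, encoding="utf-8")
    sink = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")
    for doc in iter_entities(source):
        sink.write(doc + "\n")
    sink.flush()
```

Used in place of the array import, the pipeline would look like this:

bunzip2 -dc ./wiki_job/latest-all.json.bz2 | python3 to_ndjson.py - |
  mongoimport --host 127.0.0.1:27017 --db wikiData --collection wiki \
  --type json --drop --numInsertionWorkers 4

If two entities ever share a line, this filter passes the line through
unchanged, so it does not repair that case.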
The affected dumps are those of February 24 and March 3 (as of March 5, 2021).
Feb 24th dump:
https://dumps.wikimedia.org/wikidatawiki/entities/20210222/wikidata-20210222-all.json.bz2
March 3rd dump:
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2
I am not sure what has changed in the dump file. I have tried various
mongoimport parameters, and all of them exhibit the issue. The weekly dumps
before February 24 import fine.
TASK DETAIL
https://phabricator.wikimedia.org/T276643