Ash20001 created this task.
Ash20001 added projects: Wikidata, Dumps-Generation.
Restricted Application added a project: wdwb-tech-focus.

TASK DESCRIPTION
  I have been ingesting Wikidata JSON dump files into MongoDB using 
mongoimport. This worked for about a year, but imports of the last two weekly 
dumps have failed with this error:
  
  2021-03-05T15:35:17.320-0800    Failed: error reading separator after document #11554732: bad JSON array format - found '{' outside JSON object/array in input source
  2021-03-05T15:35:17.320-0800    11553900 document(s) imported successfully. 0 document(s) failed to import.
  
  The command I run is:
   bunzip2 -dc ./wiki_job/latest-all.json.bz2 | mongoimport --host 127.0.0.1:27017 --db wikiData --collection wiki --type json --drop --numInsertionWorkers 4 --jsonArray
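
For comparison, the pipeline above could be fed newline-delimited JSON instead of one large array, so that mongoimport can be run without --jsonArray. The sketch below is only an illustration, not a confirmed fix: it assumes the dump keeps its documented layout (a line containing "[", then one entity per line with a trailing comma, then a line containing "]"), and the function name array_lines_to_ndjson and the "-" argument convention are hypothetical. It will not repair an entity line that is itself malformed.

```python
import sys
from typing import Iterable, Iterator


def array_lines_to_ndjson(lines: Iterable[str]) -> Iterator[str]:
    """Yield one JSON document per item, dropping the array brackets and
    the trailing comma after each entity line.

    Assumes one entity per input line (an assumption about the dump layout).
    """
    for raw in lines:
        line = raw.strip()
        if line in ("", "[", "]"):
            continue  # skip blank lines and the enclosing array brackets
        if line.endswith(","):
            line = line[:-1]  # drop the array separator
        yield line


# Only read stdin when explicitly asked to with "-" (hypothetical convention).
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    for doc in array_lines_to_ndjson(sys.stdin):
        sys.stdout.write(doc + "\n")
```

If saved as to_ndjson.py (a hypothetical file name), it would sit between bunzip2 and mongoimport in the command above, with --jsonArray removed from the mongoimport options.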
  
  The affected dumps are those of February 24 and March 3 (as of 2021-03-05). 
  Feb 24th dump: 
https://dumps.wikimedia.org/wikidatawiki/entities/20210222/wikidata-20210222-all.json.bz2
  March 3rd dump: 
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.json.bz2
  
  I am not sure what has changed in the dump file. I have tried various 
mongoimport parameters, but all of them exhibit the same issue. The weekly 
dumps before Feb 24th import fine.
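
One way to narrow down what changed is to scan the dump for the first place where the array layout breaks. The following is a minimal sketch, assuming the dump has one entity per line with a trailing comma on every entity line except the last; the names find_bad_lines and scan_dump are hypothetical. A missing separator between two entities is only one possible reading of the mongoimport error, so the sketch also reports lines that fail to parse as JSON.

```python
import bz2
import json
import sys
from typing import Iterable, Iterator, Optional, Tuple


def find_bad_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, reason) for lines that break the assumed
    one-entity-per-line array layout."""
    prev: Optional[Tuple[int, bool]] = None  # (line number, had trailing comma)
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line in ("", "[", "]"):
            continue  # array brackets and blank lines carry no entity
        has_comma = line.endswith(",")
        body = line[:-1] if has_comma else line
        try:
            json.loads(body)
        except json.JSONDecodeError as exc:
            yield number, f"invalid JSON: {exc.msg}"
        # An entity line followed by another entity needs a ',' separator.
        if prev is not None and not prev[1]:
            yield prev[0], "missing ',' separator before next entity"
        prev = (number, has_comma)


def scan_dump(path: str, limit: int = 10) -> None:
    """Print up to `limit` problems found in a bz2-compressed dump."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for count, (number, reason) in enumerate(find_bad_lines(handle), 1):
            print(f"line {number}: {reason}")
            if count >= limit:
                break


if __name__ == "__main__" and len(sys.argv) > 1:
    scan_dump(sys.argv[1])
```

Run against ./wiki_job/latest-all.json.bz2, this would print the line numbers of any entity lines that lack a separator or do not parse. Line numbers are offset by one from mongoimport's document numbers because of the opening "[" line.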

TASK DETAIL
  https://phabricator.wikimedia.org/T276643

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: ArielGlenn, Ash20001
Cc: ArielGlenn, Ash20001, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, 
Scott_WUaS, gnosygnu, abian, Wikidata-bugs, aude, Addshore, Mbch331
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs