dr0ptp4kt added a comment.
After an update to the script (PS6) and a fresh run of the same commands, new
files have been `hdfs-rsync`'d to `stat1006:~dr0ptp4kt/gzips` in anticipation
of a file transfer over to the WDQS graph split test servers.
Here's a very small sample of what the files look like:
$ zcat part-01022-c261bb68-4091-4613-ae52-88ce97d22c14-c000.txt.gz | tail -10
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "\u0935\u093F\u0915\u093F\u092E\u093F\u0921\u093F\u092F\u093E \u0936\u094D\u0930\u0947\u0923\u0940"@ne .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "\u043A\u0430\u0442\u0435\u0433\u043E\u0440\u0438\u0458\u0430 \u043D\u0430 \u0412\u0438\u043A\u0438\u043C\u0435\u0434\u0438\u0458\u0438"@sr .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "\u7DAD\u57FA\u5A92\u9AD4\u5206\u985E"@yue .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "Wikimedia-Kategorie"@de-ch .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "catigur\u00ECa di nu pruggettu Wikimedia"@scn .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "categoria di un progetto Wikimedia"@it .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/version> "1979010859"^^<http://www.w3.org/2001/XMLSchema#integer> .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "kategori Wikimedia"@map-bms .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "Wikimedia-kategoriija"@se .
<http://www.wikidata.org/entity/Q99896811> <http://schema.org/description> "\u7DAD\u57FA\u5A92\u9AD4\u5206\u985E"@zh-mo .
$ zcat part-01023-c261bb68-4091-4613-ae52-88ce97d22c14-c000.txt.gz | head -10
<http://www.wikidata.org/entity/statement/Q99896811-7623BB4C-2D20-4D2E-8784-E2ED8AD3E8E5> <http://wikiba.se/ontology#rank> <http://wikiba.se/ontology#NormalRank> .
<http://www.wikidata.org/entity/statement/Q99896811-7623BB4C-2D20-4D2E-8784-E2ED8AD3E8E5> <http://www.wikidata.org/prop/statement/P31> <http://www.wikidata.org/entity/Q4167836> .
<http://www.wikidata.org/entity/statement/Q99896811-7623BB4C-2D20-4D2E-8784-E2ED8AD3E8E5> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://wikiba.se/ontology#BestRank> .
<https://ar.wikipedia.org/wiki/%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%B4%D8%B1%D9%83%D8%A7%D8%AA_%D8%B3%D9%88%D9%8A%D8%B3%D8%B1%D9%8A%D8%A9_%D8%A3%D8%B3%D8%B3%D8%AA_%D9%81%D9%8A_1973> <http://schema.org/about> <http://www.wikidata.org/entity/Q99896811> .
<https://ar.wikipedia.org/wiki/%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%B4%D8%B1%D9%83%D8%A7%D8%AA_%D8%B3%D9%88%D9%8A%D8%B3%D8%B1%D9%8A%D8%A9_%D8%A3%D8%B3%D8%B3%D8%AA_%D9%81%D9%8A_1973> <http://schema.org/isPartOf> <https://ar.wikipedia.org/> .
<https://ar.wikipedia.org/wiki/%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%B4%D8%B1%D9%83%D8%A7%D8%AA_%D8%B3%D9%88%D9%8A%D8%B3%D8%B1%D9%8A%D8%A9_%D8%A3%D8%B3%D8%B3%D8%AA_%D9%81%D9%8A_1973> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Article> .
<https://ar.wikipedia.org/wiki/%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%B4%D8%B1%D9%83%D8%A7%D8%AA_%D8%B3%D9%88%D9%8A%D8%B3%D8%B1%D9%8A%D8%A9_%D8%A3%D8%B3%D8%B3%D8%AA_%D9%81%D9%8A_1973> <http://schema.org/inLanguage> "ar" .
<https://ar.wikipedia.org/wiki/%D8%AA%D8%B5%D9%86%D9%8A%D9%81:%D8%B4%D8%B1%D9%83%D8%A7%D8%AA_%D8%B3%D9%88%D9%8A%D8%B3%D8%B1%D9%8A%D8%A9_%D8%A3%D8%B3%D8%B3%D8%AA_%D9%81%D9%8A_1973> <http://schema.org/name> "\u062A\u0635\u0646\u064A\u0641:\u0634\u0631\u0643\u0627\u062A \u0633\u0648\u064A\u0633\u0631\u064A\u0629 \u0623\u0633\u0633\u062A \u0641\u064A 1973"@ar .
<https://en.wikipedia.org/wiki/Category:Swiss_companies_established_in_1973> <http://schema.org/inLanguage> "en" .
<https://en.wikipedia.org/wiki/Category:Swiss_companies_established_in_1973> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Article> .
You'll notice that the files are partitioned by `context` and `subject`,
and within a partition they're also sorted by `context` and `subject` (the
`context` field isn't part of the output, though; one would get that from the
source tables). So you may see, as in this example, triples that are logically
clustered together spanning the end of one file and the beginning of the next
file in sequence.
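
For anyone consuming these files downstream, here is a minimal Python sketch of
one way to cope with clusters that straddle a file boundary: read the part
files in sequence as a single stream and group consecutive lines by subject.
It is not part of the script mentioned above. The function names
(`subject_of`, `read_parts`, `group_by_subject`) are hypothetical, and it
assumes one triple per line with the subject as the first whitespace-delimited
term. Because `context` isn't in the output, it can only group by subject, not
by the entity a statement or sitelink belongs to.

```python
import gzip
import itertools
from typing import Iterable, Iterator, List, Tuple


def subject_of(line: str) -> str:
    """Return the first whitespace-delimited term of an N-Triples line."""
    return line.split(None, 1)[0]


def read_parts(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from each gzipped part file, in the order given."""
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            yield from handle


def group_by_subject(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Group consecutive lines that share a subject.

    Relies on the input already being sorted so that lines with the same
    subject are adjacent (an assumption based on the description above).
    """
    stripped = (line.rstrip("\n") for line in lines)
    non_empty = (line for line in stripped if line.strip())
    for subject, group in itertools.groupby(non_empty, key=subject_of):
        yield subject, list(group)
```

Passing the part files in partition order, for example
`group_by_subject(read_parts(sorted(paths)))`, keeps a subject whose triples
end one file and start the next in a single group.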
TASK DETAIL
https://phabricator.wikimedia.org/T350106