[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-21 Thread bennofs
bennofs added a comment. In T222985#7163999 <https://phabricator.wikimedia.org/T222985#7163999>, @ArielGlenn wrote: > lbzip2 decompresses in parallel as well. We use that for compression of the SQL/XML dumps. Yes, the problem is that bzip2 is just really slow to d

[Wikidata-bugs] [Maniphest] [Commented On] T147577: NotMaterializedException (Vocab(2):) on combination of subquery, limit, triple, and label service

2019-09-21 Thread bennofs
bennofs added a comment. This does not seem fully fixed yet: https://www.wikidata.org/wiki/Wikidata_talk:SPARQL_query_service#Possible_bug. Example from that post: SELECT ?item ?itemLabel ?linkTo { ?item wdt:P780/wdt:P31*/wdt:P279* wd:Q737460, wd:Q86, wd:Q21120251. SERVICE

[Wikidata-bugs] [Maniphest] [Commented On] T230588: Wikidata Query Service is swapping items and properties

2019-08-19 Thread bennofs
bennofs added a comment. This query https://query.wikidata.org/#SELECT%20%3Fprop%20%3Ftype%20WHERE%20%7B%20%3Fprop%20wikibase%3ApropertyType%20%3Ftype%20FILTER%20%28CONTAINS%28STR%28%3Fprop%29%2C%22Q%22%29%20%26%26%203%21%3D1%29%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-15 Thread bennofs
bennofs added a comment. $ time zstdcat -v -d wikidata-20190506-all.json.bz2 | zstd > /dev/null real 4m5.341s user 2m22.4

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-15 Thread bennofs
bennofs added a comment. But I can do a zstd decompression -> zstd compression test. TASK DETAIL https://phabricator.wikimedia.org/T222985 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: bennofs Cc: ArielGlenn, Liuxinyu970226, benn

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-15 Thread bennofs
bennofs added a comment. I don't have enough disk space for a compression test, that's correct.

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-15 Thread bennofs
bennofs added a comment. Now the same with zstd: $ time zstdcat -v -d wikidata-20190506-all.json.bz2 | cat > /dev/null real 3m48.657s user 0m3.792s sys 0m58.768s. Here are the sizes: 35G wikidata-20190506-all.json.bz2 39G wikid

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-14 Thread bennofs
bennofs added a comment. So I tried lbzip2, here's the result (on a VM server with 2 cores, 2.1GHz; the decompression is CPU-bound): $ time lbzip2 -n2 -v -d -c wikidata-20190506-all.json.bz2 | cat > /dev/n

[Wikidata-bugs] [Maniphest] [Created] T222985: Provide wikidata JSON dumps compressed with zstd

2019-05-10 Thread bennofs
bennofs created this task. bennofs added projects: Wikidata, Dumps-Generation. Restricted Application added a subscriber: Liuxinyu970226. TASK DESCRIPTION At this time, wikidata provides JSON dumps compressed with gzip or bzip2. However, neither is optimal: - the gzip dump is quite