Seems like it was a network latency issue on my part. After switching to a
different network, I was able to download the results the way you did, in
around 3.5 minutes. I'm almost done processing the data!
Thanks so much,
Huji
On Wed, Mar 14, 2018 at 12:39 PM, Lucas Werkmeister <lucas.werkmeis...@wikimedia.de> wrote:
Sorry, I’m not sure what you mean about the query service being slow. I was
able to fetch all 3.5M results with the following command:
$ time curl --silent --header 'Accept: application/json' --get \
    --data-urlencode 'query=SELECT ?entity ?property_value WHERE { ?entity wdt:P1566 ?property_value. }'
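If the goal is just a flat file of Q-ID / GeoNames ID pairs, one possible variant is to request tab-separated output instead of JSON (for example with `--header 'Accept: text/tab-separated-values'`; this assumes the endpoint honors that media type) and post-process it with standard tools. Below is a minimal sketch, assuming the W3C SPARQL 1.1 TSV layout: a header row of variable names, IRIs in angle brackets, and string literals in double quotes. The function name `tsv_to_pairs` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: convert SPARQL TSV results (read on stdin) into
# "Q-ID<TAB>GeoNames ID" lines.
# Assumes the W3C SPARQL 1.1 TSV layout: a header row of variable names,
# IRIs written as <...> and string literals written as "...".
tsv_to_pairs() {
  tab=$(printf '\t')
  # 1. drop the header row
  # 2. strip the entity IRI prefix, leaving e.g. Q64
  # 3. remove the IRI's closing '>' and the literal's opening quote
  # 4. remove the literal's closing quote
  tail -n +2 |
    sed -e 's|^<http://www\.wikidata\.org/entity/||' \
        -e "s|>${tab}\"|${tab}|" \
        -e 's|"$||'
}
```

In use, the TSV response from the same query would be piped straight into `tsv_to_pairs` and redirected to a file; each output line then holds an item ID and its GeoNames ID separated by a tab.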
Actually, never mind. I reviewed the Java code behind the Linked Data
Fragments server, and it doesn't support more items per page. It also gets
slow when you look at later pages (the first few pages are in a warm cache,
so they are fast).
I think my best bet is to just download the latest JSON dump from
https://www.wikidata.org/wiki/Wi
Lucas,
No, I don't need the page_id. The other two are enough.
The Wikidata Query Service seems very slow (it would take about one day of
continuous querying to get all the data). The Linked Data Fragments server
seems faster, but I wish I knew how to make it return more than 100 results
at a time. Do you?
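To put the page-size limit in numbers: fetching N results at P results per request takes ceil(N / P) requests. Here is a small sketch of that arithmetic in POSIX shell. The function name `requests_needed` is hypothetical, and the inputs are the roughly 3.5 million results mentioned in this thread.

```sh
#!/bin/sh
# Hypothetical helper: number of paged requests needed to fetch $1 results
# when each request returns at most $2 results (integer ceiling division).
requests_needed() {
  total=$1
  page_size=$2
  echo $(( (total + page_size - 1) / page_size ))
}

requests_needed 3500000 100    # pages of 100 results: prints 35000
requests_needed 3500000 1000   # calls of 1000 results: prints 3500
```

So at 100 results per page, the full set means about 35,000 sequential requests. A single unpaged query returns everything in one response.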
Huji, do you need the page_id in the query results? If not, I would
recommend either the Wikidata Query Service, as Jaime suggested (though
I’d omit the LIMIT and OFFSET; I think it’s better to let the server send
you all the results at once), or the Linked Data Fragments server:
https://que
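For comparison, the paged approach means sending one query per slice, each with its own LIMIT and OFFSET. Below is a minimal sketch that generates those queries. The function name `paged_query` and the added `ORDER BY ?entity` are illustrative assumptions. The SPARQL 1.1 specification notes that LIMIT/OFFSET slices are only predictable when the order is fixed with ORDER BY, and ordering millions of results on every request is extra work for the server. That is one reason a single unpaged request can be simpler.

```sh
#!/bin/sh
# Hypothetical helper: print the GeoNames query for one page of results.
# $1 = page size (LIMIT), $2 = zero-based page number.
# ORDER BY ?entity is an assumption added so that consecutive pages neither
# skip nor repeat rows (SPARQL leaves the order of unordered results undefined).
paged_query() {
  page_size=$1
  page=$2
  offset=$(( page * page_size ))
  printf 'SELECT ?entity ?property_value WHERE { ?entity wdt:P1566 ?property_value. } ORDER BY ?entity LIMIT %d OFFSET %d\n' \
    "$page_size" "$offset"
}

paged_query 1000 0   # first page: OFFSET 0
paged_query 1000 2   # third page: OFFSET 2000
```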
Thanks, Jaime, for your recommendation.
If I understand the result of [1] correctly, there are around 3.5 million
pages with a GeoNames property specified on Wikidata. I'm sure some of them
are redirects, or not cities, etc. But still, going through millions of
pages with API calls of 1000 at a
I am not 100% sure there is a perfect way to do what you want by querying
the metadata databases (I assume that is what you mean by "query"); I don't
think that data is metadata, but content itself, which is not in the
metadata databases.
Calling the Wikidata Query Service is probably what you want