Re: [Cloud] Wikidata query request

2018-03-14 Thread Huji Lee
Seems like it was a network latency issue on my part. Changing the network made it possible for me to download it like you did, in around 3.5 minutes. I'm almost done processing the data! Thanks so much, Huji

Re: [Cloud] Wikidata query request

2018-03-14 Thread Lucas Werkmeister
Sorry, I’m not sure what you mean about the query service being slow? I was able to fetch all 3.5M results with the following command: $ time curl --silent --header 'Accept: application/json' --get --data-urlencode 'query=SELECT ?entity ?property_value WHERE { ?entity wdt:P1566 ?property_value. }'
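
For readers who would rather script this than use curl, here is a minimal Python sketch of the same single-request approach. The query string and the Accept header are taken from the command above; the endpoint URL is not shown in this excerpt, so it is left as a parameter the caller must supply. The function names are hypothetical, and the parsing assumes the server answers in the standard SPARQL JSON results layout (`results.bindings`).

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Query copied from the curl command quoted in the message above.
QUERY = "SELECT ?entity ?property_value WHERE { ?entity wdt:P1566 ?property_value. }"


def build_request(endpoint, query=QUERY):
    """Build a GET request like `curl --get --data-urlencode 'query=...'`.

    `endpoint` is an assumption: pass the SPARQL endpoint URL you are using.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/json"})


def parse_bindings(payload):
    """Extract (entity, property_value) pairs from a SPARQL JSON results document."""
    data = json.loads(payload)
    return [
        (row["entity"]["value"], row["property_value"]["value"])
        for row in data["results"]["bindings"]
    ]


def fetch_all(endpoint, query=QUERY):
    """Send one request with no LIMIT/OFFSET and return every result row."""
    with urlopen(build_request(endpoint, query)) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `fetch_all(endpoint)` loads the whole response into memory before parsing, which is simple but may be heavy for millions of rows; a streaming JSON parser would be a reasonable alternative.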

Re: [Cloud] Wikidata query request

2018-03-14 Thread Huji Lee
Actually, never mind. I reviewed the Java code behind it and it doesn't support more items per page. It also gets slow when you look at later pages (first few pages are in a warm cache and are fast). I think my best bet is to just download the latest JSON dump from https://www.wikidata.org/wiki/Wi

Re: [Cloud] Wikidata query request

2018-03-14 Thread Huji Lee
Lucas, no, I don't need the page_id. The other two are enough. Wikidata Query Service seems very slow (it'll take about one day of continuous querying to get all the data). Linked Data Fragments server seems faster, but I wish I knew how to make it return more than 100 results at a time. Do you?
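
As a rough illustration of why the page size matters here, the sketch below counts the requests needed to page through the roughly 3.5 million results mentioned elsewhere in this thread, at 100 results per request (the limit described above) and at 1000 per request (the API batch size mentioned in the earlier message). The helper name is hypothetical and the total is only the approximate figure quoted in the thread.

```python
def requests_needed(total_results, page_size):
    """Number of paged requests needed to cover total_results (ceiling division)."""
    return -(-total_results // page_size)


# Approximate total quoted in the thread; not an exact count.
TOTAL_RESULTS = 3_500_000

print(requests_needed(TOTAL_RESULTS, 100))   # 35000 requests at 100 per page
print(requests_needed(TOTAL_RESULTS, 1000))  # 3500 requests at 1000 per page
```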

Re: [Cloud] Wikidata query request

2018-03-14 Thread Lucas Werkmeister
Huji, do you need the page_id in the query results? Otherwise, I would suggest using either the Wikidata Query Service, as Jaime suggested (though I’d omit the LIMIT and OFFSET – I think it’s better to let the server send you all the results at once) or the Linked Data Fragments server: https://que

Re: [Cloud] Wikidata query request

2018-03-13 Thread Huji Lee
Thanks, Jaime, for your recommendation. If I understand the result of [1] correctly, there are around 3.5 million pages with a GeoNames property specified on Wikidata. I'm sure some of them are redirects, or not cities, etc. But still, going through millions of pages through API calls of 1000 at a

Re: [Cloud] Wikidata query request

2018-03-13 Thread Jaime Crespo
I am not 100% sure there is a perfect way to do what you want by querying the metadata databases (I assume that is what you mean by "query"). I don't think that data is metadata; it is content itself, which is not in the metadata databases. Calling the Wikidata Query Service is probably what you wan