aborrero added a comment.
I just had a video call with @abian and Ruben Ojeda from Wikimedia Spain.
Some conclusions:
- IGN offers a lot of data in many different formats. @abian or someone else should work out how to post-process these files into a format Commons can accept.
- We agreed to try a 200GB VM for processing the data before uploading it to Commons, working in small chunks. Of these 200GB, 100GB is for the raw download and 100GB for the post-processed output waiting to be uploaded to Commons. After a chunk is processed, the storage is cleaned to leave space for the next chunk.
- Apparently IGN doesn't have an API or other structured web URL that would let us download the data with a script. They use some custom POST parameters, and we would need more information about them before we can script the download.
- If we can't automate the download, there is an option to go to the IGN datacenter, plug in a hard disk and copy all the data without using the network. Once we have this hard disk, we could either send it to a WMF datacenter or @abian could upload it from his home to our VM.
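To make the chunking idea a bit more concrete, here is a rough sketch of how files could be grouped so each chunk fits the 100GB raw budget. It assumes we know each file's size before downloading (which depends on how we end up fetching from IGN) and that the processed output of a chunk is no larger than its raw input, so it also fits in the other 100GB. `RemoteFile` and `plan_chunks` are hypothetical names, not anything IGN or Commons provides.

```python
from dataclasses import dataclass
from typing import Iterable, List

GB = 1024 ** 3

# Half of the 200GB VM is reserved for the raw download of one chunk.
RAW_BUDGET_BYTES = 100 * GB


@dataclass
class RemoteFile:
    name: str        # hypothetical identifier of an IGN file
    size_bytes: int  # assumption: size is known before downloading


def plan_chunks(files: Iterable[RemoteFile],
                budget: int = RAW_BUDGET_BYTES) -> List[List[RemoteFile]]:
    """Group files, in their original order, into chunks whose total
    raw size fits within the budget.

    Assumption: processed output is no larger than the raw input, so a
    chunk that fits the raw budget also fits the 100GB output budget.
    """
    chunks: List[List[RemoteFile]] = []
    current: List[RemoteFile] = []
    used = 0
    for f in files:
        if f.size_bytes > budget:
            raise ValueError(f"{f.name} alone exceeds the per-chunk budget")
        if used + f.size_bytes > budget:
            # Close the current chunk and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(f)
        used += f.size_bytes
    if current:
        chunks.append(current)
    return chunks
```

For example, files of 60, 30, 20 and 90 GB would be grouped as [60, 30], [20], [90]. Greedy in-order grouping is not optimal packing, but it keeps the download order predictable, which may matter if IGN only lets us fetch files sequentially.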
So, there are 2 different issues here:
- How to fetch the data from IGN (web API, http POST, hard disk, etc)
- How to process the data we fetched from IGN
If we discover that IGN has an API (or @abian can easily script the HTTP POST), we could even consider building this pipeline on Toolforge using our Grid Engine (download a small chunk -> process -> upload to Commons -> start again).
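The loop could look roughly like the sketch below. The `download`, `process` and `upload` callables are hypothetical placeholders: we don't yet know how to fetch from IGN (API, scripted POST or hard disk), what the post-processing is, or which upload tool we'd use for Commons. The point is only the shape: one chunk at a time, and both the raw and output directories are wiped before the next chunk so we stay within the 200GB.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable, Sequence


def run_pipeline(
    chunks: Iterable[Sequence[str]],
    workdir: Path,
    download: Callable[[Sequence[str], Path], None],  # hypothetical IGN fetch
    process: Callable[[Path, Path], None],            # hypothetical conversion
    upload: Callable[[Path], None],                    # hypothetical Commons upload
) -> int:
    """Download -> process -> upload -> clean, one chunk at a time.

    Returns the number of chunks that were fully handled.
    """
    raw_dir = workdir / "raw"  # ~100GB budget for the raw download
    out_dir = workdir / "out"  # ~100GB budget for processed output
    done = 0
    for chunk in chunks:
        raw_dir.mkdir(parents=True, exist_ok=True)
        out_dir.mkdir(parents=True, exist_ok=True)
        try:
            download(chunk, raw_dir)
            process(raw_dir, out_dir)
            upload(out_dir)
        finally:
            # Free both halves of the storage before the next chunk,
            # even if one of the steps failed.
            shutil.rmtree(raw_dir, ignore_errors=True)
            shutil.rmtree(out_dir, ignore_errors=True)
        done += 1
    return done
```

On the Grid Engine each chunk could instead be submitted as its own job, but a single sequential loop is the simplest way to guarantee that only one chunk occupies the disk at a time.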
To: aborrero
Cc: fgiunchedi, Reedy, bd808, Aklapper, aborrero, SandraF_WMF, Platonides, Rodelar, abian, AndyTan, sietec, Zylc, 1978Gage2001, Lahi, PDrouin-WMF, Gq86, E1presidente, Ramsey-WMF, Cparle, Anooprao, GoranSMilovanovic, Chicocvenancio, QZanden, Tbscho, Tramullas, Acer, LawExplorer, JJMC89, Susannaanas, srodlund, Luke081515, Aschroet, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Gryllida, Ricordisamoa, Lydia_Pintscher, Fabrice_Florin, Raymond, scfc, Steinsplitter, Mbch331, Krenair, chasemp
_______________________________________________ Wikidata-bugs mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
