Re: [atlas] Processing RIPE Atlas data as Big Data

Stephen D. Strowes Tue, 24 Jul 2018 03:05:55 -0700

Hi,

I assume you're referring to the daily dumps that we release here:
https://data-store.ripe.net/datasets/atlas-daily-dumps/

There are a couple of things that I find are relatively slow to deal with on the command line: standard bzip2 tooling, and jq for JSON parsing. So I lean on a couple of other tools to speed things up for me:


- the lbzip2 suite parallelises parts of the compress/decompress pipeline
- GNU parallel can split data in a pipe onto one process per core

So, for example, on my laptop I can reasonably quickly pull out all of the traceroutes my own probe ran:

lbzcat traceroute-2018-07-23T0700.bz2 | parallel -q --pipe jq '. | select(.prb_id == 14277)'
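
If lbzip2 or GNU parallel are not to hand, the same filter can be roughly sketched with Python's standard library alone. This is only an illustration, not something taken from the pipeline above: the function name filter_probe is made up for the example, and it assumes the dump holds one JSON result per line. It decompresses and parses on a single core, so it does not benefit from the parallelism described above.

```python
import bz2
import json


def filter_probe(path, prb_id):
    """Yield each result from a bzip2-compressed dump whose prb_id matches.

    Assumption: the dump contains one JSON object per line.
    """
    # bz2.open in text mode decompresses lazily, line by line, so the
    # whole dump is never held in memory at once.
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        for line in dump:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            result = json.loads(line)
            if result.get("prb_id") == prb_id:
                yield result


# Example use (file name and probe ID as in the command above):
#   for result in filter_probe("traceroute-2018-07-23T0700.bz2", 14277):
#       print(json.dumps(result))
```

Because the function is a generator, results can be printed or aggregated as they arrive rather than collected first.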

Stéphane has also written about using jq to parse Atlas results on labs.ripe.net:
https://labs.ripe.net/Members/stephane_bortzmeyer/processing-ripe-atlas-results-with-jq


Happy to hear from others what tools they use for data processing!

Cheers,

S.



On 21/07/2018 19:09, BELLAFKIH hayat wrote:

Dear RIPE Atlas users,
I am studying the processing of the data collected by the probes as a Big Data problem. For instance, one hour of traceroute data amounts to 500 MB (bzip2), or 7 GB of data in text format. Can you share with me how you deal with these data in practice?
Are you using a super machine or Big Data tools?

best regards,
Hayat
