Hey folks

Really simple question here. I currently have an ETL pipeline that reads
from S3 and saves the data to an end store.


I have to read from a list of keys in S3, but I'm just doing a raw extract
and then saving. Only some of the extracts need a simple transformation,
but overall the code looks the same.


I abstracted this logic into a method that takes an S3 path, applies the
common transformations, and saves to the end store.
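
Roughly like this (a minimal sketch; extract_and_save, dest_path, and the
dedupe step are placeholder names, and writing Parquet is just a stand-in
for my end store):

from pyspark.sql import SparkSession

def extract_and_save(spark: SparkSession, s3_path: str, dest_path: str) -> None:
    # Read one key; spark.read.json infers the schema from the data.
    df = spark.read.json(s3_path)
    # The common, simple transformation (placeholder step).
    df = df.dropDuplicates()
    # Save to the end store (shown here as a Parquet path).
    df.write.mode("append").parquet(dest_path)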


But the job takes about 10 minutes because I'm iterating over the list of
keys one at a time.
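
The loop is essentially this (again a sketch; keys and the bucket paths
are placeholders):

spark = SparkSession.builder.appName("etl").getOrCreate()
keys = ["s3a://my-bucket/raw/key1/", "s3a://my-bucket/raw/key2/"]
for key in keys:
    # Each call runs as its own Spark job, one after another.
    extract_and_save(spark, key, "s3a://my-bucket/output/")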

Is it possible to do this asynchronously?

FYI, I'm using spark.read.json to read from S3 because it infers my schema.
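
For example (the path is a placeholder):

df = spark.read.json("s3a://my-bucket/raw/key1/")
df.printSchema()  # schema inferred from the JSON, no explicit schema needed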

Regards
Sam
