Hi.

We have JSON data stored in S3, one JSON record per line. When reading the
data from S3 with the following code, we started noticing JSON decode
errors.

import json

records = sc.textFile(paths).map(json.loads)


After a bit more investigation we found an incomplete line; it looked like
this:

> {"key": "value", "key2":

Notice that the line ends abruptly, with no value and no closing brace.


It is not an issue with our data, and it does not happen often, but it
worries us a lot, since it means Spark could be silently dropping data.
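As a stopgap we are considering wrapping json.loads so that a truncated
line is skipped instead of failing the job. A minimal sketch (the names
safe_loads and records are ours, and the Spark call shown in the comment
is only indicative):

```python
import json

def safe_loads(line):
    """Parse one JSON line; return None for truncated/malformed lines."""
    try:
        return json.loads(line)
    except ValueError:  # raised for invalid JSON (JSONDecodeError on Python 3)
        return None

# In the Spark job this would become something like:
#   records = sc.textFile(paths).map(safe_loads).filter(lambda r: r is not None)
# Local illustration with the truncated line we observed:
lines = ['{"key": "value", "key2": 2}', '{"key": "value", "key2":']
records = [r for r in map(safe_loads, lines) if r is not None]
```

This keeps the job running, but of course it only hides the problem: the
truncated record is still lost, which is exactly what we want to avoid.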

We are using Spark 1.5.1. Any ideas why this happens, and possible fixes?

Regards,
Blaž Šnuderl
