Bug when loading multiple files?

Rodrigo Ferreira Wed, 24 Dec 2014 08:55:28 -0800

Hi everyone, happy holidays!

I have a Pig script that reads from 4 different folders in Amazon S3. This
is the code:


load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
USING...;

Instead of reading each folder just once and appending the files,
Pig/Hadoop reads each folder 4 times.

The input should have 62174 records, but in the end I get 248696, which is
exactly 4 times as many.
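
One way to check whether every file really is read four times is to count
records per source file. This is only a sketch: the loader is elided above,
so it assumes the data can be read with PigStorage and a tab delimiter, and
that the Pig version supports the '-tagPath' option (0.12 or later). The
alias names are made up for illustration.

```pig
-- Assumption: PigStorage with a tab delimiter stands in for the real loader.
-- '-tagPath' prepends the source file path as the first field of each record.
raw = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
      USING PigStorage('\t', '-tagPath');

-- Keep only the source path, then count how many records came from each file.
paths   = FOREACH raw GENERATE $0 AS src_path;
by_path = GROUP paths BY src_path;
counts  = FOREACH by_path GENERATE group AS src_path, COUNT(paths) AS n;

DUMP counts;
```

If each file shows 4 times its real record count, the duplication is
happening at load time rather than in a later step of the script.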

Why is that? Any ideas?

Thanks,
Rodrigo.
