Hi everyone, happy holidays! I have a Pig script that reads from 4 different folders in Amazon S3. This is the code:
load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}' USING...;

Instead of reading each folder just once and appending the files, Pig/Hadoop reads each folder 4 times. The input should have 62174 records, but in the end I get 248696, which is exactly 4 x 62174. Why is that? Any ideas?

Thanks,
Rodrigo