Hi folks,

In the end, the strange number of map tasks was due to the way I was using
the loaded records. Because of a missing IF statement, I was copying the
input every time I read from it.
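
Purely as an illustration of what that kind of mistake can look like (the
original script isn't shown in this thread, and the loader, schema and
field names below are made up): if the loaded relation is referenced in
several branches and the branches are UNIONed without a guarding
FILTER/IF, every branch carries the whole input, so each record ends up in
the output once per branch.

-- Hypothetical sketch, not the original script
raw = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
      USING PigStorage() AS (folder:chararray, value:chararray);

-- Unguarded: both branches copy all of raw, so the UNION doubles the input
branch_a_bad = FOREACH raw GENERATE *;
branch_b_bad = FOREACH raw GENERATE *;
merged_bad   = UNION branch_a_bad, branch_b_bad;

-- Guarded: each branch keeps only its own records
branch_a_ok = FILTER raw BY folder == 'folder_1';
branch_b_ok = FILTER raw BY folder == 'folder_2';
merged_ok   = UNION branch_a_ok, branch_b_ok;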

Rodrigo.

2014-12-24 15:20 GMT-02:00 Rodrigo Ferreira <web...@gmail.com>:

> Thanks for your idea, Ankur.
>
> Unfortunately, I have many other folders under the root folder. Anyhow,
> it's a very strange behavior.
>
> I can also load the folders separately and join them with UNION, as
> sketched below. I can use one of these alternatives for now, but I'd like
> to know whether it's a new issue, a well-known bug, or if I'm doing
> something wrong.
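>
> A minimal sketch of that workaround, assuming the four folder names from
> the original LOAD and a placeholder PigStorage() loader:
>
> load_1 = LOAD 's3n://mybucket/folder_1' USING PigStorage();
> load_2 = LOAD 's3n://mybucket/folder_2' USING PigStorage();
> load_3 = LOAD 's3n://mybucket/folder_3' USING PigStorage();
> load_4 = LOAD 's3n://mybucket/folder_4' USING PigStorage();
> all_folders = UNION load_1, load_2, load_3, load_4;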
>
> Thanks,
> Rodrigo.
>
> 2014-12-24 15:04 GMT-02:00 Ankur <ankur.kasliwal...@gmail.com>:
>
> Hi,
>>
>> Try giving the path to the root folder, i.e. the folder that contains
>> the four folders you mentioned (the bucket, in your case).
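>>
>> A one-line sketch of that suggestion (the path and loader are
>> illustrative, and it assumes the bucket holds only those four folders):
>>
>> load_all = LOAD 's3n://mybucket/' USING PigStorage();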
>>
>> This is a temporary solution to your problem.
>>
>> Thanks,
>> Ankur
>>
>> Sent from my iPhone
>>
>> > On Dec 24, 2014, at 10:22 PM, Rodrigo Ferreira <web...@gmail.com>
>> wrote:
>> >
>> > Hi everyone, happy holidays!
>> >
>> > I have a Pig script that reads from 4 different folders in Amazon S3.
>> > This is the code:
>> >
>> > load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
>> > USING...;
>> >
>> > It happens that, instead of reading each folder just once and
>> > appending the files, Pig/Hadoop reads each folder 4 times.
>> >
>> > The input should have 62174 records, but in the end I get 248696.
>> >
>> > Why is that? Any ideas?
>> >
>> > Thanks,
>> > Rodrigo.
>>
>
>
