Hi folks, in the end the strange number of map tasks was due to the way I was using the loaded records: because my script was missing an IF-style condition, I was copying the whole input every time I read from it.
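In case it helps anyone else, here is a rough sketch of the kind of pattern that causes this (the relation names, schema, and category values below are made up for illustration, they are not my actual script):

-- All four folders are loaded exactly once via the glob.
records = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
          USING PigStorage('\t') AS (id:int, category:chararray, value:double);

-- Problematic pattern: every branch re-uses the whole relation, so the
-- UNION below ends up with 4 copies of the input.
a = FOREACH records GENERATE id, value;
b = FOREACH records GENERATE id, value;
c = FOREACH records GENERATE id, value;
d = FOREACH records GENERATE id, value;
all_copies = UNION a, b, c, d;

-- With the missing condition added (SPLIT ... IF playing the role of the IF),
-- each record lands in exactly one branch and the total count stays the same.
SPLIT records INTO
    a2 IF category == 'type_a',
    b2 IF category == 'type_b',
    c2 IF category == 'type_c',
    d2 OTHERWISE;
all_once = UNION a2, b2, c2, d2;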
Rodrigo.

2014-12-24 15:20 GMT-02:00 Rodrigo Ferreira <web...@gmail.com>:

> Thanks for your idea, Ankur.
>
> Unfortunately, I have many other folders under the root folder. Anyhow,
> it's a very strange behavior.
>
> I can also load the folders separately and join them with UNION. I can use
> one of these alternatives for now. But I'd like to know whether it's a new
> issue, a well-known bug, or if I'm doing something wrong.
>
> Thanks,
> Rodrigo.
>
> 2014-12-24 15:04 GMT-02:00 Ankur <ankur.kasliwal...@gmail.com>:
>
>> Hi,
>>
>> Try giving the path up to the root folder, that is, the folder containing
>> the mentioned four folders (the bucket in your case).
>>
>> This is a temporary solution to your problem.
>>
>> Thanks,
>> Ankur
>>
>> Sent from my iPhone
>>
>> > On Dec 24, 2014, at 10:22 PM, Rodrigo Ferreira <web...@gmail.com> wrote:
>> >
>> > Hi everyone, happy holidays!
>> >
>> > I have a Pig script that reads from 4 different folders in Amazon S3. This
>> > is the code:
>> >
>> > load_1 = LOAD 's3n://mybucket/{folder_1,folder_2,folder_3,folder_4}'
>> > USING...;
>> >
>> > It happens that instead of reading each folder just once and appending the
>> > files, Pig/Hadoop reads each folder 4 times.
>> >
>> > The input should have 62174 records, but in the end I get 248696.
>> >
>> > Why is that? Any ideas?
>> >
>> > Thanks,
>> > Rodrigo.
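P.S. For completeness, the other workaround I mentioned above, loading the folders separately and combining them with UNION, would look roughly like this (the loader and schema are just placeholders):

load_1 = LOAD 's3n://mybucket/folder_1' USING PigStorage('\t') AS (id:int, value:double);
load_2 = LOAD 's3n://mybucket/folder_2' USING PigStorage('\t') AS (id:int, value:double);
load_3 = LOAD 's3n://mybucket/folder_3' USING PigStorage('\t') AS (id:int, value:double);
load_4 = LOAD 's3n://mybucket/folder_4' USING PigStorage('\t') AS (id:int, value:double);

-- Each folder is read by its own LOAD, and the relations are combined once.
all_input = UNION load_1, load_2, load_3, load_4;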