storing to different files

Jennie Cochran-Chinn Fri, 29 Jan 2010 18:32:49 -0800

I had a question about storing data to different files. The basic gist of what we are doing is taking a large set of data, performing a group by, and then storing each group's dataBag into a distinct file (on S3). Currently we are using a UDF inside a FOREACH loop that writes the dataBag to a local tmp file and then pushes it to S3. This does not seem to be the ideal way to do this, and we were wondering if anyone had any suggestions. I know there is the MultiStore function in the piggybank, but given that we have many different groups, it does not appear that it would scale very well. For instance, in some experiments the cluster I was using could not open new streams and thus failed.


Thanks,
Jennie
