Thanks, Jennie
I had a question about storing data to different files. The basic
gist of what we are doing is taking a large set of data, performing a
GROUP BY, and then storing each group's dataBag in a distinct file
(on S3). Currently we are using a UDF inside a FOREACH statement that
writes the dataBag to a local tmp file and then pushes it to S3. This
does not seem to be the ideal way to do it, and we were wondering if
anyone had any suggestions. I know there is the MultiStorage store
function in the piggybank, but given that we have many different
groups, it does not appear that it would scale very well. For instance,
in some experiments the cluster I was using could not open new streams
and thus failed.
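
The current approach amounts to roughly the following. This is a minimal sketch in Python rather than the actual Pig UDF: `store_groups`, `key_fn`, and `upload` are hypothetical names, `upload` stands in for the push to S3, and the tab-separated layout is an assumption made only for illustration.

```python
import os
import tempfile
from collections import defaultdict


def store_groups(records, key_fn, upload):
    """Group records, write each group to a local temp file, then hand it to upload()."""
    # Equivalent of the GROUP BY: collect every record under its group key.
    groups = defaultdict(list)
    for rec in records:
        groups[key_fn(rec)].append(rec)

    for key, bag in groups.items():
        # One temp file per group; only one file is open at any time.
        fd, path = tempfile.mkstemp(suffix=".tsv")
        try:
            with os.fdopen(fd, "w") as out:
                for rec in bag:
                    out.write("\t".join(str(field) for field in rec) + "\n")
            # Hypothetical callback: a real version would push the file to S3 here.
            upload(path, f"{key}.tsv")
        finally:
            # Remove the temp file whether or not the upload succeeded.
            os.remove(path)
    return sorted(groups)
```

Because each temp file is closed before the next one is opened, this pattern never holds more than one output stream at a time, at the cost of a local write plus an upload per group.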
- storing to different files Jennie Cochran-Chinn
