It sounds like you want to aggregate your rows in some way. I actually just wrote a blog post about that topic: https://medium.com/@albamus/spark-aggregating-your-data-the-fast-way-e37b53314fad
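If a pairwise merge is enough, a plain reduce over the serialized rows may already do what you describe. Below is a rough Scala sketch, not a drop-in solution: the Dataset[Array[Byte]] stands in for the output of your map step, and the combine function is a placeholder for your own merge logic (for many protobuf schemas, concatenating the wire-format bytes of two messages of the same type yields a valid merged message, but check that this matches your schema).

import org.apache.spark.sql.{Dataset, SparkSession}

object CombineSerializedRows {
  // Placeholder merge logic: concatenating serialized protobuf messages of the
  // same type appends repeated fields and lets later scalar fields win.
  // Replace with your own combine logic if that is not the behaviour you want.
  def combine(a: Array[Byte], b: Array[Byte]): Array[Byte] = a ++ b

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("combine-protobuf").getOrCreate()
    import spark.implicits._

    // Stand-in for the dataframe produced by your map step:
    // each element is the serialized form of one row.
    val serializedRows: Dataset[Array[Byte]] =
      spark.createDataset(Seq(Array[Byte](1, 2), Array[Byte](3, 4)))

    // reduce applies the combine function pairwise within each partition and
    // then across partition results, leaving one serialized message on the driver.
    val singleMessage: Array[Byte] = serializedRows.reduce(combine _)

    println(s"Combined message is ${singleMessage.length} bytes")
    spark.stop()
  }
}

Note that reduce pulls the final result back to the driver, so this only works if the combined message fits in driver memory; otherwise an Aggregator-based approach (as in the blog post above) scales better.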
On Mon, Aug 19, 2019 at 4:24 PM Rishikesh Gawade <rishikeshg1...@gmail.com> wrote:
> Hi All,
> I have been trying to serialize a dataframe in protobuf format. So far, I
> have been able to serialize every row of the dataframe by using the map
> function, with the serialization logic inside the lambda function. The
> resultant dataframe consists of rows in serialized format (1 row = 1
> serialized message).
> I wish to form a single protobuf serialized message for this dataframe, and
> in order to do that I need to combine all the serialized rows using some
> custom logic very similar to the one used in the map operation.
> I am assuming that this would be possible by using the reduce operation on
> the dataframe; however, I am unaware of how to go about it.
> Any suggestions/approach would be much appreciated.
>
> Thanks,
> Rishikesh