Hi everyone, I need some advice on how to do the following. I have an RDD of vectors (each one a Vector(Int, Int, Int, Int)) and I need to group the data, then apply a function to every group that compares consecutive items within the group and retains a value whenever a condition from the comparison is true. The retained value has to be appended to the end of each vector.
Here is an example:

(1, 2, 5, 2)
(1, 3, 4, 4)
(1, 3, 7, 3)
(1, 3, 4, 8)

The data are grouped by the first two fields. Within each group I compare the fourth field of consecutive rows: the fourth field of the first row is the initial retained value, and whenever the next value is greater than the previously retained one, it becomes the new retained value. The retained value is appended to each vector, so the output should be:

(1, 2, 5, 2, 2)
(1, 3, 4, 4, 4)
(1, 3, 7, 3, 4)
(1, 3, 4, 8, 8)

My attempt is a groupBy followed by a map with a for loop inside, in which I try to build a vector of vectors with the new field added. However, I cannot get the right output because I have not managed to add a new field to the vector. I also do not know what the map should return so that the result has the same shape as the original data once it has been grouped. Besides, I suspect a for loop is not the best way to iterate through the elements of each group. Finally, maybe this can be done with other operations such as reduceByKey.

Any clue is much appreciated. Thanks in advance!
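To pin down the logic I am after, here is a minimal sketch on plain Scala collections rather than on an RDD. The names RunningMaxSketch, Row, addRetained and process are made up for illustration. It assumes the rows of each group are already in the order in which they should be compared, and it treats the retained value as a running maximum of the fourth field, which matches the example above.

```scala
object RunningMaxSketch {
  // Hypothetical alias: one input row is a Vector of four Ints.
  type Row = Vector[Int]

  // For one group, append to each row the largest fourth field seen so far.
  def addRetained(rows: Seq[Row]): Seq[Row] = {
    // scanLeft emits the seed first, so .tail aligns each retained value with its row.
    val retained = rows
      .scanLeft(Int.MinValue)((kept, row) => math.max(kept, row(3)))
      .tail
    // :+ returns a new Vector with the retained value appended.
    rows.zip(retained).map { case (row, kept) => row :+ kept }
  }

  // Group by the first two fields, then process each group independently.
  def process(data: Seq[Row]): Map[(Int, Int), Seq[Row]] =
    data
      .groupBy(row => (row(0), row(1)))
      .map { case (key, rows) => key -> addRetained(rows) }

  def main(args: Array[String]): Unit = {
    val data = Seq(
      Vector(1, 2, 5, 2),
      Vector(1, 3, 4, 4),
      Vector(1, 3, 7, 3),
      Vector(1, 3, 4, 8)
    )
    // Flatten the groups back into rows with the same shape as the input plus one field.
    process(data).values.flatten.foreach(println)
  }
}
```

On an RDD I would expect the same per-group function to plug in as rdd.groupBy(row => (row(0), row(1))).flatMap { case (_, rows) => addRetained(rows.toSeq) }, but I have not verified that. As far as I understand, Spark does not guarantee the order of elements inside each group after groupBy, so the rows would probably need an explicit sort key before the comparison.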