Hello friends. Newbie here, at least when it comes to Spark. I would be very thankful for data modeling suggestions for this scenario: I have 3 types of logs, each with more than 48 columns. For simplicity I modeled each record as Tuple(PKsTuple, FinanceDataTuple, AuxData), i.e. a tuple of tuples. Eventually I want to join the 3 RDDs by the PKsTuple and do a few calculations on the FinanceDataTuple. Is this the right path? I thought about using a tuple of case classes for more elegant access to the data, but I was worried that the ser/de overhead would be wasteful. Also, is there a recommended operation for this multi-way join? The data is very skewed: one log produces much more data than the other two.
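To make the question concrete, here is a rough sketch of the two shapes I'm considering (all field and variable names are made up for illustration; the RDD definitions are stubbed out since they depend on my parsing code):

```scala
// Option A: plain tuple of tuples, keyed by the PK tuple
// type PKs = (String, Long)                       // e.g. (logDate, accountId)
// type Record = (PKs, (FinanceTuple, AuxTuple))

// Option B: case classes for readable field access
case class PKs(logDate: String, accountId: Long)            // join keys
case class FinanceData(amount: BigDecimal, fee: BigDecimal) // fields used in the calcs
case class AuxData(source: String)                          // everything else

// Each log would become an RDD keyed by the PKs:
// val logA: RDD[(PKs, (FinanceData, AuxData))] = ...
// val logB: RDD[(PKs, (FinanceData, AuxData))] = ...
// val logC: RDD[(PKs, (FinanceData, AuxData))] = ...

// Multi-way join by key, then the calculations on the finance fields:
// val joined = logA.join(logB).join(logC).mapValues {
//   case (((fa, _), (fb, _)), (fc, _)) => fa.amount + fb.amount + fc.amount
// }
```

My worry with Option B is the serialization cost of the case classes versus plain tuples, and whether the chained `join` is the right tool given the skew.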
Thanks, Amit Mor
