Hello friends. Newbie here, at least when it comes to Spark. I would be very thankful for data modeling suggestions for this scenario: I have 3 types of logs, each with more than 48 columns. For simplicity I modeled each record as Tuple(PKsTuple, FinanceDataTuple, AuxData), i.e. a tuple of tuples. Eventually I want to join the 3 RDDs by the PKsTuple and do a few calculations on the FinanceDataTuple. Is this the right path? I thought about using a tuple of case classes for more elegant access to the data, but I was worried that the ser/de overhead would be wasteful. Also, is there a recommended operation for this multi-way join? The data is very skewed: one log produces much more data than the other two.
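To make the question concrete, here is a rough sketch of the two shapes I'm considering (all field and variable names are made up for illustration; the RDD definitions are stubbed out since they depend on my parsing code):

```scala
// Option A: plain tuple of tuples, keyed by the PK tuple
// type PKs = (String, Long)                       // e.g. (logDate, accountId)
// type Record = (PKs, (FinanceTuple, AuxTuple))

// Option B: case classes for readable field access
case class PKs(logDate: String, accountId: Long)            // join keys
case class FinanceData(amount: BigDecimal, fee: BigDecimal) // fields used in the calcs
case class AuxData(source: String)                          // everything else

// Each log would become an RDD keyed by the PKs:
// val logA: RDD[(PKs, (FinanceData, AuxData))] = ...
// val logB: RDD[(PKs, (FinanceData, AuxData))] = ...
// val logC: RDD[(PKs, (FinanceData, AuxData))] = ...

// Multi-way join by key, then the calculations on the finance fields:
// val joined = logA.join(logB).join(logC).mapValues {
//   case (((fa, _), (fb, _)), (fc, _)) => fa.amount + fb.amount + fc.amount
// }
```

My worry with Option B is the serialization cost of the case classes versus plain tuples, and whether the chained `join` is the right tool given the skew.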
Thanks, Amit Mor
