Hello, I'm now at the stage where my job seems to hang completely. The source code is attached (it won't compile, but I think it gives a very good idea of what happens). Unfortunately, I can't provide the datasets. Most of them are about 100-500 million records, and I'm trying to process them on an EMR cluster with 40 tasks and 6 GB of memory for each task.
It was working for smaller input sizes. Any ideas on what I can do differently would be appreciated. Thanks, Timur
FaithResolution.scala