
Now I'm at the stage where my job seems to hang completely. The source code is
attached (it won't compile as-is, but I think it gives a good idea of what
happens). Unfortunately I can't provide the datasets. Most of them are
about 100-500 million records, and I'm trying to process them on an EMR
cluster with 40 tasks and 6 GB of memory for each.

It worked for smaller input sizes. Any ideas on what I could do
differently would be appreciated.


Attachment: FaithResolution.scala
