Hi All, I am passing Java static methods into the RDD transformations map and mapValues. The first map goes from a simple string key K to a pair (K, V), where V is a Java ArrayList of large text strings, about 50 KB each, read from Cassandra. mapValues then processes these text blocks into very small ArrayLists.
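In case it helps, here is a minimal sequential sketch of the structure I described. The method names (fetchBlocks, summarize) and their bodies are placeholders: fetchBlocks stands in for the Cassandra read that builds the large-text ArrayList, and summarize stands in for the mapValues step that shrinks each value.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // Stand-in for the first map: key -> ArrayList of large text blocks.
    // In the real job these ~50 KB strings come from Cassandra; here we
    // fabricate tiny ones just to show the shape of the data.
    static ArrayList<String> fetchBlocks(String key) {
        ArrayList<String> blocks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            blocks.add(key + "-block-" + i + " ...large text...");
        }
        return blocks;
    }

    // Stand-in for the mapValues step: reduce each large block
    // to a small token, so the output ArrayList is tiny.
    static ArrayList<String> summarize(List<String> blocks) {
        ArrayList<String> out = new ArrayList<>();
        for (String b : blocks) {
            out.add(b.split(" ", 2)[0]); // keep only the leading token
        }
        return out;
    }

    public static void main(String[] args) {
        // Sequential equivalent of:
        //   rdd.mapToPair(k -> new Tuple2<>(k, fetchBlocks(k)))
        //      .mapValues(PipelineSketch::summarize)
        for (String k : new String[] {"a", "b"}) {
            System.out.println(k + " -> " + summarize(fetchBlocks(k)));
        }
    }
}
```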
The code runs quite slowly compared to running the same work in parallel on the same servers from plain Java. I gave the executors the same heap as the plain-Java runs. Does Java run slower under Spark, am I suffering from excess heap pressure, or am I missing something? Thank you for any insight, Oleg