Re: Java api overhead?

2014-10-29 Thread Koert Kuipers
Since Spark holds data structures on the heap (and by default tries to work with all data in memory) and it's written in Scala, seeing lots of Scala Tuple2 instances is not unexpected. How do these numbers relate to your data size?
On Oct 27, 2014 2:26 PM, Sonal Goyal sonalgoy...@gmail.com wrote: Hi, I wanted

Re: Java api overhead?

2014-10-29 Thread Sonal Goyal
Thanks Koert. These numbers do indeed tie back to our data and algorithms. Would going the Scala route save some memory, given that the Java API creates a wrapper Tuple2 for all pair functions?
On Wednesday, October 29, 2014, Koert Kuipers ko...@tresata.com wrote: since spark holds data structures on heap
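
For a sense of scale, the per-pair cost of a wrapper object with boxed fields can be estimated with simple arithmetic. The sketch below is not a Spark measurement and uses no Spark classes. The class name `TupleOverheadEstimate` and all size constants are assumptions for illustration: a 64-bit HotSpot JVM with compressed references, a 12-byte object header, 4-byte references, and 8-byte alignment. It also ignores the small-value `Integer` cache and whatever collection holds the pairs, so actual heap numbers will differ.

```java
// Back-of-the-envelope estimate only. Every size below is an ASSUMPTION
// (64-bit HotSpot, compressed references, 8-byte alignment), not a measurement.
public final class TupleOverheadEstimate {
    static final long OBJECT_HEADER_BYTES = 12; // assumed object header size
    static final long REFERENCE_BYTES = 4;      // assumed compressed reference size
    static final long ALIGNMENT_BYTES = 8;      // assumed object alignment

    // Round an object size up to the next multiple of the alignment.
    public static long align(long bytes) {
        return (bytes + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // A generic pair wrapper: header plus two reference fields.
    public static long pairWrapperBytes() {
        return align(OBJECT_HEADER_BYTES + 2 * REFERENCE_BYTES);
    }

    // A boxed int: header plus one 4-byte int field.
    public static long boxedIntBytes() {
        return align(OBJECT_HEADER_BYTES + Integer.BYTES);
    }

    // n pairs, each stored as a wrapper pointing at two boxed ints.
    public static long boxedPairBytes(long n) {
        return n * (pairWrapperBytes() + 2 * boxedIntBytes());
    }

    // n pairs stored as raw ints, e.g. in two parallel int[] arrays
    // (array headers ignored).
    public static long primitivePairBytes(long n) {
        return n * 2 * Integer.BYTES;
    }

    public static void main(String[] args) {
        long n = 1_000_000L;
        System.out.println("boxed pairs:     " + boxedPairBytes(n) + " bytes");
        System.out.println("primitive pairs: " + primitivePairBytes(n) + " bytes");
    }
}
```

Under these assumptions a boxed pair costs about 56 bytes against 8 bytes of raw payload, which is one reason a heap histogram dominated by pair objects is plausible for pair-heavy jobs. Comparing such an estimate with the record count is one way to check whether the instance counts in a heap dump match the data size.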

Java api overhead?

2014-10-27 Thread Sonal Goyal
Hi, I wanted to understand what kind of memory overhead, if any, is expected when using the Java API. My application seems to have a lot of live Tuple2 instances and I am hitting a lot of GC, so I am wondering if I am doing something fundamentally wrong. Here is what the top of my heap looks