Hey Richard,

Good to hear from you as well. I thought I would ask whether there was something Scala-specific I was missing in handling these large classes.
I can tweak my job to do a map() so that only one large object is created and returned at a time, which should allow me to lower my executor memory size.

Thanks.

-Don

On Thu, Dec 14, 2017 at 2:58 PM, Richard Garris <rlgar...@databricks.com> wrote:

> Hi Don,
>
> Good to hear from you. I think the problem is that regardless of whether
> you use yield or a generator, Spark internally will produce the entire
> result as a single large JVM object, which will blow up your heap space.
>
> Would it be possible to shrink the overall size of the image object by
> storing it as a vector or Array rather than a large Java class object?
>
> That might be the more prudent approach.
>
> -RG
>
> Richard Garris
> Principal Architect
> Databricks, Inc
> 650.200.0840
> rlgar...@databricks.com
>
> On December 14, 2017 at 10:23:00 AM, Marcelo Vanzin (van...@cloudera.com)
> wrote:
>
> This sounds like something mapPartitions should be able to do, not
> sure if there's an easier way.
>
> On Thu, Dec 14, 2017 at 10:20 AM, Don Drake <dondr...@gmail.com> wrote:
> > I'm looking for some advice when I have a flatMap on a Dataset that is
> > creating and returning a sequence of a new case class
> > (Seq[BigDataStructure]) that contains a very large amount of data, much
> > larger than the single input record (think images).
> >
> > In Python, you can use generators (yield) to bypass creating a large
> > list of structures and returning the list.
> >
> > I'm programming this in Scala and was wondering if there are any
> > similar tricks to optimally return a list of classes? I found the
> > for/yield semantics, but it appears the compiler is just creating a
> > sequence for you, and this will blow through my heap given the number
> > of elements in the list and the size of each element.
> >
> > Is there anything else I can use?
> >
> > Thanks.
> >
> > -Don
> >
> > --
> > Donald Drake
> > Drake Consulting
> > http://www.drakeconsulting.com/
> > https://twitter.com/dondrake
> > 800-733-2143
>
> --
> Marcelo

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
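
For reference, the for/yield versus generator question in this thread can be illustrated with a minimal plain-Scala sketch (no Spark involved). BigDataStructure is the name used in the thread; its fields, the LazyExpansion object, the build helper, and the built counter are all hypothetical and exist only to make the difference observable. The sketch contrasts a for/yield over a Range, which builds the whole sequence up front, with an Iterator, which builds each element only when it is requested.

```scala
// Hypothetical stand-in for the large per-element structure from the thread.
final case class BigDataStructure(id: Int, payload: Array[Byte])

object LazyExpansion {
  // Counts how many structures have been built so laziness can be observed.
  var built = 0

  // Hypothetical constructor for one output element of one input record.
  def build(recordId: Int, i: Int): BigDataStructure = {
    built += 1
    BigDataStructure(recordId * 1000 + i, new Array[Byte](1024))
  }

  // Strict: for/yield over a Range materializes every element before returning.
  def expandStrict(recordId: Int, n: Int): Seq[BigDataStructure] =
    for (i <- 0 until n) yield build(recordId, i)

  // Lazy: the Iterator builds an element only when next() is called,
  // which is the closest Scala analogue to a Python generator here.
  def expandLazy(recordId: Int, n: Int): Iterator[BigDataStructure] =
    Iterator.range(0, n).map(i => build(recordId, i))
}
```

In Spark 2.x, Dataset.flatMap takes a function returning TraversableOnce and mapPartitions maps an Iterator to an Iterator, so something like expandLazy should type-check in either place, assuming an Encoder for the case class is in scope. Whether that avoids the heap pressure Richard describes depends on what Spark does downstream, and this sketch does not demonstrate it.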