This sounds like something mapPartitions should be able to do, since it works on iterators rather than materialized sequences. I'm not sure if there's an easier way.
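A minimal sketch of one way to read that suggestion, not a tested Spark job. It relies on Dataset.mapPartitions taking a function of type Iterator[T] => Iterator[U], and on Scala iterators being lazy, so each output element is built only when the consumer asks for it. InputRecord, its pieces field, the 1 KB payload, and the LazyExpand object are hypothetical placeholders; only the BigDataStructure name comes from the question below.

```scala
// Hypothetical stand-ins for the real input record and the large output class.
final case class InputRecord(id: Int, pieces: Int)
final case class BigDataStructure(recordId: Int, index: Int, payload: Array[Byte])

object LazyExpand {
  // Expand one record lazily. Iterator.range and Iterator.map are both lazy,
  // so no Seq of all the outputs is ever built.
  def expand(r: InputRecord): Iterator[BigDataStructure] =
    Iterator.range(0, r.pieces).map { i =>
      BigDataStructure(r.id, i, new Array[Byte](1024)) // placeholder payload
    }

  // Same shape as the function Dataset.mapPartitions expects:
  // Iterator[T] => Iterator[U]. Iterator.flatMap is lazy as well.
  def expandPartition(rows: Iterator[InputRecord]): Iterator[BigDataStructure] =
    rows.flatMap(expand)
}
```

With a Dataset[InputRecord] named ds (also hypothetical) and an Encoder for the output in scope, for example via import spark.implicits._, the call would look like ds.mapPartitions(LazyExpand.expandPartition). Whether the rest of the job stays streaming depends on what follows: caching or collecting the result would still materialize the elements.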

On Thu, Dec 14, 2017 at 10:20 AM, Don Drake <dondr...@gmail.com> wrote:
> I'm looking for some advice when I have a flatMap on a Dataset that is
> creating and returning a sequence of a new case class
> (Seq[BigDataStructure]) that contains a very large amount of data, much
> larger than the single input record (think images).
>
> In Python, you can use generators (yield) to bypass creating a large list of
> structures and returning the list.
>
> I'm programming this in Scala and was wondering if there are any similar
> tricks to optimally return a list of classes? I found the for/yield
> semantics, but it appears the compiler is just creating a sequence for you
> and this will blow through my heap given the number of elements in the list
> and the size of each element.
>
> Is there anything else I can use?
>
> Thanks.
>
> -Don
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143

--
Marcelo