This sounds like something mapPartitions should be able to do; I'm not
sure if there's an easier way.
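
A minimal sketch of the idea, under some assumptions: Record,
BigDataStructure, build and expand are hypothetical names standing in for
the real types and logic, and the payload size is arbitrary. The point is
that the function handed to mapPartitions has the shape
Iterator[T] => Iterator[U], and Scala's Iterator.flatMap and Iterator.map
are lazy, so each large element is only built when the consumer pulls it.
This is roughly what a Python generator gives you. The sketch is plain Scala
so it can be run without a cluster:

```scala
// Hypothetical types standing in for the real input record and large output.
final case class Record(id: Int, copies: Int)
final case class BigDataStructure(recordId: Int, index: Int, payload: Array[Byte])

object LazyExpand {
  // Builds one large element; only called when the consumer asks for it.
  def build(r: Record, i: Int): BigDataStructure =
    BigDataStructure(r.id, i, new Array[Byte](1024))

  // Lazily expands each record. Iterator.flatMap and Iterator.map do not
  // evaluate anything up front, so no Seq of all outputs is materialized.
  def expand(records: Iterator[Record]): Iterator[BigDataStructure] =
    records.flatMap(r => Iterator.range(0, r.copies).map(i => build(r, i)))

  def main(args: Array[String]): Unit = {
    val out = expand(Iterator(Record(1, 2), Record(2, 3)))
    // Elements are produced one at a time as foreach pulls them.
    out.foreach(b => println(s"${b.recordId}-${b.index}"))
  }
}
```

On a Dataset[Record] named ds (also an assumption), this should plug in as
ds.mapPartitions(LazyExpand.expand _) with spark.implicits._ in scope for
the encoder. Since Dataset.flatMap accepts a function returning a
TraversableOnce, returning an Iterator instead of a Seq directly from the
existing flatMap may work as well. Whether the heap pressure actually goes
away also depends on what happens downstream of the iterator.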

On Thu, Dec 14, 2017 at 10:20 AM, Don Drake <dondr...@gmail.com> wrote:
> I'm looking for some advice when I have a flatMap on a Dataset that is
> creating and returning a sequence of a new case class
> (Seq[BigDataStructure]) that contains a very large amount of data, much
> larger than the single input record (think images).
>
> In Python, you can use generators (yield) to bypass creating a large list of
> structures and returning the list.
>
> I'm programming this in Scala and was wondering if there are any similar
> tricks to optimally return a list of classes? I found the for/yield
> semantics, but it appears the compiler is just creating a sequence for you,
> and this will blow through my heap given the number of elements in the list
> and the size of each element.
>
> Is there anything else I can use?
>
> Thanks.
>
> -Don
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143



-- 
Marcelo

