Hey Richard,

Good to hear from you as well. I thought I would ask in case there was
something Scala-specific I was missing in handling these large classes.

I can tweak my job to do a map() and then only one large object will be
created at a time and returned, which should allow me to lower my executor
memory size.
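
For reference, here is a minimal sketch of what looks like the closest Scala counterpart to a Python generator: a for/yield over an Iterator instead of over a Range or Seq. It uses plain Scala collections only, with no Spark dependency. BigDataStructure is the case class name from the original question; its fields, plus LazyExpansion, build, built, expandEagerly and expandLazily, are hypothetical names made up for illustration. As far as I know, Dataset.flatMap in Spark 2.x is declared to take a function returning TraversableOnce, so an Iterator should be accepted as a return type. Whether that actually lowers peak memory inside Spark is an assumption to verify on the real job.

```scala
// Hypothetical stand-in for the large case class mentioned in the thread.
final case class BigDataStructure(id: Int, payload: Array[Byte])

object LazyExpansion {
  // Counts how many elements have been built so far, to make laziness observable.
  var built = 0

  // Hypothetical builder: allocates one (potentially large) element.
  def build(i: Int, bytes: Int): BigDataStructure = {
    built += 1
    BigDataStructure(i, new Array[Byte](bytes))
  }

  // Eager: for/yield over a Range materializes every element into one sequence.
  def expandEagerly(n: Int, bytes: Int): Seq[BigDataStructure] =
    for (i <- 0 until n) yield build(i, bytes)

  // Lazy: for/yield over an Iterator builds an element only when next() is called.
  def expandLazily(n: Int, bytes: Int): Iterator[BigDataStructure] =
    for (i <- Iterator.range(0, n)) yield build(i, bytes)
}
```

In the job, the function passed to flatMap would return expandLazily(...) rather than a Seq, so that a consumer pulling one element at a time never needs the whole list in memory at once.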

Thanks.

-Don


On Thu, Dec 14, 2017 at 2:58 PM, Richard Garris <rlgar...@databricks.com>
wrote:

> Hi Don,
>
> Good to hear from you. I think the problem is that regardless of whether
> you use yield or a generator, Spark internally will produce the entire
> result as a single large JVM object, which will blow up your heap space.
>
> Would it be possible to shrink the overall size of the image object by
> storing it as a Vector or Array rather than as a large Java class object?
>
> That might be the more prudent approach.
>
> -RG
>
> Richard Garris
>
> Principal Architect
>
> Databricks, Inc
>
> 650.200.0840
>
> rlgar...@databricks.com
>
> On December 14, 2017 at 10:23:00 AM, Marcelo Vanzin (van...@cloudera.com)
> wrote:
>
> This sounds like something mapPartitions should be able to do, not
> sure if there's an easier way.
>
> On Thu, Dec 14, 2017 at 10:20 AM, Don Drake <dondr...@gmail.com> wrote:
> > I'm looking for some advice when I have a flatMap on a Dataset that is
> > creating and returning a sequence of a new case class
> > (Seq[BigDataStructure]) that contains a very large amount of data, much
> > larger than the single input record (think images).
> >
> > In Python, you can use generators (yield) to bypass creating a large list
> > of structures and returning the list.
> >
> > I'm programming this in Scala and was wondering if there are any similar
> > tricks to optimally return a list of classes? I found the for/yield
> > semantics, but it appears the compiler is just creating a sequence for you
> > and this will blow through my heap given the number of elements in the
> > list and the size of each element.
> >
> > Is there anything else I can use?
> >
> > Thanks.
> >
> > -Don
> >
> > --
> > Donald Drake
> > Drake Consulting
> > http://www.drakeconsulting.com/
> > https://twitter.com/dondrake
> > 800-733-2143
>
>
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143
