subject:"[Spark 2.x Core] .collect() size limit"

Re: [EXT] [Spark 2.x Core] .collect() size limit

2018-05-01 Thread klrmowse

okie, i may have found an alternate/workaround to using .collect() for what i am trying to achieve... initially, for the Spark application that i am working on, i would call .collect() on two separate RDDs into a couple of ArrayLists (which was the reason i was asking what the size limit on the dr

Re: [EXT] [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Michael Mansour

Well, if you don't need to actually evaluate the information on the driver, but just need to trigger some sort of action, then you may want to consider using the `foreach` or `foreachPartition` method; both are actions and will execute your process. It won't return anything to the driver and

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Irving Duran

I don't think there is a magic number, so I would say that it will depend on how big your dataset is and the size of your worker(s). Thank You, Irving Duran On Sat, Apr 28, 2018 at 10:41 AM klrmowse wrote: > i am currently trying to find a workaround for the Spark application i am > working o

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Vadim Semenov

`.collect` returns an Array, and arrays can't have more than Int.MaxValue elements; in most JVMs the limit is lower: `Int.MaxValue - 8`. So that puts an upper limit on it. However, you can just create an Array of Arrays, and so on, which is basically limitless, albeit with some gymnastics.
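The "Array of Arrays" idea can be sketched in plain Java, with no Spark involved. This is only an illustration under assumptions: `ChunkedLongArray` and all of its method names are hypothetical and are not part of Spark or the JDK. It shows how a `long` index can be split into a chunk number and an offset so that no single array has to exceed the per-array element limit.

```java
// Hypothetical helper (not a Spark or JDK class): a long-indexed store
// built from several int-indexed chunks, i.e. an "array of arrays".
final class ChunkedLongArray {
    private final long[][] chunks;
    private final int chunkSize;
    private final long length;

    ChunkedLongArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        // Number of chunks needed, rounding up for a partial last chunk.
        long chunkCount = length / chunkSize + (length % chunkSize == 0 ? 0 : 1);
        if (chunkCount > Integer.MAX_VALUE - 8) {
            throw new IllegalArgumentException("too many chunks for one outer array");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        this.chunks = new long[(int) chunkCount][];
        for (int i = 0; i < chunks.length; i++) {
            long remaining = length - (long) i * chunkSize;
            chunks[i] = new long[(int) Math.min(chunkSize, remaining)];
        }
    }

    long get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    long length() {
        return length;
    }

    int chunkCount() {
        return chunks.length;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    public static void main(String[] args) {
        // Tiny sizes so the demo runs anywhere; real use would pick a large chunk size.
        ChunkedLongArray squares = new ChunkedLongArray(10, 4);
        for (long i = 0; i < squares.length(); i++) {
            squares.set(i, i * i);
        }
        System.out.println("chunks=" + squares.chunkCount() + ", last=" + squares.get(9));
    }
}
```

Note that this only works around the per-array element limit. The data still has to fit in the driver's memory, which is the constraint raised elsewhere in this thread.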

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Deepak Goel

AM. So, > if general guidelines are followed, **virtual memory** is moot. > > *From: *Deepak Goel > *Date: *Saturday, April 28, 2018 at 12:58 PM > *To: *Stephen Boesch > *Cc: *klrmowse , "user @spark" > *Subject: *Re: [Spark 2.x Core] .collect() size limit >

Re: [Spark 2.x Core] .collect() size limit

2018-04-30 Thread Lalwani, Jayesh

say that executor and driver memory should be kept at 80-85% of available RAM. So, if general guidelines are followed, *virtual memory* is moot. From: Deepak Goel Date: Saturday, April 28, 2018 at 12:58 PM To: Stephen Boesch Cc: klrmowse , "user @spark" Subject: Re: [Spark 2.x Core

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Mark Hamstra

spark.driver.maxResultSize http://spark.apache.org/docs/latest/configuration.html On Sat, Apr 28, 2018 at 8:41 AM, klrmowse wrote: > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use .collect() > > but, for now, it is going to
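For reference, a minimal sketch of how the setting named above might appear in `spark-defaults.conf`. The values are purely illustrative assumptions, not recommendations from this thread. As documented on the linked configuration page, `spark.driver.maxResultSize` caps the total size of serialized results returned to the driver by an action such as `collect()`; the default is 1g and 0 means unlimited.

```properties
# Illustrative values only; size these to your own driver machine.
# Cap on the total serialized result size that an action such as collect() may return.
spark.driver.maxResultSize  2g
# The driver heap must still be large enough to hold the collected data.
spark.driver.memory         8g
```

Raising the cap does not add memory. If the collected data does not fit in the driver's heap, the job will still fail.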

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel

I believe the virtualization of memory happens at the OS layer hiding it completely from the application layer On Sat, 28 Apr 2018, 22:22 Stephen Boesch, wrote: > While it is certainly possible to use VM I have seen in a number of places > warnings that collect() results must be able to be fit i

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch

While it is certainly possible to use VM, I have seen warnings in a number of places that collect() results must be able to fit in memory. I'm not sure if that applies to *all* Spark calculations, but at the very least each of the specific collect()'s that are performed would need to be verified.

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Deepak Goel

There is such a thing as *virtual memory* On Sat, 28 Apr 2018, 21:19 Stephen Boesch, wrote: > Do you have a machine with terabytes of RAM? afaik collect() requires > RAM - so that would be your limiting factor. > > 2018-04-28 8:41 GMT-07:00 klrmowse : > >> i am currently trying to find a workarou

Re: [Spark 2.x Core] .collect() size limit

2018-04-28 Thread Stephen Boesch

Do you have a machine with terabytes of RAM? afaik collect() requires RAM - so that would be your limiting factor. 2018-04-28 8:41 GMT-07:00 klrmowse : > i am currently trying to find a workaround for the Spark application i am > working on so that it does not have to use .collect() > > but, fo

[Spark 2.x Core] .collect() size limit

2018-04-28 Thread klrmowse

i am currently trying to find a workaround for the Spark application i am working on so that it does not have to use .collect() but, for now, it is going to have to use .collect() what is the size limit (memory for the driver) of RDD file that .collect() can work with? i've been scouring google-

12 matches
