Re: Executorlost failure

2022-04-07 Thread Wes Peng
I just did a test: even on a single node (local deployment), Spark can
handle data whose size is much larger than the total memory.


My test VM (2g ram, 2 cores):

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           1992        1845          92          19          54          36
Swap:          1023         285         738


The data size:

$ du -h rate.csv
3.2G    rate.csv


Loading this file into Spark and running a computation over it works without error:

scala> val df = spark.read.format("csv").option("inferSchema", true).load("skydrive/rate.csv")
val df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 2 more fields]


scala> df.printSchema
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`

root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: double (nullable = true)
 |-- _c3: integer (nullable = true)


scala> df.groupBy("_c1").agg(avg("_c2").alias("avg_rating")).orderBy(desc("avg_rating")).show
warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation`
+----------+----------+
|       _c1|avg_rating|
+----------+----------+
|    000136|       5.0|
|0001711474|       5.0|
|0001360779|       5.0|
|0001006657|       5.0|
|0001361155|       5.0|
|0001018043|       5.0|
|000136118X|       5.0|
|    202010|       5.0|
|0001371037|       5.0|
|    401048|       5.0|
|0001371045|       5.0|
|0001203010|       5.0|
|0001381245|       5.0|
|0001048236|       5.0|
|0001436163|       5.0|
|000104897X|       5.0|
|0001437879|       5.0|
|0001056107|       5.0|
|0001468685|       5.0|
|0001061240|       5.0|
+----------+----------+
only showing top 20 rows


So as you can see, Spark handles a file larger than its memory just fine. :)
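By the way, inferSchema makes Spark do one extra full pass over the 3.2G file just to guess the column types. A small sketch with an explicit schema avoids that pass; the column names (user/item/rating/ts) are only my guess at what rate.csv contains:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().appName("rating-avg").getOrCreate()

// Explicit schema: no inferSchema pass. Column names are assumed, not taken
// from the file.
val schema = StructType(Seq(
  StructField("user",   StringType),
  StructField("item",   StringType),
  StructField("rating", DoubleType),
  StructField("ts",     IntegerType)))

val ratings = spark.read.schema(schema).csv("skydrive/rate.csv")

ratings.groupBy("item")
  .agg(avg("rating").alias("avg_rating"))
  .orderBy(desc("avg_rating"))
  .show()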

Thanks


rajat kumar wrote:

With autoscaling we can have any number of executors.





Re: Executorlost failure

2022-04-07 Thread rajat kumar
With autoscaling we can have any number of executors.
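In case it is useful, a rough sketch of the dynamic-allocation settings that drive that autoscaling (the min/max values are only illustrative):

import org.apache.spark.sql.SparkSession

// With dynamic allocation enabled, Spark grows and shrinks the executor
// count between min and max based on the number of pending tasks.
// shuffleTracking lets it work without an external shuffle service.
val spark = SparkSession.builder()
  .appName("autoscaling-sketch")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "2")
  .config("spark.dynamicAllocation.maxExecutors", "50")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()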

Thanks

On Fri, Apr 8, 2022, 08:27 Wes Peng  wrote:

> I once had a 100+GB file computed on 3 nodes, each with only 24GB of
> memory, and the job completed fine. So from my experience a Spark cluster
> handles files larger than memory correctly by spilling the excess to disk.
>
> Thanks
>
> rajat kumar wrote:
> > Tested this with executors of 5 cores and 17GB memory each. Data volume
> > is really high, around 1TB.
>


Re: Executorlost failure

2022-04-07 Thread Wes Peng
I once had a 100+GB file computed on 3 nodes, each with only 24GB of
memory, and the job completed fine. So from my experience a Spark cluster
handles files larger than memory correctly by spilling the excess to disk.
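Shuffle and aggregation buffers spill to local disk on their own; for data you cache explicitly you have to opt in with a storage level that allows disk. A minimal sketch (the path is just a placeholder):

import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder().appName("spill-sketch").getOrCreate()

// Placeholder path; any file larger than memory makes the point.
val df = spark.read.option("inferSchema", true).csv("skydrive/rate.csv")

// MEMORY_AND_DISK: partitions that do not fit in memory are written to
// local disk instead of being dropped and recomputed.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  // materializes the cache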


Thanks

rajat kumar wrote:
Tested this with executors of 5 cores and 17GB memory each. Data volume is
really high, around 1TB.





Re: Executorlost failure

2022-04-07 Thread Wes Peng

How many executors do you have?
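(One rough way to check from the spark-shell, if that helps; this just lists what the driver currently knows about:)

// getExecutorMemoryStatus returns one entry per block manager, keyed by
// host:port; the driver itself is included, so executors ~= size - 1.
val status = spark.sparkContext.getExecutorMemoryStatus
println(s"block managers: ${status.size}")
status.keys.foreach(println)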

rajat kumar wrote:
Tested this with executors of 5 cores and 17GB memory each. Data volume is
really high, around 1TB.





Re: Executorlost failure

2022-04-07 Thread rajat kumar
Tested this with executors of 5 cores and 17GB memory each. Data volume is
really high, around 1TB.

Thanks
Rajat

On Thu, Apr 7, 2022, 23:43 rajat kumar  wrote:

> Hello Users,
>
> I got the following error. I tried increasing executor memory and memory
> overhead, but that did not help.
>
> ExecutorLost Failure(executor1 exited caused by one of the following
> tasks) Reason: container from a bad node:
>
> java.lang.OutOfMemoryError: enough memory for aggregation
>
>
> Can someone please suggest?
>
> Thanks
> Rajat
>


Executorlost failure

2022-04-07 Thread rajat kumar
Hello Users,

I got the following error. I tried increasing executor memory and memory
overhead, but that did not help.
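For reference, the two settings I raised are roughly these; the actual values are not in this thread, so the numbers below are placeholders only:

import org.apache.spark.sql.SparkSession

// Placeholder values. spark.executor.memory is the executor heap;
// spark.executor.memoryOverhead is the extra off-heap room the container
// gets on top of it. Raising spark.sql.shuffle.partitions is another common
// lever, since it shrinks what each aggregation task has to hold at once.
val spark = SparkSession.builder()
  .appName("oom-tuning-sketch")
  .config("spark.executor.memory", "17g")
  .config("spark.executor.memoryOverhead", "3g")
  .config("spark.sql.shuffle.partitions", "1000")
  .getOrCreate()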

ExecutorLost Failure(executor1 exited caused by one of the following tasks)
Reason: container from a bad node:

java.lang.OutOfMemoryError: enough memory for aggregation


Can someone please suggest?

Thanks
Rajat