Re: Executorlost failure
I just did a test, even for a single node (local deployment), spark can handle the data whose size is much larger than the total memory. My test VM (2g ram, 2 cores): $ free -m totalusedfree shared buff/cache available Mem: 19921845 92 19 54 36 Swap: 1023 285 738 The data size: $ du -h rate.csv 3.2Grate.csv Loading this file into spark for calculation can be done without error: scala> val df = spark.read.format("csv").option("inferSchema", true).load("skydrive/rate.csv") val df: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string ... 2 more fields] scala> df.printSchema warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation` root |-- _c0: string (nullable = true) |-- _c1: string (nullable = true) |-- _c2: double (nullable = true) |-- _c3: integer (nullable = true) scala> df.groupBy("_c1").agg(avg("_c2").alias("avg_rating")).orderBy(desc("avg_rating")).show warning: 1 deprecation (since 2.13.3); for details, enable `:setting -deprecation` or `:replay -deprecation` +--+--+ | _c1|avg_rating| +--+--+ |000136| 5.0| |0001711474| 5.0| |0001360779| 5.0| |0001006657| 5.0| |0001361155| 5.0| |0001018043| 5.0| |000136118X| 5.0| |202010| 5.0| |0001371037| 5.0| |401048| 5.0| |0001371045| 5.0| |0001203010| 5.0| |0001381245| 5.0| |0001048236| 5.0| |0001436163| 5.0| |000104897X| 5.0| |0001437879| 5.0| |0001056107| 5.0| |0001468685| 5.0| |0001061240| 5.0| +--+--+ only showing top 20 rows So as you see spark can handle file larger than its memory well. :) Thanks rajat kumar wrote: With autoscaling can have any numbers of executors. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Executorlost failure
With autoscaling can have any numbers of executors. Thanks On Fri, Apr 8, 2022, 08:27 Wes Peng wrote: > I once had a file which is 100+GB getting computed in 3 nodes, each node > has 24GB memory only. And the job could be done well. So from my > experience spark cluster seems to work correctly for big files larger > than memory by swapping them to disk. > > Thanks > > rajat kumar wrote: > > Tested this with executors of size 5 cores, 17GB memory. Data vol is > > really high around 1TB > > - > To unsubscribe e-mail: user-unsubscr...@spark.apache.org > >
Re: Executorlost failure
I once had a file which is 100+GB getting computed in 3 nodes, each node has 24GB memory only. And the job could be done well. So from my experience spark cluster seems to work correctly for big files larger than memory by swapping them to disk. Thanks rajat kumar wrote: Tested this with executors of size 5 cores, 17GB memory. Data vol is really high around 1TB - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Executorlost failure
how many executors do you have? rajat kumar wrote: Tested this with executors of size 5 cores, 17GB memory. Data vol is really high around 1TB - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Executorlost failure
Tested this with executors of size 5 cores, 17GB memory. Data vol is really high around 1TB Thanks Rajat On Thu, Apr 7, 2022, 23:43 rajat kumar wrote: > Hello Users, > > I got following error, tried increasing executor memory and memory > overhead that also did not help . > > ExecutorLost Failure(executor1 exited caused by one of the following > tasks) Reason: container from a bad node: > > java.lang.OutOfMemoryError: enough memory for aggregation > > > Can someone please suggest ? > > Thanks > Rajat >
Executorlost failure
Hello Users, I got following error, tried increasing executor memory and memory overhead that also did not help . ExecutorLost Failure(executor1 exited caused by one of the following tasks) Reason: container from a bad node: java.lang.OutOfMemoryError: enough memory for aggregation Can someone please suggest ? Thanks Rajat