Could be lots of things - implementations change, caching may have changed, etc. The size of the input doesn't translate directly to heap usage. Here you just need a bit more memory.
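For example, something along these lines rather than more GC flags (the
sizes are illustrative assumptions, not a recommendation):

    # Illustrative only: a larger heap per executor, plus explicit
    # off-heap overhead (spark.executor.memoryOverhead is the Spark 2.3+
    # name for the YARN overhead setting).
    spark.executor.memory=14g
    spark.executor.memoryOverhead=2g
    spark.executor.instances=20
    spark.executor.cores=1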
On Mon, Jul 29, 2019 at 9:03 AM Dhrubajyoti Hati <dhruba.w...@gmail.com> wrote:
>
> Hi Sean,
>
> Yeah, I checked the heap; it's almost full. I checked the GC logs in the
> executors, where I found that GC cycles are kicking in frequently. The
> Executors tab shows red in "Total Time/GC Time".
>
> Also, the data I am dealing with is quite small (~4 GB), and the
> cluster is quite big for such high GC.
>
> But what's troubling me is that this issue doesn't occur in Spark 2.2
> at all. What could be the reason behind such behaviour?
>
> Regards,
> Dhrub
>
> On Mon, Jul 29, 2019 at 6:45 PM Sean Owen <sro...@gmail.com> wrote:
>>
>> -dev@
>>
>> Yep, high GC activity means '(almost) out of memory'. I don't see
>> that you've checked heap usage - is it nearly full?
>> The answer isn't tuning but more heap.
>> (Sometimes with really big heaps the problem is big pauses, but
>> that's not the case here.)
>>
>> On Mon, Jul 29, 2019 at 1:26 AM Dhrubajyoti Hati
>> <dhruba.w...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > We were running Logistic Regression in Spark 2.2.x and then tried
>> > to see how it does in Spark 2.3.x. Now we are facing an issue while
>> > running a Logistic Regression model in Spark 2.3.x on top of YARN
>> > (GCP Dataproc). The treeAggregate step takes a huge amount of time
>> > due to very high GC activity. I have tuned the GC, created
>> > different-sized clusters, tried a higher Spark version (2.4.x), and
>> > smaller data, but nothing helps. The GC time is 100-1000 times the
>> > processing time, on average, per iteration.
>> >
>> > The strange part is that in Spark 2.2 this doesn't happen at all:
>> > same code, same cluster sizing, same data in both cases.
>> >
>> > I was wondering if someone can explain this behaviour and help me
>> > resolve it. How can the same code behave so differently in two
>> > Spark versions, especially the newer ones?
>> >
>> > Here are the configs I used:
>> >
>> > spark.serializer=org.apache.spark.serializer.KryoSerializer
>> >
>> > # GC tuning
>> > spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:+PrintFlagsFinal
>> >   -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails
>> >   -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy
>> >   -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark
>> >   -Xms9000m -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
>> >
>> > spark.executor.instances=20
>> > spark.executor.cores=1
>> > spark.executor.memory=9010m
>> >
>> > Regards,
>> > Dhrub
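For concreteness, here is a minimal sketch of the kind of job described
above (the input path, schema, and hyperparameters are assumptions, not
the poster's actual code). spark.ml's LogisticRegression runs each
optimizer iteration's gradient aggregation through treeAggregate, which
is the step where the high GC time was observed:

    import org.apache.spark.ml.classification.LogisticRegression
    import org.apache.spark.sql.SparkSession

    object LRGcRepro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("lr-gc-repro").getOrCreate()

        // Assumed input: a Parquet dataset with a "features" vector
        // column and a "label" column, as spark.ml expects.
        val train = spark.read.parquet("gs://some-bucket/train.parquet")

        val lr = new LogisticRegression()
          .setMaxIter(100)
          .setRegParam(0.01)

        // Each iteration aggregates gradients across partitions via
        // treeAggregate; this is the stage that showed the GC overhead.
        val model = lr.fit(train)
        println(s"Intercept: ${model.intercept}")

        spark.stop()
      }
    }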