Re: OutOfMemoryError

2021-07-06 Thread javaguy Java
021 at 1:43 PM Sean Owen wrote:
> You need to set driver memory before the driver starts, on the CLI or
> however you run your app, not in the app itself. By the time the driver
> starts to run your app, its heap is already set.
>
> On Thu, Jul 1, 2021 at 12:10 AM javaguy Ja
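A small sketch of the point above, under stated assumptions. In client mode (which includes local[*]), the driver JVM's heap is sized when spark-submit launches it. So spark.driver.memory has to reach the launcher, for example via --driver-memory or conf/spark-defaults.conf, rather than being set through SparkConf inside the app. The Python helper build_spark_submit is hypothetical, as are the jar name, main class and input file. It only assembles the argument list, keeping launcher options ahead of the application JAR, because anything after the JAR is passed to the application's own main().

```python
import shlex


def build_spark_submit(app_jar, main_class, master="local[*]",
                       driver_memory="24g", app_args=()):
    """Hypothetical helper: assemble a spark-submit argv list.

    Launcher options such as --driver-memory must come before the
    application JAR; everything after the JAR is handed to the app.
    """
    cmd = [
        "spark-submit",
        "--class", main_class,
        "--master", master,
        # The driver heap is fixed when its JVM starts, so it is sized here
        # rather than from SparkConf inside the running application.
        "--driver-memory", driver_memory,
        app_jar,
    ]
    cmd.extend(app_args)  # application arguments go last
    return cmd


if __name__ == "__main__":
    # Hypothetical jar, class and input file, for illustration only.
    argv = build_spark_submit("my-app.jar", "com.example.MyApp",
                              app_args=["input.csv"])
    print(shlex.join(argv))
```

Passing the resulting list to subprocess.run(argv, check=True) would launch the job. The same memory setting could instead live in spark-defaults.conf if you don't want it on the command line.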

OutOfMemoryError

2021-06-30 Thread javaguy Java
Hi, I'm getting Java OOM errors even though I'm setting my driver memory to 24g and executing against local[*]. I was wondering if anyone could give me some insight. The server this job runs on has more than enough memory, as does the Spark driver. The final result does write 3 CSV files

Spark as an application server cache

2021-02-10 Thread javaguy Java
Hi, I was just curious whether anyone has ever used Spark as an application server cache. My use case is:
* I have large datasets which need to be updated / inserted (upsert) in the database
* I have actually found that it is much easier to run a Spark submit job that pulls from the database, and

Re: A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-25 Thread javaguy Java
//codait.github.io/spark-bench/ to
> generate large workloads.
>
> On Fri, Sep 25, 2020 at 1:03 AM javaguy Java wrote:
> >
> > Hi Sean,
> >
> > Thanks for your reply.
> >
> > I understand distribution and parallelism very well and have used it
> with other prod

Re: A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-25 Thread javaguy Java
dup in this problem
> until you hit more scale or modify the job to distribute a little
> better, etc.
>
> On Thu, Sep 24, 2020 at 1:43 PM javaguy Java wrote:
> >
> > Hi,
> >
> > I made a post on stackoverflow that I can't seem to make any headway on

A simple example that demonstrates that a Spark distributed cluster is faster than Spark Local Standalone

2020-09-24 Thread javaguy Java
Hi, I made a post on Stack Overflow that I can't seem to make any headway on: https://stackoverflow.com/questions/63834379/spark-performance-local-faster-than-cluster. Before anyone starts suggesting changes to the code, note that the code and example in that post are from a Udemy