I tried increasing the batch size (1,000 to 10,000 to 100,000), but it didn't appear to make any appreciable difference in my test case.
In addition, I had read in the Oracle JDBC documentation that batch sizes should be set between 10 and 100, and that anything outside that range was not advisable. However, I don't have any evidence to prove or disprove that.

On 21 Apr 2016 6:16 am, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:

> Sorry, I sent the previous message mid-way by mistake.
> How about trying to increase `batchsize` in the JDBC options to improve
> performance?
>
> // maropu
>
> On Thu, Apr 21, 2016 at 2:15 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> How about trying to increase `batchsize
>>
>> On Wed, Apr 20, 2016 at 7:14 AM, Jonathan Gray <jonny.g...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to write ~60 million rows from a DataFrame to a database
>>> over JDBC using Spark 1.6.1, with something similar to
>>> df.write().jdbc(...).
>>>
>>> The write does not seem to be performing well. Profiling the
>>> application with a master of local[*], there is not much socket write
>>> activity and not much CPU usage either.
>>>
>>> I would expect an almost continuous block of socket write activity to
>>> show up somewhere in the profile.
>>>
>>> I can see that the top hot method involves
>>> org.apache.spark.unsafe.Platform.copyMemory, all from calls within
>>> JdbcUtils.savePartition(...). However, the CPU doesn't seem
>>> particularly stressed, so I'm guessing this isn't the cause of the
>>> problem.
>>>
>>> Are there any best practices, or has anyone come across a case like
>>> this before where a write to a database seems to perform poorly?
>>>
>>> Thanks,
>>> Jon
>>
>> --
>> ---
>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
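For reference, here is roughly how the batchsize hint was passed in my tests, as a minimal sketch assuming Spark 1.6.x (where the JDBC writer reads a "batchsize" connection property, which seems to be what Takeshi is referring to). The Oracle URL, table name, credentials, and the tiny stand-in DataFrame are all placeholders, not the real job:

    import java.util.Properties
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(
      new SparkConf().setAppName("jdbc-write-test").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small stand-in for the ~60M-row DataFrame in the thread
    val df = sc.parallelize(1 to 1000).map(i => (i, s"row_$i")).toDF("ID", "VAL")

    val props = new Properties()
    props.setProperty("user", "app_user")      // placeholder credentials
    props.setProperty("password", "app_pass")
    props.setProperty("batchsize", "10000")    // rows buffered per executeBatch()

    // Placeholder Oracle thin-driver URL and target table
    df.write.jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "TARGET_TABLE", props)

My understanding is that JdbcUtils.savePartition adds rows to a prepared statement with addBatch and flushes with executeBatch every batchsize rows, which is why I expected a larger value to cut down on round trips; in practice, as noted above, varying it made no appreciable difference for me.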