I tried increasing the batch size (1,000 to 10,000 to 100,000), but it didn't appear to make any appreciable difference in my test case.
In addition, I had read in the Oracle JDBC documentation that batch sizes should be set between 10 and 100, and that anything outside that range was not advisable. However, I don't have any evidence to prove or disprove that.

On 21 Apr 2016 6:16 am, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:

> Sorry, I sent the previous message mid-way by mistake.
> How about trying to increase `batchsize` in the JDBC options to improve
> performance?
>
> // maropu
>
> On Thu, Apr 21, 2016 at 2:15 PM, Takeshi Yamamuro <linguin....@gmail.com>
> wrote:
>
>> Hi,
>>
>> How about trying to increase `batchsize
>>
>> On Wed, Apr 20, 2016 at 7:14 AM, Jonathan Gray <jonny.g...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to write ~60 million rows from a DataFrame to a database
>>> over JDBC using Spark 1.6.1, with something similar to
>>> df.write().jdbc(...).
>>>
>>> The write does not seem to be performing well. Profiling the
>>> application with a master of local[*], there is not much socket write
>>> activity and not much CPU usage either.
>>>
>>> I would expect an almost continuous block of socket write activity to
>>> show up somewhere in the profile.
>>>
>>> I can see that the top hot method involves
>>> org.apache.spark.unsafe.Platform.copyMemory, all from calls within
>>> JdbcUtils.savePartition(...). However, the CPU doesn't seem
>>> particularly stressed, so I'm guessing this isn't the cause of the
>>> problem.
>>>
>>> Are there any best practices, or has anyone come across a case like
>>> this before where a write to a database seems to perform poorly?
>>>
>>> Thanks,
>>> Jon
>>
>> --
>> ---
>> Takeshi Yamamuro
>
> --
> ---
> Takeshi Yamamuro
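For reference, here is roughly how the batchsize hint was passed in my tests, as a minimal sketch assuming Spark 1.6.x (where the JDBC writer reads a "batchsize" connection property, which seems to be what Takeshi is referring to). The Oracle URL, table name, credentials, and the tiny stand-in DataFrame are all placeholders, not the real job:

    import java.util.Properties
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(
      new SparkConf().setAppName("jdbc-write-test").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Small stand-in for the ~60M-row DataFrame in the thread
    val df = sc.parallelize(1 to 1000).map(i => (i, s"row_$i")).toDF("ID", "VAL")

    val props = new Properties()
    props.setProperty("user", "app_user")      // placeholder credentials
    props.setProperty("password", "app_pass")
    props.setProperty("batchsize", "10000")    // rows buffered per executeBatch()

    // Placeholder Oracle thin-driver URL and target table
    df.write.jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "TARGET_TABLE", props)

My understanding is that JdbcUtils.savePartition adds rows to a prepared statement with addBatch and flushes with executeBatch every batchsize rows, which is why I expected a larger value to cut down on round trips; in practice, as noted above, varying it made no appreciable difference for me.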