Yes, and that brings me to another question: how do I do a batch insert from the workers? In production we are planning to use a Kinesis stream with 3 shards, so the number of partitions should be 3, right?
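Something like the following is what I have in mind for batching in the workers. This is only a rough sketch: it assumes the connection coming out of the guide's static ConnectionPool is a JDBC java.sql.Connection, and the "events" table and the batch size of 500 are made up:

    dstream.foreachRDD(rdd => {
      rdd.foreachPartition(partitionOfRecords => {
        val connection = ConnectionPool.getConnection() // assumed here to be a java.sql.Connection
        val stmt = connection.prepareStatement("INSERT INTO events (payload) VALUES (?)")
        // flush one batch per group of 500 records instead of one round trip per record
        partitionOfRecords.grouped(500).foreach { batch =>
          batch.foreach { record =>
            stmt.setString(1, record)
            stmt.addBatch()
          }
          stmt.executeBatch()
        }
        stmt.close()
        ConnectionPool.returnConnection(connection) // return to the pool for future reuse
      })
    })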
On Mar 8, 2015 8:57 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> What's the expected number of partitions in your use case?
>
> Have you thought of doing batching in the workers?
>
> Cheers
>
> On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman <ashrafuzzaman...@gmail.com> wrote:
>
>> For processing a DStream, the Spark Programming Guide suggests the following usage of a connection:
>>
>> dstream.foreachRDD(rdd => {
>>   rdd.foreachPartition(partitionOfRecords => {
>>     // ConnectionPool is a static, lazily initialized pool of connections
>>     val connection = ConnectionPool.getConnection()
>>     partitionOfRecords.foreach(record => connection.send(record))
>>     ConnectionPool.returnConnection(connection) // return to the pool for future reuse
>>   })
>> })
>>
>> In this case both the processing and the insertion are done in the workers, and no batch insert into the DB is used. How about this use case: we process (parse the JSON strings into objects) in the workers, send those objects back to the master, and then issue a single bulk insert request. Is there any benefit to sending records individually through a connection pool versus using a bulk operation in the master?
>>
>> A.K.M. Ashrafuzzaman
>> Lead Software Engineer
>> NewsCred <http://www.newscred.com/>
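For comparison, the master-side pattern asked about above would look roughly like this (a sketch only: parse stands in for the JSON parsing and bulkInsert for a hypothetical single bulk DB call):

    dstream.foreachRDD(rdd => {
      // parsing still runs in the workers; collect() then ships every parsed object to the driver
      val objects = rdd.map(json => parse(json)).collect()
      if (objects.nonEmpty) {
        bulkInsert(objects) // one bulk request issued from the master
      }
    })

The obvious cost is that collect() funnels the whole batch through the driver's memory, which is why the programming guide pushes the work into foreachPartition instead.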