Yes, and that brings me to another question: how do I do a batch insert from the workers? In production we are planning to use a Kinesis stream with 3 shards, so the number of partitions should be 3, right?
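Something like the following is what I have in mind for batching in the workers. This is only a rough sketch: it assumes the connection coming out of the guide's static ConnectionPool is a JDBC java.sql.Connection, and the "events" table and the batch size of 500 are made up:

    dstream.foreachRDD(rdd => {
      rdd.foreachPartition(partitionOfRecords => {
        val connection = ConnectionPool.getConnection() // assumed here to be a java.sql.Connection
        val stmt = connection.prepareStatement("INSERT INTO events (payload) VALUES (?)")
        // flush one batch per group of 500 records instead of one round trip per record
        partitionOfRecords.grouped(500).foreach { batch =>
          batch.foreach { record =>
            stmt.setString(1, record)
            stmt.addBatch()
          }
          stmt.executeBatch()
        }
        stmt.close()
        ConnectionPool.returnConnection(connection) // return to the pool for future reuse
      })
    })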
On Mar 8, 2015 8:57 PM, "Ted Yu" <yuzhih...@gmail.com> wrote:

> What's the expected number of partitions in your use case?
>
> Have you thought of doing batching in the workers?
>
> Cheers
>
> On Sat, Mar 7, 2015 at 10:54 PM, A.K.M. Ashrafuzzaman <ashrafuzzaman...@gmail.com> wrote:
>
>> For processing a DStream, the Spark Programming Guide suggests the following usage of a connection:
>>
>> dstream.foreachRDD(rdd => {
>>   rdd.foreachPartition(partitionOfRecords => {
>>     // ConnectionPool is a static, lazily initialized pool of connections
>>     val connection = ConnectionPool.getConnection()
>>     partitionOfRecords.foreach(record => connection.send(record))
>>     ConnectionPool.returnConnection(connection) // return to the pool for future reuse
>>   })
>> })
>>
>> In this case both the processing and the insertion are done in the workers, and no batch insert into the DB is used. How about this use case: we process (parse the JSON strings into objects) in the workers, send those objects back to the master, and then issue a single bulk insert request. Is there any benefit to sending records individually through a connection pool versus using a bulk operation in the master?
>>
>> A.K.M. Ashrafuzzaman
>> Lead Software Engineer
>> NewsCred <http://www.newscred.com/>
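For comparison, the master-side pattern asked about above would look roughly like this (a sketch only: parse stands in for the JSON parsing and bulkInsert for a hypothetical single bulk DB call):

    dstream.foreachRDD(rdd => {
      // parsing still runs in the workers; collect() then ships every parsed object to the driver
      val objects = rdd.map(json => parse(json)).collect()
      if (objects.nonEmpty) {
        bulkInsert(objects) // one bulk request issued from the master
      }
    })

The obvious cost is that collect() funnels the whole batch through the driver's memory, which is why the programming guide pushes the work into foreachPartition instead.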