Re: Spark Streaming: question on sticky session across batches ?

2016-11-15 Thread Manish Malhotra
Thanks!
On Tue, Nov 15, 2016 at 1:07 AM Takeshi Yamamuro wrote:
> - dev
>
> Hi,
>
> AFAIK, if you use RDDs only, you can control the partition mapping to some extent
> by using a partition key RDD[(key, data)].
> A defined partitioner distributes data into partitions

Re: Spark Streaming: question on sticky session across batches ?

2016-11-15 Thread Takeshi Yamamuro
- dev
Hi,
AFAIK, if you use RDDs only, you can control the partition mapping to some extent by using a partition key, RDD[(key, data)]. A defined partitioner distributes data into partitions depending on the key. For a good example of controlling partitions, see the GraphX code;
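To illustrate the point about a partitioner mapping keys to partitions, here is a minimal plain-Python sketch. It does not use Spark: `partition_for` and `partition_batch` are hypothetical helper names, and CRC32 is only a stand-in for whatever hash a real partitioner applies. It shows only that a deterministic function of the key sends every record with that key to the same partition index in every batch. Which node actually processes a given partition is a separate scheduling question that this sketch does not address.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index with a hash that is stable across runs.

    zlib.crc32 is an assumption made for this sketch; Python's built-in
    hash() is avoided because it is randomized per process for strings.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_batch(records, num_partitions):
    """Group (key, data) records of one batch by their partition index."""
    partitions = defaultdict(list)
    for key, data in records:
        partitions[partition_for(key, num_partitions)].append((key, data))
    return dict(partitions)


# Two hypothetical micro-batches sharing the same keys.
batch_1 = [("user-a", 1), ("user-b", 2), ("user-a", 3)]
batch_2 = [("user-b", 4), ("user-a", 5)]

parts_1 = partition_batch(batch_1, num_partitions=4)
parts_2 = partition_batch(batch_2, num_partitions=4)
```

Because the mapping depends only on the key and the partition count, "user-a" lands in the same partition index in both batches, provided the number of partitions is kept fixed.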

Re: Spark Streaming: question on sticky session across batches ?

2016-11-14 Thread Manish Malhotra
Sending again. Any help is appreciated! Thanks in advance.
On Thu, Nov 10, 2016 at 8:42 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote:
> Hello Spark Devs/Users,
>
> I'm trying to solve the use case with Spark Streaming 1.6.2 where, for every
> batch (say, 2 minutes), data needs to go

Spark Streaming: question on sticky session across batches ?

2016-11-10 Thread Manish Malhotra
Hello Spark Devs/Users,
I'm trying to solve a use case with Spark Streaming 1.6.2 where, for every batch (say, 2 minutes), data needs to go to the same reducer node after grouping by key. The underlying storage is Cassandra, not HDFS. This is a map-reduce job, where I am also trying to use the