Re: Spark Streaming: Custom Receiver OOM consistently

2017-05-23 Thread Manish Malhotra
…finished, or after a few batches, should the receiver and worker nodes discard the old data? On Mon, May 22, 2017 at 5:20 PM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: thanks Alonso. Sorry, but there are some security reservation

Re: Spark Streaming: Custom Receiver OOM consistently

2017-05-22 Thread Manish Malhotra
…Alonso Isidoro Roman <https://about.me/alonso.isidoro.roman> 2017-05-20 7:54 GMT+02:00 Manish Malhotra <manish.malhotra.w...@gmail.com>:

Spark Streaming: Custom Receiver OOM consistently

2017-05-19 Thread Manish Malhotra
Hello, I have implemented a Java-based custom receiver, which consumes from a messaging system, say JMS. Once a message is received, I call store(object) ... I'm storing a Spark Row object. It runs for around 8 hrs and then goes OOM, and the OOM is happening on the receiver nodes. I also tried to run multiple
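For context on the API in that report: a Spark Streaming custom receiver extends Receiver and hands each item to Spark via store(). Below is a minimal Java sketch of such a receiver, assuming a hypothetical pollFromJms() helper in place of the real JMS client; the serialized, disk-spilling storage level is only an illustrative choice, though it is one common knob to look at when the receiver node itself runs out of memory.

```java
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

public class JmsRowReceiver extends Receiver<Row> {

    public JmsRowReceiver() {
        // Serialized + spillable storage for received blocks; illustrative, not prescriptive.
        super(StorageLevel.MEMORY_AND_DISK_SER_2());
    }

    @Override
    public void onStart() {
        // Receive on a separate thread so onStart() returns quickly.
        new Thread(this::receive).start();
    }

    @Override
    public void onStop() {
        // Nothing to do here; the receive loop checks isStopped().
    }

    private void receive() {
        try {
            while (!isStopped()) {
                String msg = pollFromJms();          // hypothetical stand-in for the JMS client
                store(RowFactory.create(msg));       // hand one Row to Spark
            }
        } catch (Exception e) {
            restart("Error receiving from JMS", e);  // ask Spark to restart the receiver
        }
    }

    private String pollFromJms() {
        // Placeholder for the real JMS consumer call.
        return "message";
    }
}
```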

Re: [Spark Streaming] Streaming job failing consistently after 1h

2017-05-19 Thread Manish Malhotra
I'm also facing the same problem. I have implemented a Java-based custom receiver, which consumes from a messaging system, say JMS. Once a message is received, I call store(object) ... I'm storing a Spark Row object. It runs for around 8 hrs and then goes OOM, and the OOM is happening on the receiver nodes. I also tried

Re: RDD getPartitions() size and HashPartitioner numPartitions

2016-12-04 Thread Manish Malhotra
It's a pretty nice question! I'm trying to understand the problem and see if I can help further. When you say CustomRDD, I believe you will be using it in the transformation stage, once the data is read from a source like HDFS, Cassandra, or Kafka. Now, RDD.getPartitions() should return the
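To make the relationship in this thread's subject concrete, here is a small Java sketch (the local master and names are illustrative): an RDD repartitioned with a HashPartitioner reports a partition count equal to the partitioner's numPartitions, since the partitioner maps each key to a non-negative modulus of its hashCode over numPartitions.

```java
import java.util.Arrays;

import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class PartitionCountExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[2]", "partitions");

        JavaPairRDD<String, Integer> pairs = sc.parallelizePairs(Arrays.asList(
                new Tuple2<>("a", 1), new Tuple2<>("b", 2), new Tuple2<>("c", 3)));

        // The HashPartitioner decides each record's target partition from its key,
        // so the repartitioned RDD's partition count matches numPartitions.
        JavaPairRDD<String, Integer> repartitioned =
                pairs.partitionBy(new HashPartitioner(8));

        System.out.println(repartitioned.getNumPartitions()); // prints 8
        sc.stop();
    }
}
```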

Re: What benefits do we really get out of colocation?

2016-12-03 Thread Manish Malhotra
Thanks for sharing the numbers as well! Nowadays even the network can have very high throughput and might outperform the disk, but as Sean mentioned, data going over the network has other dependencies, such as network hops: if it crosses racks, there can be a switch in between. But yes, people are

Re: Spark Streaming: question on sticky session across batches ?

2016-11-15 Thread Manish Malhotra
…RDDs to control the partition where edge data are. // maropu. On Tue, Nov 15, 2016 at 5:19 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: sending again. any help is appreciated! thanks in advance. On Thu, Nov

Re: Spark Streaming: question on sticky session across batches ?

2016-11-14 Thread Manish Malhotra
sending again. any help is appreciated! thanks in advance. On Thu, Nov 10, 2016 at 8:42 AM, Manish Malhotra <manish.malhotra.w...@gmail.com> wrote: Hello Spark Devs/Users, I'm trying to solve a use case with Spark Streaming 1.6.2 where for every batch (say 2 mins

Spark Streaming: question on sticky session across batches ?

2016-11-10 Thread Manish Malhotra
Hello Spark Devs/Users, I'm trying to solve a use case with Spark Streaming 1.6.2 where, for every batch (say 2 mins), data needs to go to the same reducer node after grouping by key. The underlying storage is Cassandra, not HDFS. This is a map-reduce job, where I'm also trying to use the
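One common way to approximate this in Spark 1.6.x is to reuse the same partitioner in every batch, so a given key always hashes to the same partition number across batches; note this pins keys to partitions, not to physical nodes, since task placement is still up to the scheduler. A minimal Java sketch, assuming an illustrative socket source and a comma-separated key prefix:

```java
import org.apache.spark.HashPartitioner;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.StorageLevels;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StickyKeyGrouping {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("sticky-keys").setMaster("local[2]");
        // 2-minute batches, as in the use case described above.
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.minutes(2));

        // Any source works; a socket stream keeps the sketch self-contained.
        JavaDStream<String> lines =
                ssc.socketTextStream("localhost", 9999, StorageLevels.MEMORY_AND_DISK_SER);

        // The same HashPartitioner is used for every batch, so a given key
        // always lands in the same partition number across batches.
        HashPartitioner partitioner = new HashPartitioner(16);

        JavaPairDStream<String, Iterable<String>> grouped = lines
                .mapToPair(line -> new Tuple2<>(line.split(",")[0], line))
                .groupByKey(partitioner);

        grouped.print();

        ssc.start();
        ssc.awaitTermination();
    }
}
```

If per-key state also needs to be carried across batches, mapWithState (added in Spark 1.6) is the usual companion to a fixed partitioner.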