Spark 2.0.0 Error Caused by: java.lang.IllegalArgumentException: requirement failed: Block broadcast_21_piece0 is already present in the MemoryStore

2016-10-11 Thread sandesh deshmane
I am getting this error sometimes when I run PySpark with Spark 2.0.0:
App > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
App > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
App > at java.lang.reflect.Method.invoke(Meth

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
> Exactly once, it needed requires in any system including spark more effort and usually the throughput is lower. A risk evaluation from a business point of view has to be done anyway...
>
> > On 22 Jun 2016, at 09:09, sandesh deshmane wrote:

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
> t needed requires in any system including spark more effort and usually the throughput is lower. A risk evaluation from a business point of view has to be done anyway...
>
> > On 22 Jun 2016, at 09:09, sandesh deshmane wrote:
> >
> > Hi,
> >
> > I am writ

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
them processed, but at run time I need to do that lookup, and for us the number of messages is very high, so will the lookup add to the processing time?

Thanks,
Sandesh Deshmane

On Wed, Jun 22, 2016 at 2:36 PM, Mich Talebzadeh wrote:
> Yes this is more of Kafka issue as Kafka send the messag
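A minimal sketch of the kind of lookup discussed in this message, written in plain Python rather than against the Spark or Kafka APIs. It assumes each message carries its topic, partition, and offset, and that offsets arrive in increasing order within a partition; the names `process_once`, `last_offsets`, and `counts` are hypothetical. Tracking only the highest offset seen per partition keeps the duplicate check to one dictionary lookup per message, instead of a lookup against every processed message.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical message shape: (topic, partition, offset, payload).
Message = Tuple[str, int, int, str]


def process_once(
    messages: Iterable[Message],
    last_offsets: Dict[Tuple[str, int], int],
    counts: Dict[str, int],
) -> None:
    """Count payloads, skipping messages at or below the last seen offset."""
    for topic, partition, offset, payload in messages:
        key = (topic, partition)
        # A replayed message has an offset we have already passed.
        if offset <= last_offsets.get(key, -1):
            continue
        counts[payload] = counts.get(payload, 0) + 1
        last_offsets[key] = offset


if __name__ == "__main__":
    last_offsets: Dict[Tuple[str, int], int] = {}
    counts: Dict[str, int] = {}
    batch = [("events", 0, 0, "a"), ("events", 0, 1, "b")]
    process_once(batch, last_offsets, counts)
    # Simulate a restart that replays the same batch plus one new message.
    process_once(batch + [("events", 0, 2, "a")], last_offsets, counts)
    print(counts)
```

In a real job the offsets and the counts would have to be stored together, in the same transaction, for the skip to be reliable after a failure; this sketch keeps both in memory only.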

Re: how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 22 June 2016 at 08:09, sandesh deshma

how to avoid duplicate messages with spark streaming using checkpoint after restart in case of failure

2016-06-22 Thread sandesh deshmane
Hi, I am writing a Spark Streaming application which reads messages from Kafka. I am using checkpointing and write-ahead logs (WAL) to achieve fault tolerance. I have created a batch size of 10 sec for reading messages from Kafka. I read messages from Kafka and generate the count of messages as pe

Re: Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread sandesh deshmane
> ining the myFunction inside the Function and see if the problem persists.
>
> On Thu, Jun 9, 2016 at 3:57 AM, sandesh deshmane wrote:
>
>> Hi,
>>
>> I am using spark streaming for streaming data from kafka 0.8
>>
>> I am using checkpointing in HDFS. I am

Error while using checkpointing . Spark streaming 1.5.2- DStream checkpointing has been enabled but the DStreams with their functions are not serialisable

2016-06-09 Thread sandesh deshmane
Hi,

I am using Spark Streaming to stream data from Kafka 0.8.

I am using checkpointing in HDFS. I am getting an error like the one below:

java.io.NotSerializableException: DStream checkpointing has been enabled but the DStreams with their functions are not serialisable
field (class: org.apache.spark.st

Re: Error while deploying spark 1.6.1 on EC2

2016-03-14 Thread sandesh deshmane
> I am trying to install spark on EC2.
>
> I am getting below error. I had issues like RPC timeout and Fetchtimeout for spark 1.6.0 so as per release notes was trying to get new cluster with 1.6.1
>
> Can you help? looks like spark 1.6.1 package is missing from s3.
>
> [timing] scala init: 00h