There's nothing wrong with calling microservices this way. Something needs
to call the service with all the data arriving, and Spark is fine for
executing arbitrary logic, including this kind of thing.
Kafka does not change that.

On Wed, Feb 16, 2022 at 9:24 AM Gourav Sengupta <gourav.sengu...@gmail.com>
wrote:

> Hi,
> once again, just trying to understand the problem first.
>
> Why are we using Spark to place calls to microservices? There are several
> reasons why this should never happen, including cost, security, and
> scalability concerns.
>
> Is there a way that you can create a producer and put the data into Kafka
> first?
>
> Sorry, I am not suggesting any solutions, just trying to understand the
> problem first.
>
>
> Regards,
> Gourav
>
>
>
> On Wed, Feb 16, 2022 at 2:36 PM S <sheelst...@gmail.com> wrote:
>
>> No, I want the job to stop and end once it discovers, on repeated retries,
>> that the microservice is not responding. But I think I got where you were
>> going right after sending my previous mail. Basically, repeated failure of
>> your tasks on retries ultimately fails your job anyway, so that's an
>> in-built circuit breaker. What that essentially means is that we should not
>> be catching those HTTP 5XX exceptions (which we currently do) and should
>> instead let the tasks fail on their own, so that Spark retries them a finite
>> number of times and then fails the job, thereby breaking the circuit. Thanks.
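>>
>> Concretely, that just means something like the following in the flatMap
>> (callService, sendToDlq and BadRecordException are hypothetical stand-ins
>> for our actual helpers; this is a sketch, not our exact code):
>>
>>   stream.flatMap { msg =>
>>     try {
>>       // callService wraps the HTTP client and throws
>>       // ServiceUnavailableException on an HTTP 5XX.
>>       Some(callService(msg))
>>     } catch {
>>       case e: BadRecordException =>
>>         // Data/schema problem: park the record in the DLQ and keep going.
>>         sendToDlq(msg, e)
>>         None
>>       // Deliberately no case for ServiceUnavailableException: the task
>>       // fails, Spark retries it up to spark.task.maxFailures times and
>>       // then fails the job, which is the built-in circuit breaker.
>>     }
>>   }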
>>
>> On Wed, Feb 16, 2022 at 7:59 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> You stop the Spark job by tasks failing repeatedly; that's already how
>>> it works. You can't kill the driver from an executor any other way, but
>>> you should not need to. I'm not clear: are you saying you want to stop
>>> the job, but also continue processing?
>>>
>>> On Wed, Feb 16, 2022 at 7:58 AM S <sheelst...@gmail.com> wrote:
>>>
>>>> Retries have already been implemented. The question is how to stop the
>>>> Spark job by having an executor JVM send a signal to the driver JVM. E.g.
>>>> I have a microbatch of 30 messages, 10 in each of the 3 partitions. Let's
>>>> say that while a partition of 10 messages was being processed, the first 3
>>>> went through but then the microservice went down. Now, when the 4th message
>>>> in the partition is sent to the microservice, it keeps receiving a 5XX on
>>>> every retry, e.g. 5 retries. What I now want is to have that task from that
>>>> executor JVM send a signal to the driver JVM to terminate the Spark job on
>>>> the failure of the 5th retry. Currently, what we have in place is retrying
>>>> it 5 times and then, upon failure, i.e. a 5XX, catching the exception and
>>>> moving the message to a DLQ, thereby having the flatMap produce a *None*
>>>> and proceed to the next message in the partition of that microbatch. This
>>>> approach keeps the pipeline alive and keeps pushing messages to the DLQ,
>>>> microbatch after microbatch, until the microservice is back up.
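>>>>
>>>> As a rough sketch of what we currently do inside the task (the retry
>>>> helper, callMicroservice and sendToDlq below are simplified,
>>>> hypothetical versions of our actual code):
>>>>
>>>>   // Retry a call with a fixed delay, rethrowing the last failure.
>>>>   def retry[T](times: Int, delayMs: Long = 500)(f: => T): T =
>>>>     try f catch {
>>>>       case e: Exception if times > 1 =>
>>>>         Thread.sleep(delayMs)
>>>>         retry(times - 1, delayMs)(f)
>>>>     }
>>>>
>>>>   stream.flatMap { msg =>
>>>>     try Some(retry(times = 5)(callMicroservice(msg)))  // 5 attempts
>>>>     catch {
>>>>       case e: Exception =>
>>>>         sendToDlq(msg, e)  // today we swallow the 5XX and move on
>>>>         None
>>>>     }
>>>>   }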
>>>>
>>>>
>>>> On Wed, Feb 16, 2022 at 6:50 PM Sean Owen <sro...@gmail.com> wrote:
>>>>
>>>>> You could use the same pattern in your flatMap function. If you want
>>>>> Spark to keep retrying, though, you don't need any special logic; that is
>>>>> what it would do already. You could also increase the number of task
>>>>> retries (spark.task.maxFailures); see the spark.excludeOnFailure.task.*
>>>>> configurations as well.
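>>>>>
>>>>> For example, something like this when building the job (values here are
>>>>> only illustrative; spark.excludeOnFailure.* is the Spark 3.1+ name for
>>>>> the older spark.blacklist.* settings):
>>>>>
>>>>>   import org.apache.spark.SparkConf
>>>>>
>>>>>   val conf = new SparkConf()
>>>>>     .set("spark.task.maxFailures", "8")            // task retry budget
>>>>>     .set("spark.excludeOnFailure.enabled", "true") // exclude bad executors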
>>>>>
>>>>> You can just implement the circuit breaker pattern directly too;
>>>>> nothing special there, though I don't think that's what you want: you
>>>>> actually want to retry the failed attempts, not just avoid calling the
>>>>> microservice.
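>>>>>
>>>>> If you did want the pattern itself, the core of it is just a failure
>>>>> counter around the call, nothing Spark-specific. A bare-bones,
>>>>> single-threaded sketch:
>>>>>
>>>>>   class CircuitBreaker(maxFailures: Int) {
>>>>>     private var failures = 0  // per-instance, not thread-safe
>>>>>     def call[T](f: => T): T = {
>>>>>       if (failures >= maxFailures)
>>>>>         throw new IllegalStateException("circuit open")
>>>>>       try { val r = f; failures = 0; r }  // a success resets the count
>>>>>       catch { case e: Exception => failures += 1; throw e }
>>>>>     }
>>>>>   }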
>>>>>
>>>>> On Wed, Feb 16, 2022 at 3:18 AM S <sheelst...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> We have a Spark job that calls a microservice in the lambda function
>>>>>> of the flatMap transformation: it passes the inbound element to this
>>>>>> microservice and returns the transformed value, or "None", from the
>>>>>> microservice as the output of the flatMap transform. Of course the
>>>>>> lambda also takes care of exceptions from the microservice, etc. The
>>>>>> question is: there are times when the microservice may be down, and
>>>>>> there is no point recording an exception and putting the message in the
>>>>>> DLQ for every element in our streaming pipeline so long as the
>>>>>> microservice stays down. Instead, we want to retry the microservice call
>>>>>> for a given event a predefined number of times and, if the service is
>>>>>> found to be down, terminate the Spark job so that the current microbatch
>>>>>> is terminated, there is no next microbatch, and the rest of the messages
>>>>>> therefore remain in the source Kafka topics, unpolled and unprocessed,
>>>>>> until the microservice is back up and the Spark job is redeployed. In
>>>>>> regular microservices, we can implement this using the circuit breaker
>>>>>> pattern. In Spark jobs, however, this would mean being able to somehow
>>>>>> send a signal from an executor JVM to the driver JVM to terminate the
>>>>>> Spark job. Is there a way to do that in Spark?
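>>>>>>
>>>>>> To make it concrete, what I am after (assuming Structured Streaming
>>>>>> for this sketch; the sink and names are only illustrative) is
>>>>>> effectively the driver reacting to a failure raised on an executor:
>>>>>>
>>>>>>   // Executor side: the flatMap rethrows once its retries are
>>>>>>   // exhausted, so the task, and eventually the batch, fails.
>>>>>>   // Driver side: the failed batch surfaces here and we shut down.
>>>>>>   val query = transformedStream.writeStream
>>>>>>     .format("console")  // illustrative sink
>>>>>>     .start()
>>>>>>   try query.awaitTermination()
>>>>>>   catch {
>>>>>>     case e: org.apache.spark.sql.streaming.StreamingQueryException =>
>>>>>>       sys.exit(1)  // stop the application until we redeploy
>>>>>>   }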
>>>>>>
>>>>>> P.S.:
>>>>>> - Having the circuit breaker functionality helps restrict the purpose
>>>>>> of the DLQ to data or schema issues only, instead of infra/network
>>>>>> related issues.
>>>>>> - As far as the need for the Spark job to use microservices is
>>>>>> concerned, think of it as complex logic being maintained in a
>>>>>> microservice that does not warrant duplication.
>>>>>> - Checkpointing is being taken care of manually, not using Spark's
>>>>>> default checkpointing mechanism.
>>>>>>
>>>>>> Regards,
>>>>>> Sheel
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Best Regards,
>>>>
>>>> Sheel Pancholi
>>>>
>>>
>>
>> --
>>
>> Best Regards,
>>
>> Sheel Pancholi
>>
>
