Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-09-15 Thread Rajini Sivaram
Hi Ramkumar,

This is being fixed for 1.0.0.

a) Retries will be removed for authentication failures.
b) With the current behaviour, retries do add load on the broker as new
connections are established for retries. There is a backoff interval
between connections to reduce the impact.

Regards,

Rajini

On Fri, Sep 15, 2017 at 3:31 PM, SEMBAIYAN, RAMKUMAR  wrote:

> Hi,
>
> Can you pls let me know if this is resolved or any work around is there. I
> am using Kafka 0.11.0.1 version.
>
>
>
> a.   When incorrect credentials are sent the publisher or consumer
> API, logs below warning error , but keeps retrying to broker with out
> disconnecting.
>
> b.  We are using kafka as middleware application for multiple pubs and
> subs. So if any one use wrong password , the broker will also be busy  and
> see as performance issue, right?
>
>
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
> [kafka-producer-network-thread | producer-1] WARN 
> org.apache.kafka.clients.NetworkClient
> - Connection to node -1 terminated during authentication. This may indicate
> that authentication failed due to invalid credentials.
>
>
>
>
>
> Any inputs will be helpful.
>
>
>
> Thanks,
>
> Ramkumar
>
>
>
>
>
>
>
> On 2017-05-04 07:37, Rajini Sivaram >
> wrote:
>
> > Hi all,>
>
> >
>
> > I have created a KIP to improve diagnostics for SASL authentication>
>
> > failures and reduce retries and blocking when authentication fails:>
>
> >
>
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 152+-+Improve+diagnostics+for+SASL+authentication+failures>
>
> >
>
> > Comments and suggestions are welcome.>
>
> >
>
> > Thank you...>
>
> >
>
> > Regards,>
>
> >
>
> > Rajini>
>
> >
>


Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-09-15 Thread SEMBAIYAN, RAMKUMAR
Hi,

Can you pls let me know if this is resolved or any work around is there. I am 
using Kafka 0.11.0.1 version.



a.   When incorrect credentials are sent the publisher or consumer API, 
logs below warning error , but keeps retrying to broker with out disconnecting.

b.  We are using kafka as middleware application for multiple pubs and 
subs. So if any one use wrong password , the broker will also be busy  and see 
as performance issue, right?


[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.
[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.
[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.
[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.
[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.
[kafka-producer-network-thread | producer-1] WARN 
org.apache.kafka.clients.NetworkClient - Connection to node -1 terminated 
during authentication. This may indicate that authentication failed due to 
invalid credentials.





Any inputs will be helpful.



Thanks,

Ramkumar







On 2017-05-04 07:37, Rajini Sivaram > 
wrote:

> Hi all,>

>

> I have created a KIP to improve diagnostics for SASL authentication>

> failures and reduce retries and blocking when authentication fails:>

>

> https://cwiki.apache.org/confluence/display/KAFKA/KIP-152+-+Improve+diagnostics+for+SASL+authentication+failures>

>

> Comments and suggestions are welcome.>

>

> Thank you...>

>

> Regards,>

>

> Rajini>

>


Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-22 Thread Rajini Sivaram
Hi all,

If there are no more comments, I will start vote on this KIP tomorrow.
Please let me know if there are any concerns.

Thanks,

Rajini

On Mon, Aug 21, 2017 at 9:24 AM, Rajini Sivaram 
wrote:

> Hi Jun,
>
> Thank you for the review.
>
> 1. Each token response will indicate success/failure. At the moment, the
> broker simply closes the connection in the case of failure. With this KIP,
> an empty token with an error_code indicating authentication failure will be
> sent by the broker before closing the connection. I have made this clearer
> in the KIP.
>
> 2. I have added a section on versioning of SaslAuthenticate and
> SaslHandshake requests.
>
>
> On Fri, Aug 18, 2017 at 2:47 PM, Jun Rao  wrote:
>
>> Hi, Rajini,
>>
>> Thanks for the KIP. Looks good overall. Just a couple of minor comments.
>>
>> 1. "The final message from the broker will indicate if authentication
>> succeeded or failed." Are we doing something special for the final
>> message?
>> I thought each token response will indicate success or failure in the
>> error
>> code.
>>
>> 2. It would be useful to document how we plan to evolve
>> SaslAuthenticateRequest
>> in the future. For example, if we add a new error code in
>> SaslAuthenticateResponse,
>> should we bump up the version of SaslAuthenticateRequest? When do we need
>> to bump up SaslHandshakeRequest?
>>
>> Jun
>>
>>
>> On Fri, Aug 18, 2017 at 10:59 AM, Ismael Juma  wrote:
>>
>> > Thanks Rajini. The changes look good. We have to update the motivation
>> to
>> > take into account the improvements in 0.11:
>> >
>> > "With the current Kafka SASL implementation, broker closes the client
>> > connection if SASL authentication fails. Clients see this as a
>> connection
>> > failure and do not get any feedback for the reason why the connection
>> was
>> > closed. Producers and consumers retry, attempting to create successful
>> > connections, treating authentication failures as transient failures.
>> There
>> > are no log entries on the client-side to indicate that any of these
>> > connection failures were due to authentication failure."
>> >
>> > 0.11.0.x logs a warning with a hint that the disconnect is likely
>> related
>> > to authentication.
>> >
>> > Ismael
>> >
>> > On Tue, Aug 15, 2017 at 9:23 PM, Rajini Sivaram <
>> rajinisiva...@gmail.com>
>> > wrote:
>> >
>> > > I have updated the KIP based on the discussions so far. It will be
>> good
>> > if
>> > > we can get some more feedback so that this can be implemented for
>> 1.0.0.
>> > >
>> > >
>> > > Thanks,
>> > >
>> > > Rajini
>> > >
>> > >
>> > > On Thu, May 4, 2017 at 10:22 PM, Ismael Juma 
>> wrote:
>> > >
>> > > > Hi Rajini,
>> > > >
>> > > > I think we were talking about slightly different things. I was just
>> > > > referring to the fact that there are cases where we throw an
>> > > > AuthorizationException back to the user without retrying from
>> various
>> > > > methods (poll, commitSync, etc).
>> > > >
>> > > > As you said, my initial preference was for not retrying at all
>> because
>> > it
>> > > > is what you want in the common case of a misconfigured application.
>> I
>> > > > hadn't considered credential updates for authenticators that rely on
>> > > > eventual consistency. Thinking about it some more, it seems like
>> this
>> > > > should be solved by the authenticator implementation as well. For
>> > > example,
>> > > > it could refresh the cached data for a user if authentication
>> failed (a
>> > > > good implementation would be a bit more involved to avoid going to
>> the
>> > > > underlying data source too often).
>> > > >
>> > > > Given that, not retrying sounds good to me.
>> > > >
>> > > > Ismael
>> > > >
>> > > > On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram <
>> > rajinisiva...@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > Hi Ismael,
>> > > > >
>> > > > > I thought the blocking waits in the producer and consumer are
>> always
>> > > > > related to retrying for metadata. So an authorization exception
>> that
>> > > > > impacts this wait can only be due to Describe authorization
>> failure -
>> > > > that
>> > > > > always retries?
>> > > > >
>> > > > > I agree that connecting to different brokers when authentication
>> > fails
>> > > > with
>> > > > > one is not desirable. But I am not keen on retrying with a
>> suitable
>> > > > backoff
>> > > > > until timeout either. Because that has the same problem as the
>> > scenario
>> > > > > that you described. The next metadata request could be to
>> broker-1 to
>> > > > which
>> > > > > authentication succeeds and subsequent produce/consume  to
>> broker-0
>> > > could
>> > > > > still fail.
>> > > > >
>> > > > > How about we just fail fast if one authentication fails - I think
>> > that
>> > > is
>> > > > > what you were suggesting in the first place? We don't need to
>> > blackout
>> > > > any
>> > > > > nodes beyond the reconnect backoff interval. Applications can
>> 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-21 Thread Rajini Sivaram
Hi Jun,

Thank you for the review.

1. Each token response will indicate success/failure. At the moment, the
broker simply closes the connection in the case of failure. With this KIP,
an empty token with an error_code indicating authentication failure will be
sent by the broker before closing the connection. I have made this clearer
in the KIP.

2. I have added a section on versioning of SaslAuthenticate and
SaslHandshake requests.


On Fri, Aug 18, 2017 at 2:47 PM, Jun Rao  wrote:

> Hi, Rajini,
>
> Thanks for the KIP. Looks good overall. Just a couple of minor comments.
>
> 1. "The final message from the broker will indicate if authentication
> succeeded or failed." Are we doing something special for the final message?
> I thought each token response will indicate success or failure in the error
> code.
>
> 2. It would be useful to document how we plan to evolve
> SaslAuthenticateRequest
> in the future. For example, if we add a new error code in
> SaslAuthenticateResponse,
> should we bump up the version of SaslAuthenticateRequest? When do we need
> to bump up SaslHandshakeRequest?
>
> Jun
>
>
> On Fri, Aug 18, 2017 at 10:59 AM, Ismael Juma  wrote:
>
> > Thanks Rajini. The changes look good. We have to update the motivation to
> > take into account the improvements in 0.11:
> >
> > "With the current Kafka SASL implementation, broker closes the client
> > connection if SASL authentication fails. Clients see this as a connection
> > failure and do not get any feedback for the reason why the connection was
> > closed. Producers and consumers retry, attempting to create successful
> > connections, treating authentication failures as transient failures.
> There
> > are no log entries on the client-side to indicate that any of these
> > connection failures were due to authentication failure."
> >
> > 0.11.0.x logs a warning with a hint that the disconnect is likely related
> > to authentication.
> >
> > Ismael
> >
> > On Tue, Aug 15, 2017 at 9:23 PM, Rajini Sivaram  >
> > wrote:
> >
> > > I have updated the KIP based on the discussions so far. It will be good
> > if
> > > we can get some more feedback so that this can be implemented for
> 1.0.0.
> > >
> > >
> > > Thanks,
> > >
> > > Rajini
> > >
> > >
> > > On Thu, May 4, 2017 at 10:22 PM, Ismael Juma 
> wrote:
> > >
> > > > Hi Rajini,
> > > >
> > > > I think we were talking about slightly different things. I was just
> > > > referring to the fact that there are cases where we throw an
> > > > AuthorizationException back to the user without retrying from various
> > > > methods (poll, commitSync, etc).
> > > >
> > > > As you said, my initial preference was for not retrying at all
> because
> > it
> > > > is what you want in the common case of a misconfigured application. I
> > > > hadn't considered credential updates for authenticators that rely on
> > > > eventual consistency. Thinking about it some more, it seems like this
> > > > should be solved by the authenticator implementation as well. For
> > > example,
> > > > it could refresh the cached data for a user if authentication failed
> (a
> > > > good implementation would be a bit more involved to avoid going to
> the
> > > > underlying data source too often).
> > > >
> > > > Given that, not retrying sounds good to me.
> > > >
> > > > Ismael
> > > >
> > > > On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > I thought the blocking waits in the producer and consumer are
> always
> > > > > related to retrying for metadata. So an authorization exception
> that
> > > > > impacts this wait can only be due to Describe authorization
> failure -
> > > > that
> > > > > always retries?
> > > > >
> > > > > I agree that connecting to different brokers when authentication
> > fails
> > > > with
> > > > > one is not desirable. But I am not keen on retrying with a suitable
> > > > backoff
> > > > > until timeout either. Because that has the same problem as the
> > scenario
> > > > > that you described. The next metadata request could be to broker-1
> to
> > > > which
> > > > > authentication succeeds and subsequent produce/consume  to broker-0
> > > could
> > > > > still fail.
> > > > >
> > > > > How about we just fail fast if one authentication fails - I think
> > that
> > > is
> > > > > what you were suggesting in the first place? We don't need to
> > blackout
> > > > any
> > > > > nodes beyond the reconnect backoff interval. Applications can still
> > > retry
> > > > > if they want to. In the case of credential updates, it will be up
> to
> > > the
> > > > > application to retry. During regular operation, a misconfigured
> > > > application
> > > > > fails fast with a meaningful exception. What do you think?
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > >
> > > > > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-21 Thread Rajini Sivaram
Ismael,

Thank you for the review. I have updated the motivation section.

On Fri, Aug 18, 2017 at 1:59 PM, Ismael Juma  wrote:

> Thanks Rajini. The changes look good. We have to update the motivation to
> take into account the improvements in 0.11:
>
> "With the current Kafka SASL implementation, broker closes the client
> connection if SASL authentication fails. Clients see this as a connection
> failure and do not get any feedback for the reason why the connection was
> closed. Producers and consumers retry, attempting to create successful
> connections, treating authentication failures as transient failures. There
> are no log entries on the client-side to indicate that any of these
> connection failures were due to authentication failure."
>
> 0.11.0.x logs a warning with a hint that the disconnect is likely related
> to authentication.
>
> Ismael
>
> On Tue, Aug 15, 2017 at 9:23 PM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far. It will be good
> if
> > we can get some more feedback so that this can be implemented for 1.0.0.
> >
> >
> > Thanks,
> >
> > Rajini
> >
> >
> > On Thu, May 4, 2017 at 10:22 PM, Ismael Juma  wrote:
> >
> > > Hi Rajini,
> > >
> > > I think we were talking about slightly different things. I was just
> > > referring to the fact that there are cases where we throw an
> > > AuthorizationException back to the user without retrying from various
> > > methods (poll, commitSync, etc).
> > >
> > > As you said, my initial preference was for not retrying at all because
> it
> > > is what you want in the common case of a misconfigured application. I
> > > hadn't considered credential updates for authenticators that rely on
> > > eventual consistency. Thinking about it some more, it seems like this
> > > should be solved by the authenticator implementation as well. For
> > example,
> > > it could refresh the cached data for a user if authentication failed (a
> > > good implementation would be a bit more involved to avoid going to the
> > > underlying data source too often).
> > >
> > > Given that, not retrying sounds good to me.
> > >
> > > Ismael
> > >
> > > On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > I thought the blocking waits in the producer and consumer are always
> > > > related to retrying for metadata. So an authorization exception that
> > > > impacts this wait can only be due to Describe authorization failure -
> > > that
> > > > always retries?
> > > >
> > > > I agree that connecting to different brokers when authentication
> fails
> > > with
> > > > one is not desirable. But I am not keen on retrying with a suitable
> > > backoff
> > > > until timeout either. Because that has the same problem as the
> scenario
> > > > that you described. The next metadata request could be to broker-1 to
> > > which
> > > > authentication succeeds and subsequent produce/consume  to broker-0
> > could
> > > > still fail.
> > > >
> > > > How about we just fail fast if one authentication fails - I think
> that
> > is
> > > > what you were suggesting in the first place? We don't need to
> blackout
> > > any
> > > > nodes beyond the reconnect backoff interval. Applications can still
> > retry
> > > > if they want to. In the case of credential updates, it will be up to
> > the
> > > > application to retry. During regular operation, a misconfigured
> > > application
> > > > fails fast with a meaningful exception. What do you think?
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma 
> wrote:
> > > >
> > > > > H Rajini,
> > > > >
> > > > > Comments inline.
> > > > >
> > > > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ismael,
> > > > > >
> > > > > > Thank you for reviewing the KIP.
> > > > > >
> > > > > > An authenticated client that is not authorized to access a topic
> is
> > > > never
> > > > > > told that the operation was not authorized. This is to prevent
> the
> > > > client
> > > > > > from finding out if the topic exists by sending an unauthorized
> > > > request.
> > > > > So
> > > > > > in this case, the client will retry metadata requests with the
> > > > configured
> > > > > > backoff until it times out.
> > > > >
> > > > >
> > > > > This is true if the user does not have Describe permission. If the
> > user
> > > > has
> > > > > Describe access and no Read or Write access, then the user is
> > informed
> > > > that
> > > > > the operation was not authorized.
> > > > >
> > > > >
> > > > > > Another important distinction for authorization failures is that
> > the
> > > > > > connection is not terminated.
> > > > > >
> > > > > > For unauthenticated clients, we do want to inform the client that
> > > > > > authentication failed. The connection 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-18 Thread Jun Rao
Hi, Rajini,

Thanks for the KIP. Looks good overall. Just a couple of minor comments.

1. "The final message from the broker will indicate if authentication
succeeded or failed." Are we doing something special for the final message?
I thought each token response will indicate success or failure in the error
code.

2. It would be useful to document how we plan to evolve SaslAuthenticateRequest
in the future. For example, if we add a new error code in
SaslAuthenticateResponse,
should we bump up the version of SaslAuthenticateRequest? When do we need
to bump up SaslHandshakeRequest?

Jun


On Fri, Aug 18, 2017 at 10:59 AM, Ismael Juma  wrote:

> Thanks Rajini. The changes look good. We have to update the motivation to
> take into account the improvements in 0.11:
>
> "With the current Kafka SASL implementation, broker closes the client
> connection if SASL authentication fails. Clients see this as a connection
> failure and do not get any feedback for the reason why the connection was
> closed. Producers and consumers retry, attempting to create successful
> connections, treating authentication failures as transient failures. There
> are no log entries on the client-side to indicate that any of these
> connection failures were due to authentication failure."
>
> 0.11.0.x logs a warning with a hint that the disconnect is likely related
> to authentication.
>
> Ismael
>
> On Tue, Aug 15, 2017 at 9:23 PM, Rajini Sivaram 
> wrote:
>
> > I have updated the KIP based on the discussions so far. It will be good
> if
> > we can get some more feedback so that this can be implemented for 1.0.0.
> >
> >
> > Thanks,
> >
> > Rajini
> >
> >
> > On Thu, May 4, 2017 at 10:22 PM, Ismael Juma  wrote:
> >
> > > Hi Rajini,
> > >
> > > I think we were talking about slightly different things. I was just
> > > referring to the fact that there are cases where we throw an
> > > AuthorizationException back to the user without retrying from various
> > > methods (poll, commitSync, etc).
> > >
> > > As you said, my initial preference was for not retrying at all because
> it
> > > is what you want in the common case of a misconfigured application. I
> > > hadn't considered credential updates for authenticators that rely on
> > > eventual consistency. Thinking about it some more, it seems like this
> > > should be solved by the authenticator implementation as well. For
> > example,
> > > it could refresh the cached data for a user if authentication failed (a
> > > good implementation would be a bit more involved to avoid going to the
> > > underlying data source too often).
> > >
> > > Given that, not retrying sounds good to me.
> > >
> > > Ismael
> > >
> > > On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > I thought the blocking waits in the producer and consumer are always
> > > > related to retrying for metadata. So an authorization exception that
> > > > impacts this wait can only be due to Describe authorization failure -
> > > that
> > > > always retries?
> > > >
> > > > I agree that connecting to different brokers when authentication
> fails
> > > with
> > > > one is not desirable. But I am not keen on retrying with a suitable
> > > backoff
> > > > until timeout either. Because that has the same problem as the
> scenario
> > > > that you described. The next metadata request could be to broker-1 to
> > > which
> > > > authentication succeeds and subsequent produce/consume  to broker-0
> > could
> > > > still fail.
> > > >
> > > > How about we just fail fast if one authentication fails - I think
> that
> > is
> > > > what you were suggesting in the first place? We don't need to
> blackout
> > > any
> > > > nodes beyond the reconnect backoff interval. Applications can still
> > retry
> > > > if they want to. In the case of credential updates, it will be up to
> > the
> > > > application to retry. During regular operation, a misconfigured
> > > application
> > > > fails fast with a meaningful exception. What do you think?
> > > >
> > > > Regards,
> > > >
> > > > Rajini
> > > >
> > > >
> > > > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma 
> wrote:
> > > >
> > > > > H Rajini,
> > > > >
> > > > > Comments inline.
> > > > >
> > > > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Ismael,
> > > > > >
> > > > > > Thank you for reviewing the KIP.
> > > > > >
> > > > > > An authenticated client that is not authorized to access a topic
> is
> > > > never
> > > > > > told that the operation was not authorized. This is to prevent
> the
> > > > client
> > > > > > from finding out if the topic exists by sending an unauthorized
> > > > request.
> > > > > So
> > > > > > in this case, the client will retry metadata requests with the
> > > > configured
> > > > > > backoff until it times out.
> > > > >
> > > > >
> > > > 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-18 Thread Ismael Juma
Thanks Rajini. The changes look good. We have to update the motivation to
take into account the improvements in 0.11:

"With the current Kafka SASL implementation, broker closes the client
connection if SASL authentication fails. Clients see this as a connection
failure and do not get any feedback for the reason why the connection was
closed. Producers and consumers retry, attempting to create successful
connections, treating authentication failures as transient failures. There
are no log entries on the client-side to indicate that any of these
connection failures were due to authentication failure."

0.11.0.x logs a warning with a hint that the disconnect is likely related
to authentication.

Ismael

On Tue, Aug 15, 2017 at 9:23 PM, Rajini Sivaram 
wrote:

> I have updated the KIP based on the discussions so far. It will be good if
> we can get some more feedback so that this can be implemented for 1.0.0.
>
>
> Thanks,
>
> Rajini
>
>
> On Thu, May 4, 2017 at 10:22 PM, Ismael Juma  wrote:
>
> > Hi Rajini,
> >
> > I think we were talking about slightly different things. I was just
> > referring to the fact that there are cases where we throw an
> > AuthorizationException back to the user without retrying from various
> > methods (poll, commitSync, etc).
> >
> > As you said, my initial preference was for not retrying at all because it
> > is what you want in the common case of a misconfigured application. I
> > hadn't considered credential updates for authenticators that rely on
> > eventual consistency. Thinking about it some more, it seems like this
> > should be solved by the authenticator implementation as well. For
> example,
> > it could refresh the cached data for a user if authentication failed (a
> > good implementation would be a bit more involved to avoid going to the
> > underlying data source too often).
> >
> > Given that, not retrying sounds good to me.
> >
> > Ismael
> >
> > On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram 
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > I thought the blocking waits in the producer and consumer are always
> > > related to retrying for metadata. So an authorization exception that
> > > impacts this wait can only be due to Describe authorization failure -
> > that
> > > always retries?
> > >
> > > I agree that connecting to different brokers when authentication fails
> > with
> > > one is not desirable. But I am not keen on retrying with a suitable
> > backoff
> > > until timeout either. Because that has the same problem as the scenario
> > > that you described. The next metadata request could be to broker-1 to
> > which
> > > authentication succeeds and subsequent produce/consume  to broker-0
> could
> > > still fail.
> > >
> > > How about we just fail fast if one authentication fails - I think that
> is
> > > what you were suggesting in the first place? We don't need to blackout
> > any
> > > nodes beyond the reconnect backoff interval. Applications can still
> retry
> > > if they want to. In the case of credential updates, it will be up to
> the
> > > application to retry. During regular operation, a misconfigured
> > application
> > > fails fast with a meaningful exception. What do you think?
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma  wrote:
> > >
> > > > H Rajini,
> > > >
> > > > Comments inline.
> > > >
> > > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> > rajinisiva...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ismael,
> > > > >
> > > > > Thank you for reviewing the KIP.
> > > > >
> > > > > An authenticated client that is not authorized to access a topic is
> > > never
> > > > > told that the operation was not authorized. This is to prevent the
> > > client
> > > > > from finding out if the topic exists by sending an unauthorized
> > > request.
> > > > So
> > > > > in this case, the client will retry metadata requests with the
> > > configured
> > > > > backoff until it times out.
> > > >
> > > >
> > > > This is true if the user does not have Describe permission. If the
> user
> > > has
> > > > Describe access and no Read or Write access, then the user is
> informed
> > > that
> > > > the operation was not authorized.
> > > >
> > > >
> > > > > Another important distinction for authorization failures is that
> the
> > > > > connection is not terminated.
> > > > >
> > > > > For unauthenticated clients, we do want to inform the client that
> > > > > authentication failed. The connection is terminated by the broker.
> > > > > Especially if the client is using SASL_SSL, we really do want to
> > avoid
> > > > > reconnections that result in unnecessary expensive handshakes. So
> we
> > > want
> > > > > to return an exception to the user with minimal retries.
> > > > >
> > > >
> > > > Agreed.
> > > >
> > > > I was thinking that it may be useful to try more than one broker for
> > the
> > > > > case where 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-18 Thread Vahid S Hashemian
Hi Rajini,

Thanks for the KIP. It looks good to me.
It would be great if it can make it to 1.0.0.

--Vahid



From:   Rajini Sivaram <rajinisiva...@gmail.com>
To: dev <dev@kafka.apache.org>
Date:   08/15/2017 01:23 PM
Subject:    Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL 
authentication failures



I have updated the KIP based on the discussions so far. It will be good if
we can get some more feedback so that this can be implemented for 1.0.0.


Thanks,

Rajini


On Thu, May 4, 2017 at 10:22 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Rajini,
>
> I think we were talking about slightly different things. I was just
> referring to the fact that there are cases where we throw an
> AuthorizationException back to the user without retrying from various
> methods (poll, commitSync, etc).
>
> As you said, my initial preference was for not retrying at all because 
it
> is what you want in the common case of a misconfigured application. I
> hadn't considered credential updates for authenticators that rely on
> eventual consistency. Thinking about it some more, it seems like this
> should be solved by the authenticator implementation as well. For 
example,
> it could refresh the cached data for a user if authentication failed (a
> good implementation would be a bit more involved to avoid going to the
> underlying data source too often).
>
> Given that, not retrying sounds good to me.
>
> Ismael
>
> On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram <rajinisiva...@gmail.com>
> wrote:
>
> > Hi Ismael,
> >
> > I thought the blocking waits in the producer and consumer are always
> > related to retrying for metadata. So an authorization exception that
> > impacts this wait can only be due to Describe authorization failure -
> that
> > always retries?
> >
> > I agree that connecting to different brokers when authentication fails
> with
> > one is not desirable. But I am not keen on retrying with a suitable
> backoff
> > until timeout either. Because that has the same problem as the 
scenario
> > that you described. The next metadata request could be to broker-1 to
> which
> > authentication succeeds and subsequent produce/consume  to broker-0 
could
> > still fail.
> >
> > How about we just fail fast if one authentication fails - I think that 
is
> > what you were suggesting in the first place? We don't need to blackout
> any
> > nodes beyond the reconnect backoff interval. Applications can still 
retry
> > if they want to. In the case of credential updates, it will be up to 
the
> > application to retry. During regular operation, a misconfigured
> application
> > fails fast with a meaningful exception. What do you think?
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > H Rajini,
> > >
> > > Comments inline.
> > >
> > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > Thank you for reviewing the KIP.
> > > >
> > > > An authenticated client that is not authorized to access a topic 
is
> > never
> > > > told that the operation was not authorized. This is to prevent the
> > client
> > > > from finding out if the topic exists by sending an unauthorized
> > request.
> > > So
> > > > in this case, the client will retry metadata requests with the
> > configured
> > > > backoff until it times out.
> > >
> > >
> > > This is true if the user does not have Describe permission. If the 
user
> > has
> > > Describe access and no Read or Write access, then the user is 
informed
> > that
> > > the operation was not authorized.
> > >
> > >
> > > > Another important distinction for authorization failures is that 
the
> > > > connection is not terminated.
> > > >
> > > > For unauthenticated clients, we do want to inform the client that
> > > > authentication failed. The connection is terminated by the broker.
> > > > Especially if the client is using SASL_SSL, we really do want to
> avoid
> > > > reconnections that result in unnecessary expensive handshakes. So 
we
> > want
> > > > to return an exception to the user with minimal retries.
> > > >
> > >
> > > Agreed.
> > >
> > > I was thinking that it may be useful to try more than one broker for
>

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-08-15 Thread Rajini Sivaram
I have updated the KIP based on the discussions so far. It will be good if
we can get some more feedback so that this can be implemented for 1.0.0.


Thanks,

Rajini


On Thu, May 4, 2017 at 10:22 PM, Ismael Juma  wrote:

> Hi Rajini,
>
> I think we were talking about slightly different things. I was just
> referring to the fact that there are cases where we throw an
> AuthorizationException back to the user without retrying from various
> methods (poll, commitSync, etc).
>
> As you said, my initial preference was for not retrying at all because it
> is what you want in the common case of a misconfigured application. I
> hadn't considered credential updates for authenticators that rely on
> eventual consistency. Thinking about it some more, it seems like this
> should be solved by the authenticator implementation as well. For example,
> it could refresh the cached data for a user if authentication failed (a
> good implementation would be a bit more involved to avoid going to the
> underlying data source too often).
>
> Given that, not retrying sounds good to me.
>
> Ismael
>
> On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram 
> wrote:
>
> > Hi Ismael,
> >
> > I thought the blocking waits in the producer and consumer are always
> > related to retrying for metadata. So an authorization exception that
> > impacts this wait can only be due to Describe authorization failure -
> that
> > always retries?
> >
> > I agree that connecting to different brokers when authentication fails
> with
> > one is not desirable. But I am not keen on retrying with a suitable
> backoff
> > until timeout either. Because that has the same problem as the scenario
> > that you described. The next metadata request could be to broker-1 to
> which
> > authentication succeeds and subsequent produce/consume  to broker-0 could
> > still fail.
> >
> > How about we just fail fast if one authentication fails - I think that is
> > what you were suggesting in the first place? We don't need to blackout
> any
> > nodes beyond the reconnect backoff interval. Applications can still retry
> > if they want to. In the case of credential updates, it will be up to the
> > application to retry. During regular operation, a misconfigured
> application
> > fails fast with a meaningful exception. What do you think?
> >
> > Regards,
> >
> > Rajini
> >
> >
> > On Thu, May 4, 2017 at 3:01 PM, Ismael Juma  wrote:
> >
> > > H Rajini,
> > >
> > > Comments inline.
> > >
> > > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > Thank you for reviewing the KIP.
> > > >
> > > > An authenticated client that is not authorized to access a topic is
> > never
> > > > told that the operation was not authorized. This is to prevent the
> > client
> > > > from finding out if the topic exists by sending an unauthorized
> > request.
> > > So
> > > > in this case, the client will retry metadata requests with the
> > configured
> > > > backoff until it times out.
> > >
> > >
> > > This is true if the user does not have Describe permission. If the user
> > has
> > > Describe access and no Read or Write access, then the user is informed
> > that
> > > the operation was not authorized.
> > >
> > >
> > > > Another important distinction for authorization failures is that the
> > > > connection is not terminated.
> > > >
> > > > For unauthenticated clients, we do want to inform the client that
> > > > authentication failed. The connection is terminated by the broker.
> > > > Especially if the client is using SASL_SSL, we really do want to
> avoid
> > > > reconnections that result in unnecessary expensive handshakes. So we
> > want
> > > > to return an exception to the user with minimal retries.
> > > >
> > >
> > > Agreed.
> > >
> > > I was thinking that it may be useful to try more than one broker for
> the
> > > > case where brokers are being upgraded and some brokers haven't yet
> seen
> > > the
> > > > latest credentials. I suppose I was thinking that at the moment we
> keep
> > > on
> > > > retrying every broker forever in the consumer and suddenly if we stop
> > > > retrying altogether, it could potentially lead to some unforeseen
> > timing
> > > > issues. Hence the suggestion to try every broker once.
> > > >
> > >
> > > I see. Retrying forever is a side-effect of auto-topic creation, but
> it's
> > > something we want to move away from. As mentioned, we actually don't
> > retry
> > > at all if the user has Describe permission.
> > >
> > > Broker upgrades could be fixed by ensuring that the latest credentials
> > are
> > > loaded before the broker starts serving requests. More problematic is
> > > dealing with credential updates. This is another distinction when
> > compared
> > > to authorization.
> > >
> > > I am not sure if trying different brokers really helps us though. Say,
> we
> > > fail to authenticate with broker 0 and then we 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Ismael Juma
Hi Rajini,

I think we were talking about slightly different things. I was just
referring to the fact that there are cases where we throw an
AuthorizationException back to the user without retrying from various
methods (poll, commitSync, etc).

As you said, my initial preference was for not retrying at all because it
is what you want in the common case of a misconfigured application. I
hadn't considered credential updates for authenticators that rely on
eventual consistency. Thinking about it some more, it seems like this
should be solved by the authenticator implementation as well. For example,
it could refresh the cached data for a user if authentication failed (a
good implementation would be a bit more involved to avoid going to the
underlying data source too often).

Given that, not retrying sounds good to me.

Ismael

On Thu, May 4, 2017 at 4:04 PM, Rajini Sivaram 
wrote:

> Hi Ismael,
>
> I thought the blocking waits in the producer and consumer are always
> related to retrying for metadata. So an authorization exception that
> impacts this wait can only be due to Describe authorization failure - that
> always retries?
>
> I agree that connecting to different brokers when authentication fails with
> one is not desirable. But I am not keen on retrying with a suitable backoff
> until timeout either. Because that has the same problem as the scenario
> that you described. The next metadata request could be to broker-1 to which
> authentication succeeds and subsequent produce/consume  to broker-0 could
> still fail.
>
> How about we just fail fast if one authentication fails - I think that is
> what you were suggesting in the first place? We don't need to blackout any
> nodes beyond the reconnect backoff interval. Applications can still retry
> if they want to. In the case of credential updates, it will be up to the
> application to retry. During regular operation, a misconfigured application
> fails fast with a meaningful exception. What do you think?
>
> Regards,
>
> Rajini
>
>
> On Thu, May 4, 2017 at 3:01 PM, Ismael Juma  wrote:
>
> > H Rajini,
> >
> > Comments inline.
> >
> > On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram 
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > Thank you for reviewing the KIP.
> > >
> > > An authenticated client that is not authorized to access a topic is
> never
> > > told that the operation was not authorized. This is to prevent the
> client
> > > from finding out if the topic exists by sending an unauthorized
> request.
> > So
> > > in this case, the client will retry metadata requests with the
> configured
> > > backoff until it times out.
> >
> >
> > This is true if the user does not have Describe permission. If the user
> has
> > Describe access and no Read or Write access, then the user is informed
> that
> > the operation was not authorized.
> >
> >
> > > Another important distinction for authorization failures is that the
> > > connection is not terminated.
> > >
> > > For unauthenticated clients, we do want to inform the client that
> > > authentication failed. The connection is terminated by the broker.
> > > Especially if the client is using SASL_SSL, we really do want to avoid
> > > reconnections that result in unnecessary expensive handshakes. So we
> want
> > > to return an exception to the user with minimal retries.
> > >
> >
> > Agreed.
> >
> > I was thinking that it may be useful to try more than one broker for the
> > > case where brokers are being upgraded and some brokers haven't yet seen
> > the
> > > latest credentials. I suppose I was thinking that at the moment we keep
> > on
> > > retrying every broker forever in the consumer and suddenly if we stop
> > > retrying altogether, it could potentially lead to some unforeseen
> timing
> > > issues. Hence the suggestion to try every broker once.
> > >
> >
> > I see. Retrying forever is a side-effect of auto-topic creation, but it's
> > something we want to move away from. As mentioned, we actually don't
> retry
> > at all if the user has Describe permission.
> >
> > Broker upgrades could be fixed by ensuring that the latest credentials
> are
> > loaded before the broker starts serving requests. More problematic is
> > dealing with credential updates. This is another distinction when
> compared
> > to authorization.
> >
> > I am not sure if trying different brokers really helps us though. Say, we
> > fail to authenticate with broker 0 and then we succeed with broker 1.
> This
> > helps with metadata requests, but we will be in trouble when we try to
> > produce or consume to broker 0 (because it's the leader of some
> > partitions). So maybe we just want to retry with a suitable backoff
> until a
> > timeout?
> >
> > Yes, I agree that blacking out nodes forever isn't a good idea. When we
> > > throw AuthenticationFailedException for the current operation or if
> > > authentication to another broker succeeds, we can clear the blackout so
> > > 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Magnus Edenhill
Hi Rajini, great KIP!
This solution was proposed on the original KIP-43 thread but voted down, so
let's hope it does better this time :)

/Magnus


2017-05-04 13:37 GMT+02:00 Rajini Sivaram :

> Hi all,
>
> I have created a KIP to improve diagnostics for SASL authentication
> failures and reduce retries and blocking when authentication fails:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 152+-+Improve+diagnostics+for+SASL+authentication+failures
>
> Comments and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>


Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Rajini Sivaram
Hi Ismael,

I thought the blocking waits in the producer and consumer are always
related to retrying for metadata. So an authorization exception that
impacts this wait can only be due to Describe authorization failure - that
always retries?

I agree that connecting to different brokers when authentication fails with
one is not desirable. But I am not keen on retrying with a suitable backoff
until timeout either. Because that has the same problem as the scenario
that you described. The next metadata request could be to broker-1 to which
authentication succeeds and subsequent produce/consume  to broker-0 could
still fail.

How about we just fail fast if one authentication fails - I think that is
what you were suggesting in the first place? We don't need to blackout any
nodes beyond the reconnect backoff interval. Applications can still retry
if they want to. In the case of credential updates, it will be up to the
application to retry. During regular operation, a misconfigured application
fails fast with a meaningful exception. What do you think?

Regards,

Rajini


On Thu, May 4, 2017 at 3:01 PM, Ismael Juma  wrote:

> H Rajini,
>
> Comments inline.
>
> On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram 
> wrote:
>
> > Hi Ismael,
> >
> > Thank you for reviewing the KIP.
> >
> > An authenticated client that is not authorized to access a topic is never
> > told that the operation was not authorized. This is to prevent the client
> > from finding out if the topic exists by sending an unauthorized request.
> So
> > in this case, the client will retry metadata requests with the configured
> > backoff until it times out.
>
>
> This is true if the user does not have Describe permission. If the user has
> Describe access and no Read or Write access, then the user is informed that
> the operation was not authorized.
>
>
> > Another important distinction for authorization failures is that the
> > connection is not terminated.
> >
> > For unauthenticated clients, we do want to inform the client that
> > authentication failed. The connection is terminated by the broker.
> > Especially if the client is using SASL_SSL, we really do want to avoid
> > reconnections that result in unnecessary expensive handshakes. So we want
> > to return an exception to the user with minimal retries.
> >
>
> Agreed.
>
> I was thinking that it may be useful to try more than one broker for the
> > case where brokers are being upgraded and some brokers haven't yet seen
> the
> > latest credentials. I suppose I was thinking that at the moment we keep
> on
> > retrying every broker forever in the consumer and suddenly if we stop
> > retrying altogether, it could potentially lead to some unforeseen timing
> > issues. Hence the suggestion to try every broker once.
> >
>
> I see. Retrying forever is a side-effect of auto-topic creation, but it's
> something we want to move away from. As mentioned, we actually don't retry
> at all if the user has Describe permission.
>
> Broker upgrades could be fixed by ensuring that the latest credentials are
> loaded before the broker starts serving requests. More problematic is
> dealing with credential updates. This is another distinction when compared
> to authorization.
>
> I am not sure if trying different brokers really helps us though. Say, we
> fail to authenticate with broker 0 and then we succeed with broker 1. This
> helps with metadata requests, but we will be in trouble when we try to
> produce or consume to broker 0 (because it's the leader of some
> partitions). So maybe we just want to retry with a suitable backoff until a
> timeout?
>
> Yes, I agree that blacking out nodes forever isn't a good idea. When we
> > throw AuthenticationFailedException for the current operation or if
> > authentication to another broker succeeds, we can clear the blackout so
> > that any new request from the client can attempt reconnection after the
> > reconnect backoff period as they do now.
> >
>
> Yes, that would be better if we decide that connecting to different brokers
> is worthwhile for the requests that can be sent to any broker.
>
> Ismael
>
> On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram 
> wrote:
>
> > Hi Ismael,
> >
> > Thank you for reviewing the KIP.
> >
> > An authenticated client that is not authorized to access a topic is never
> > told that the operation was not authorized. This is to prevent the client
> > from finding out if the topic exists by sending an unauthorized request.
> So
> > in this case, the client will retry metadata requests with the configured
> > backoff until it times out. Another important distinction for
> authorization
> > failures is that the connection is not terminated.
> >
> > For unauthenticated clients, we do want to inform the client that
> > authentication failed. The connection is terminated by the broker.
> > Especially if the client is using SASL_SSL, we really do want to avoid
> > reconnections that 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Ismael Juma
H Rajini,

Comments inline.

On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram 
wrote:

> Hi Ismael,
>
> Thank you for reviewing the KIP.
>
> An authenticated client that is not authorized to access a topic is never
> told that the operation was not authorized. This is to prevent the client
> from finding out if the topic exists by sending an unauthorized request. So
> in this case, the client will retry metadata requests with the configured
> backoff until it times out.


This is true if the user does not have Describe permission. If the user has
Describe access and no Read or Write access, then the user is informed that
the operation was not authorized.


> Another important distinction for authorization failures is that the
> connection is not terminated.
>
> For unauthenticated clients, we do want to inform the client that
> authentication failed. The connection is terminated by the broker.
> Especially if the client is using SASL_SSL, we really do want to avoid
> reconnections that result in unnecessary expensive handshakes. So we want
> to return an exception to the user with minimal retries.
>

Agreed.

I was thinking that it may be useful to try more than one broker for the
> case where brokers are being upgraded and some brokers haven't yet seen the
> latest credentials. I suppose I was thinking that at the moment we keep on
> retrying every broker forever in the consumer and suddenly if we stop
> retrying altogether, it could potentially lead to some unforeseen timing
> issues. Hence the suggestion to try every broker once.
>

I see. Retrying forever is a side-effect of auto-topic creation, but it's
something we want to move away from. As mentioned, we actually don't retry
at all if the user has Describe permission.

Broker upgrades could be fixed by ensuring that the latest credentials are
loaded before the broker starts serving requests. More problematic is
dealing with credential updates. This is another distinction when compared
to authorization.

I am not sure if trying different brokers really helps us though. Say, we
fail to authenticate with broker 0 and then we succeed with broker 1. This
helps with metadata requests, but we will be in trouble when we try to
produce or consume to broker 0 (because it's the leader of some
partitions). So maybe we just want to retry with a suitable backoff until a
timeout?

Yes, I agree that blacking out nodes forever isn't a good idea. When we
> throw AuthenticationFailedException for the current operation or if
> authentication to another broker succeeds, we can clear the blackout so
> that any new request from the client can attempt reconnection after the
> reconnect backoff period as they do now.
>

Yes, that would be better if we decide that connecting to different brokers
is worthwhile for the requests that can be sent to any broker.

Ismael

On Thu, May 4, 2017 at 2:29 PM, Rajini Sivaram 
wrote:

> Hi Ismael,
>
> Thank you for reviewing the KIP.
>
> An authenticated client that is not authorized to access a topic is never
> told that the operation was not authorized. This is to prevent the client
> from finding out if the topic exists by sending an unauthorized request. So
> in this case, the client will retry metadata requests with the configured
> backoff until it times out. Another important distinction for authorization
> failures is that the connection is not terminated.
>
> For unauthenticated clients, we do want to inform the client that
> authentication failed. The connection is terminated by the broker.
> Especially if the client is using SASL_SSL, we really do want to avoid
> reconnections that result in unnecessary expensive handshakes. So we want
> to return an exception to the user with minimal retries.
>
> I was thinking that it may be useful to try more than one broker for the
> case where brokers are being upgraded and some brokers haven't yet seen the
> latest credentials. I suppose I was thinking that at the moment we keep on
> retrying every broker forever in the consumer and suddenly if we stop
> retrying altogether, it could potentially lead to some unforeseen timing
> issues. Hence the suggestion to try every broker once.
>
> Yes, I agree that blacking out nodes forever isn't a good idea. When we
> throw AuthenticationFailedException for the current operation or if
> authentication to another broker succeeds, we can clear the blackout so
> that any new request from the client can attempt reconnection after the
> reconnect backoff period as they do now.
>
> Regards,
>
> Rajini
>
> On Thu, May 4, 2017 at 12:51 PM, Ismael Juma  wrote:
>
> > Thanks Rajini. This is a good improvement. One question, the proposal
> > states:
> >
> > Producer waitForMetadata and consumer ensureCoordinatorReady will be
> > > updated to throw AuthenticationFailedException if connections to all
> > > available brokers fail authentication.
> >
> >
> > Can you elaborate on the reason why we 

Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Rajini Sivaram
Hi Ismael,

Thank you for reviewing the KIP.

An authenticated client that is not authorized to access a topic is never
told that the operation was not authorized. This is to prevent the client
from finding out if the topic exists by sending an unauthorized request. So
in this case, the client will retry metadata requests with the configured
backoff until it times out. Another important distinction for authorization
failures is that the connection is not terminated.

For unauthenticated clients, we do want to inform the client that
authentication failed. The connection is terminated by the broker.
Especially if the client is using SASL_SSL, we really do want to avoid
reconnections that result in unnecessary expensive handshakes. So we want
to return an exception to the user with minimal retries.

I was thinking that it may be useful to try more than one broker for the
case where brokers are being upgraded and some brokers haven't yet seen the
latest credentials. I suppose I was thinking that at the moment we keep on
retrying every broker forever in the consumer and suddenly if we stop
retrying altogether, it could potentially lead to some unforeseen timing
issues. Hence the suggestion to try every broker once.

Yes, I agree that blacking out nodes forever isn't a good idea. When we
throw AuthenticationFailedException for the current operation or if
authentication to another broker succeeds, we can clear the blackout so
that any new request from the client can attempt reconnection after the
reconnect backoff period as they do now.

Regards,

Rajini

On Thu, May 4, 2017 at 12:51 PM, Ismael Juma  wrote:

> Thanks Rajini. This is a good improvement. One question, the proposal
> states:
>
> Producer waitForMetadata and consumer ensureCoordinatorReady will be
> > updated to throw AuthenticationFailedException if connections to all
> > available brokers fail authentication.
>
>
> Can you elaborate on the reason why we would treat authentication failures
> differently from authorization failures? It would be good to understand
> under which scenario it would be beneficial to try all the brokers (it
> seems that the proposal also suggests blacking out brokers permanently if
> we fail authentication, so that could also eventually cause issues).
>
> Ismael
>
>
> On Thu, May 4, 2017 at 12:37 PM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> > I have created a KIP to improve diagnostics for SASL authentication
> > failures and reduce retries and blocking when authentication fails:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-152+-+
> > Improve+diagnostics+for+SASL+authentication+failures
> >
> > Comments and suggestions are welcome.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>


Re: [DISCUSS] KIP-152 - Improve diagnostics for SASL authentication failures

2017-05-04 Thread Ismael Juma
Thanks Rajini. This is a good improvement. One question, the proposal
states:

Producer waitForMetadata and consumer ensureCoordinatorReady will be
> updated to throw AuthenticationFailedException if connections to all
> available brokers fail authentication.


Can you elaborate on the reason why we would treat authentication failures
differently from authorization failures? It would be good to understand
under which scenario it would be beneficial to try all the brokers (it
seems that the proposal also suggests blacking out brokers permanently if
we fail authentication, so that could also eventually cause issues).

Ismael


On Thu, May 4, 2017 at 12:37 PM, Rajini Sivaram 
wrote:

> Hi all,
>
> I have created a KIP to improve diagnostics for SASL authentication
> failures and reduce retries and blocking when authentication fails:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-152+-+
> Improve+diagnostics+for+SASL+authentication+failures
>
> Comments and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>