Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-08 Thread Sönke Liebau
s,
> > > would see unencrypted or encrypted records. If we implemented the KIP as
> > > written, it would still result in a bunch of plain text data in RocksDB
> > > everywhere. Again, I'm not sure what the point would be. Perhaps using
> > > custom serdes would actually be a more holistic approach, since Kafka
> > > Streams etc could leverage these as well.
> > >
> > > Similarly, if the whole record is encrypted, it becomes impossible to do
> > > joins, group bys etc, which just need the record key and maybe don't have
> > > access to the encryption key. Maybe only record _values_ should be
> > > encrypted, and maybe Kafka Streams could defer decryption until the
> > > actual value is inspected. That way joins etc are possible without the
> > > encryption key, and RocksDB would not need to decrypt values before
> > > materializing to disk.
> > >
> > > This is why I've implemented encryption on a per-field basis, not at the
> > > record level, when addressing Kafka security in the past. And I've had to
> > > build external pipelines that purge, re-encrypt, and re-ingest records
> > > when keys are compromised.
> > >
> > > This KIP might be a step in the right direction, not sure. But I'm
> > > hesitant to support the idea of end-to-end encryption without a plan to
> > > address the myriad other problems.
> > >
> > > That said, we need this badly and I hope something shakes out.
> > >
> > > Ryanne
> > >
> > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau wrote:
> > >
> > > > All,
> > > >
> > > > I've asked for comments on this KIP in the past, but since I didn't
> > > > really get any feedback I've decided to reduce the initial scope of
> > > > the KIP a bit and try again.
> > > >
> > > > I have reworked the KIP to provide a limited, but useful set of
> > > > features for this initial KIP and laid out a very rough roadmap of
> > > > what I'd envision this looking like in a final version.
> > > >
> > > > I am aware that the KIP is currently light on implementation details,
> > > > but would like to get some feedback on the general approach before
> > > > fully speccing everything.
> > > >
> > > > The KIP can be found at
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > >
> > > > I would very much appreciate any feedback!
> > > >
> > > > Best regards,
> > > > Sönke
> > > >


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-604: Remove ZooKeeper Flags from the Administrative Tools

2020-05-05 Thread Sönke Liebau
Thanks for making the changes Colin, lgtm!

On Mon, 4 May 2020 at 23:13, Colin McCabe  wrote:

> Hi Sönke,
>
> You're right on both counts.  Thanks for the corrections.
>
> Thinking about this more, I think we should just remove
> kafka-preferred-replica-election.sh.  It just duplicates
> kafka-leader-election.sh, and it has been deprecated for some time.  I
> changed the KIP.
>
> cheers,
> Colin
>
> On Mon, May 4, 2020, at 01:06, Sönke Liebau wrote:
> > Hi Colin,
> >
> > thanks for the kip, lgtm overall with two small comments:
> >
> > 1. You mention kafka-leader-election.sh in the list of tools to remove
> > the ZooKeeper option from, but I think that command never had this option,
> > as it was introduced in KIP-460 specifically to replace the deprecated
> > command with the ZooKeeper option.
> >
> > 2. You mention kafka-preferred-leader-election.sh, which is probably just
> > a typo and refers to kafka-preferred-replica-election.sh?
> >
> > Best regards,
> > Sönke
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC
> >
> > On Fri, 1 May 2020 at 08:17, Colin McCabe  wrote:
> >
> > > Hi all,
> > >
> > > I posted a KIP about removing the deprecated --zookeeper flags from our
> > > administrative tools.  Check it out here:
> > > https://cwiki.apache.org/confluence/x/kRARCQ
> > >
> > > best,
> > > Colin
> > >
> >
> >
> > --
> > Sönke Liebau
> > Partner
> > Tel. +49 179 7940878
> > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-604: Remove ZooKeeper Flags from the Administrative Tools

2020-05-04 Thread Sönke Liebau
Hi Colin,

thanks for the kip, lgtm overall with two small comments:

1. You mention kafka-leader-election.sh in the list of tools to remove the
ZooKeeper option from, but I think that command never had this option, as it
was introduced in KIP-460 specifically to replace the deprecated command
with the ZooKeeper option.

2. You mention kafka-preferred-leader-election.sh, which is probably just a
typo and refers to kafka-preferred-replica-election.sh?

Best regards,
Sönke

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-460%3A+Admin+Leader+Election+RPC

On Fri, 1 May 2020 at 08:17, Colin McCabe  wrote:

> Hi all,
>
> I posted a KIP about removing the deprecated --zookeeper flags from our
> administrative tools.  Check it out here:
> https://cwiki.apache.org/confluence/x/kRARCQ
>
> best,
> Colin
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-05-04 Thread Sönke Liebau
Ah, I see, thanks for the clarification!

Shouldn't be an issue, I think. My understanding of KIPs was always that
they are mostly intended as a place to discuss and agree on changes up front,
whereas tracking the actual releases that things go into should be handled
in Jira.
So maybe we just create new Jiras for any subsequent work and either link
those or make them subtasks (even though this Jira is already a subtask
itself); that should allow us to properly track all releases that the work
goes into.

Thanks for your work on this!!

Best,
Sönke


On Sat, 2 May 2020 at 00:31, Boyang Chen  wrote:

> Sure thing Sonke, what I suggest is that usual KIPs get accepted to go into
> next release. It could span for a couple of releases because of engineering
> time, but no change has to be shipped in specific future releases, like the
> backward incompatible change for KafkaPrincipal. But I guess it's not
> really a blocker, as long as we stated clearly in the KIP how we are going
> to roll things out, and let it partially finish in 2.6.
>
> Boyang
>
> On Fri, May 1, 2020 at 2:32 PM Sönke Liebau
>  wrote:
>
> > Hi Boyang,
> >
> > thanks for the update, sounds reasonable to me. Making it a breaking
> change
> > is definitely the safer route to go.
> >
> > Just one quick question regarding your mail, I didn't fully understand
> what
> > you mean by "I think this is the first time we need to introduce a KIP
> > without having it
> > fully accepted in next release."  - could you perhaps explain that some
> > more very briefly?
> >
> > Best regards,
> > Sönke
> >
> >
> >
> > On Fri, 1 May 2020 at 23:03, Boyang Chen 
> > wrote:
> >
> > > Hey Tom,
> > >
> > > thanks for the suggestion. As long as we could correctly serialize the
> > > principal and embed in the Envelope, I think we could still leverage
> the
> > > controller to do the client request authentication. Although this pays
> an
> > > extra round trip if the authorization is doomed to fail on the receiver
> > > side, having a centralized processing unit is more favorable such as
> > > ensuring the audit log is consistent instead of scattering between
> > > forwarder and receiver.
> > >
> > > Boyang
> > >
> > > On Wed, Apr 29, 2020 at 9:50 AM Tom Bentley 
> wrote:
> > >
> > > > Hi Boyang,
> > > >
> > > > Thanks for the update. In the EnvelopeRequest handling section of the
> > KIP
> > > > it might be worth saying explicitly that authorization of the request
> > > will
> > > > happen as normal. Otherwise what you're proposing makes sense to me.
> > > >
> > > > Thanks again,
> > > >
> > > > Tom
> > > >
> > > >
> > > >
> > > > On Wed, Apr 29, 2020 at 5:27 PM Boyang Chen <
> > reluctanthero...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for the proposed idea Sonke. I reviewed it and had some
> > offline
> > > > > discussion with Colin, Rajini and Mathew.
> > > > >
> > > > > We do need to add serializability to the PrincipalBuilder
> interface,
> > > but
> > > > we
> > > > > should not make any default implementation that could go wrong and
> > > > > mess up security in a production environment if the user neglects it.
> > > > > Instead we need to make it required and backward incompatible. So I
> > > > > integrated your proposed methods and expand the Envelope RPC with a
> > > > couple
> > > > > of more fields for audit log purpose as well.
> > > > >
> > > > > Since the KafkaPrincipal builder serializability is a binary
> > > incompatible
> > > > > change, I propose (also stated in the KIP) the following
> > implementation
> > > > > plan:
> > > > >
> > > > >1. For next *2.x* release:
> > > > >   1. Get new admin client forwarding changes
> > > > >   2. Get the Envelope RPC implementation
> > > > >   3. Get the forwarding path working and validate the function
> > with
> > > > >   fake principals in testing environment, without actual
> > triggering
> > > > in
> > > > > the
> > > > >   production system
> > > > >2. For next *3.0 *release:
> > > > >   1. Introduce ser

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-01 Thread Sönke Liebau
Hi Tom,

thanks for taking a look!

Regarding your questions, I've answered below, but will also add more
detail to the KIP around these questions.

1. The functionality in this first phase could indeed be achieved with
custom serializers that would then need to wrap the actual serializer that
is to be used (see the rough sketch at the end of this mail). However,
looking forward I intend to add functionality that allows encryption to be
configured broker-side via topic-level configs, and to investigate
encrypting entire batches of messages for performance. Both of those things
would require us to move past doing this in a serializer, so I think we
should take that plunge now to avoid unnecessary refactoring later on.

2. Absolutely! I am currently working on a very (very) rough implementation
to prove the principle. I'll add the interfaces to the KIP as soon as I
think they are in a somewhat final form.
There are a lot of design details missing from the KIP; I didn't want to go
all the way just for people to hate what I designed and have to start over
;)

3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of
this KIP that allows configuring keys per topic pattern and will read the
keys from a local file. This will provide encryption, but users would have
to manually sync keystores across consumer and producer systems. Proper key
management with rollover and retrieval from central vaults would come in a
later phase.

4. I'm not 100% sure I follow your meaning here tbh. But I think the
question may be academic in this first instance, as compression happens at
batch level, so we can't encrypt at the record level after that. If we want
to stick with encrypting individual records, that would have to happen
pre-compression, unless I am mistaken about the internals here.

Best regards,
Sönke


On Fri, 1 May 2020 at 18:19, Tom Bentley  wrote:

> Hi Sönke,
>
> I never looked at the original version, but what you describe in the new
> version makes sense to me.
>
> Here are a few things which sprang to mind while I was reading:
>
> 1. It wasn't immediately obvious why this can't be achieved using custom
> serializers and deserializers.
> 2. It would be useful to fully define the Java interfaces you're talking
> about.
> 3 Would a KeyManager implementation be provided?
> 4. About compression+encryption: My understanding is CRIME used a chosen
> plaintext attack. AFAICS using compression would potentially allow a known
> plaintext attack, which is a weaker way of attacking a cipher. Even without
> compression in the picture known plaintext attacks would be possible, for
> example if the attacker knew the key was JSON encoded.
>
> Kind regards,
>
> Tom
>
> On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau
>  wrote:
>
> > All,
> >
> > I've asked for comments on this KIP in the past, but since I didn't
> really
> > get any feedback I've decided to reduce the initial scope of the KIP a
> bit
> > and try again.
> >
> > I have reworked the KIP to provide a limited, but useful set of features
> > for this initial KIP and laid out a very rough roadmap of what I'd
> > envision this looking like in a final version.
> >
> > I am aware that the KIP is currently light on implementation details, but
> > would like to get some feedback on the general approach before fully
> > speccing everything.
> >
> > The KIP can be found at
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> >
> >
> > I would very much appreciate any feedback!
> >
> > Best regards,
> > Sönke
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-05-01 Thread Sönke Liebau
Hi Boyang,

thanks for the update, sounds reasonable to me. Making it a breaking change
is definitely the safer route to go.

Just one quick question regarding your mail, I didn't fully understand what
you mean by "I think this is the first time we need to introduce a KIP
without having it
fully accepted in next release."  - could you perhaps explain that some
more very briefly?

Best regards,
Sönke



On Fri, 1 May 2020 at 23:03, Boyang Chen  wrote:

> Hey Tom,
>
> thanks for the suggestion. As long as we could correctly serialize the
> principal and embed in the Envelope, I think we could still leverage the
> controller to do the client request authentication. Although this pays an
> extra round trip if the authorization is doomed to fail on the receiver
> side, having a centralized processing unit is more favorable such as
> ensuring the audit log is consistent instead of scattering between
> forwarder and receiver.
>
> Boyang
>
> On Wed, Apr 29, 2020 at 9:50 AM Tom Bentley  wrote:
>
> > Hi Boyang,
> >
> > Thanks for the update. In the EnvelopeRequest handling section of the KIP
> > it might be worth saying explicitly that authorization of the request
> will
> > happen as normal. Otherwise what you're proposing makes sense to me.
> >
> > Thanks again,
> >
> > Tom
> >
> >
> >
> > On Wed, Apr 29, 2020 at 5:27 PM Boyang Chen 
> > wrote:
> >
> > > Thanks for the proposed idea Sonke. I reviewed it and had some offline
> > > discussion with Colin, Rajini and Mathew.
> > >
> > > We do need to add serializability to the PrincipalBuilder interface,
> but
> > we
> > > should not make any default implementation that could go wrong and
> > > mess up security in a production environment if the user neglects it.
> > > Instead we need to make it required and backward incompatible. So I
> > > integrated your proposed methods and expand the Envelope RPC with a
> > couple
> > > of more fields for audit log purpose as well.
> > >
> > > Since the KafkaPrincipal builder serializability is a binary
> incompatible
> > > change, I propose (also stated in the KIP) the following implementation
> > > plan:
> > >
> > >1. For next *2.x* release:
> > >   1. Get new admin client forwarding changes
> > >   2. Get the Envelope RPC implementation
> > >   3. Get the forwarding path working and validate the function with
> > >   fake principals in testing environment, without actual triggering
> > in
> > > the
> > >   production system
> > >2. For next *3.0 *release:
> > >   1. Introduce serializability to PrincipalBuilder
> > >   2. Turn on forwarding path in production and perform end-to-end
> > >   testing
> > >
> > >
> > > I think this is the first time we need to introduce a KIP without
> having
> > it
> > > fully accepted in next release. Let me know if this sounds reasonable.
> > >
> > > On Fri, Apr 24, 2020 at 1:00 AM Sönke Liebau
> > >  wrote:
> > >
> > > > After thinking on this a little bit, maybe this would be an option:
> > > >
> > > > add default methods serialize and deserialize to the
> > > KafkaPrincipalBuilder
> > > > interface, these could be very short:
> > > >
> > > > default String serialize(KafkaPrincipal principal) {
> > > > return principal.toString();
> > > > }
> > > >
> > > > default KafkaPrincipal deserialize(String principalString) {
> > > > return SecurityUtils.parseKafkaPrincipal(principalString);
> > > > }
> > > >
> > > > This would mean that all existing implementations of that interface
> > > > are unaffected, as this code is pretty much what is currently being
> > > > used when their principals need to be serialized.
> > > >
> > > > But it offers people using custom principals the chance to override
> > > > these methods and ensure that all information gets serialized for
> > > > delegation tokens or request forwarding.
> > > >
> > > >
> > > > Wherever we need to de/serialize principals (for example in the
> > > > DelegationTokenManager [1]) we obtain an instance of the configured
> > > > PrincipalBuilder class and use that to do the actual work.
> > > >
> > > > What do you think?
> > > >
> > > >
> > > > Bes

[DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-04-28 Thread Sönke Liebau
All,

I've asked for comments on this KIP in the past, but since I didn't really
get any feedback I've decided to reduce the initial scope of the KIP a bit
and try again.

I have reworked the KIP to provide a limited, but useful set of features for
this initial KIP and laid out a very rough roadmap of what I'd envision
this looking like in a final version.

I am aware that the KIP is currently light on implementation details, but
would like to get some feedback on the general approach before fully
speccing everything.

The KIP can be found at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka


I would very much appreciate any feedback!

Best regards,
Sönke


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-04-24 Thread Sönke Liebau
After thinking on this a little bit, maybe this would be an option:

add default methods serialize and deserialize to the KafkaPrincipalBuilder
interface; these could be very short:

default String serialize(KafkaPrincipal principal) {
return principal.toString();
}

default KafkaPrincipal deserialize(String principalString) {
return SecurityUtils.parseKafkaPrincipal(principalString);
}

This would mean that all existing implementations of that interface
are unaffected, as this code is pretty much what is currently being
used when their principals need to be serialized.

But it offers people using custom principals the chance to override
these methods and ensure that all information gets serialized for
delegation tokens or request forwarding.


Wherever we need to de/serialize principals (for example in the
DelegationTokenManager [1]) we obtain an instance of the configured
PrincipalBuilder class and use that to do the actual work.
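
In code, the usage wherever we need it would then simply be something like
this (a sketch that presupposes the proposed methods above; how the builder
instance is obtained is omitted):

static String toWireFormat(KafkaPrincipalBuilder builder, KafkaPrincipal principal) {
    return builder.serialize(principal);          // default: principal.toString()
}

static KafkaPrincipal fromWireFormat(KafkaPrincipalBuilder builder, String principalString) {
    return builder.deserialize(principalString);  // default: SecurityUtils.parseKafkaPrincipal(...)
}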

What do you think?


Best regards,

Sönke


[1] 
https://github.com/apache/kafka/blob/33d06082117d971cdcddd4f01392006b543f3c01/core/src/main/scala/kafka/server/DelegationTokenManager.scala#L122


On Thu, 23 Apr 2020 at 17:42, Boyang Chen 
wrote:

> Thanks all,
>
> IIUC, the necessity of doing the audit log on the controller side is
> because we need to make sure the authorized resource modifications
> eventually arrive on the target broker side, but is that really necessary?
>
> I'm thinking the possibility of doing the audit log on the forwarding
> broker side, which could simplify the discussion of principal serialization
> here. The other option I could think of is to serialize the entire audit
> log message if we were supposed to approve, and pass it as part of the
> Envelope.
>
> Let me know if you think either of these approaches would work.
>
> On Thu, Apr 23, 2020 at 7:01 AM Sönke Liebau
>  wrote:
>
> > I agree that this would be useful to have and shouldn't create issues in
> > 99% of all cases. But it would be a breaking change to a public API.
> > I had a quick look at the two large projects that come to mind which
> might
> > be affected: Ranger and Sentry - both seem to operate directly with
> > KafkaPrincipal instead of subclassing it. But anybody who
> > extended KafkaPrincipal would probably need to update their code..
> >
> > Writing this sparked the thought that this issue should also concern
> > delegation tokens, as Principals need to be stored/sent around for those
> > too.
> > Had a brief look at the code and for Delegation Tokens we seem to use
> > SecurityUtils#parseKafkaPrincipal[1] which would ignore any additional
> > information from custom Principals.
> >
> > We'll probably want to at least add a note on that to the docs - unless
> it
> > is there already, I've only looked for about 30 seconds..
> >
> > Best regards,
> > Sönke
> >
> >
> > [1]
> >
> >
> https://github.com/apache/kafka/blob/e9fcfe4fb7b9ae2f537ce355fe2ab51a58034c64/clients/src/main/java/org/apache/kafka/common/utils/SecurityUtils.java#L52
> >
> > On Thu, 23 Apr 2020 at 14:35, Colin McCabe  wrote:
> >
> > > Hmm... Maybe we need to add some way to serialize and deserialize
> > > KafkaPrincipal subclasses to/from string.  We could add a method to
> > > KafkaPrincipalBuilder#deserialize and a method
> KafkaPrincipal#serialize,
> > I
> > > suppose.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Thu, Apr 23, 2020, at 02:14, Tom Bentley wrote:
> > > > Hi folks,
> > > >
> > > > Colin wrote:
> > > >
> > > > > The final broker knows it can trust the principal name in the
> > envelope
> > > > > (since EnvelopeRequest requires CLUSTERACTION on CLUSTER).  So it
> can
> > > use
> > > > > that principal name for authorization (given who you are, what can
> > you
> > > > > do?)  The forwarded principal name will also be used for logging.
> > > > >
> > > >
> > > > My understanding (and I'm happy to be corrected) is that a custom
> > > > authoriser might rely on the KafkaPrincipal instance being a subclass
> > of
> > > > KafkaPrincipal (e.g. the subclass has extra fields with the
> principal's
> > > > "roles"). So you can't construct a KafkaPrinicpal on the controller
> > which
> > > > would be guaranteed to work for arbitrary authorizers. You have to
> > > perform
> > > > authorization on the first broker (rejecting some of the batched
> > > requests),
> > > > forward the authorized ones

Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-04-23 Thread Sönke Liebau
I agree that this would be useful to have and shouldn't create issues in
99% of all cases. But it would be a breaking change to a public API.
I had a quick look at the two large projects that come to mind which might
be affected: Ranger and Sentry - both seem to operate directly with
KafkaPrincipal instead of subclassing it. But anybody who has
extended KafkaPrincipal would probably need to update their code.

Writing this sparked the thought that this issue should also concern
delegation tokens, as Principals need to be stored/sent around for those
too.
Had a brief look at the code and for Delegation Tokens we seem to use
SecurityUtils#parseKafkaPrincipal[1] which would ignore any additional
information from custom Principals.
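
To make concrete what gets lost there, here is a tiny (made-up) example of
the round trip a custom principal would go through today:

import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.common.utils.SecurityUtils;

public class PrincipalRoundTrip {
    public static void main(String[] args) {
        // Imagine a custom KafkaPrincipal subclass that carries extra fields
        // (e.g. roles). Today it gets flattened to "type:name" on the way out:
        KafkaPrincipal original = new KafkaPrincipal(KafkaPrincipal.USER_TYPE, "alice");
        String stored = original.toString();                         // "User:alice"
        KafkaPrincipal restored = SecurityUtils.parseKafkaPrincipal(stored);
        // restored is always a plain KafkaPrincipal - any extra fields a custom
        // subclass carried (and a custom authorizer relied on) are gone.
        System.out.println(restored);
    }
}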

We'll probably want to at least add a note on that to the docs - unless it
is there already; I've only looked for about 30 seconds.

Best regards,
Sönke


[1]
https://github.com/apache/kafka/blob/e9fcfe4fb7b9ae2f537ce355fe2ab51a58034c64/clients/src/main/java/org/apache/kafka/common/utils/SecurityUtils.java#L52

On Thu, 23 Apr 2020 at 14:35, Colin McCabe  wrote:

> Hmm... Maybe we need to add some way to serialize and deserialize
> KafkaPrincipal subclasses to/from string.  We could add a method to
> KafkaPrincipalBuilder#deserialize and a method KafkaPrincipal#serialize, I
> suppose.
>
> best,
> Colin
>
>
> On Thu, Apr 23, 2020, at 02:14, Tom Bentley wrote:
> > Hi folks,
> >
> > Colin wrote:
> >
> > > The final broker knows it can trust the principal name in the envelope
> > > (since EnvelopeRequest requires CLUSTERACTION on CLUSTER).  So it can
> use
> > > that principal name for authorization (given who you are, what can you
> > > do?)  The forwarded principal name will also be used for logging.
> > >
> >
> > My understanding (and I'm happy to be corrected) is that a custom
> > authoriser might rely on the KafkaPrincipal instance being a subclass of
> > KafkaPrincipal (e.g. the subclass has extra fields with the principal's
> > "roles"). So you can't construct a KafkaPrinicpal on the controller which
> > would be guaranteed to work for arbitrary authorizers. You have to
> perform
> > authorization on the first broker (rejecting some of the batched
> requests),
> > forward the authorized ones to the controller, which then has to trust
> that
> > the authorization has been done and make the ZK writes without
> > authorization. Because the controller cannot invoke the authorizer that
> > means that the authorizer audit logging (on the controller) would not
> > include these operations. But they would be in the audit logging from the
> > original broker, so the information would not be lost.
> >
> > There's also a problem with using the constructed principal for other
> > logging (i.e. non authorizer logging) on the controller: There's nothing
> > stopping a custom KafkaPrincipal subclass from overriding toString() to
> > return something different from "${type}:${name}". If you're building a
> > "fake" KafkaPrincipal on the controller from the type and name then its
> > toString() would be "wrong". A solution to this would be to also
> serialize
> > the toString() into the envelope and have some ProxiedKafkaPrincipal
> class
> > which you instantiate on the controller which has overridden toString to
> > return the right value. Obviously you could optimize this using an
> optional
> > field so you only serialize the toString() if it differed from the value
> > you'd get from KafkaPrincipal.toString(). Maybe non-audit logging using
> the
> > wrong string value of a principal is sufficiently minor that this isn't
> > even worth trying to solve.
> >
> > Kind regards,
> >
> > Tom
> >
> >
> > On Wed, Apr 22, 2020 at 10:59 PM Sönke Liebau
> >  wrote:
> >
> > > Hi Colin,
> > >
> > > thanks for your summary! Just one question - and I may be missing an
> > > obvious point here..
> > > You write:
> > >
> > > "The initial broker should do authentication (who are you?) and come up
> > > with a principal name.  Then it creates an envelope request, which will
> > > contain that principal name, and sends it along with the unmodified
> > > original request to the final broker.   [... ] The final broker knows
> it
> > > can trust the principal name in the envelope (since EnvelopeRequest
> > > requires CLUSTERACTION on CLUSTER).  So it can use that principal name
> for
> > > authorization (given who you are, what can you do?) "
> > >
> > > My understanding is, that you d

Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-04-22 Thread Sönke Liebau
se to put the forwarded principal
> > > name.  I don't think overriding an existing field (like clientId) is a
> > good
> > > option for this.  It's messy, and loses information.  It also raises
> the
> > > question of how the final broker knows that the clientId in the
> received
> > > message is not "really" a clientId, but is a principal name.  Without
> an
> > > envelope, there's no way to clearly mark a request as forwarded, so
> > there's
> > > no reason for the final broker to treat this differently than a regular
> > > clientId (or whatever).
> > >
> > > We talked about using optional fields to contain the forwarded
> principal
> > > name, but this is also messy and potentially dangerous.  Older brokers
> > will
> > > simply ignore the optional fields, which could result in them executing
> > > operations as the wrong principal.  Of course, this would require a
> > > misconfiguration in order to happen, but it still seems better to set
> up
> > > the system so that this misconfiguration is detected, rather than
> > silently
> > > ignored.
> > >
> > > It's true that the need for forwarding is "temporary" in some sense,
> > since
> > > we only need it for older clients.  However, we will want to support
> > these
> > > older clients for many years to come.
> > >
> > > I agree that the usefulness of EnvelopeRequest is limited by it being a
> > > superuser-only request at the moment.  Perhaps there are some changes
> to
> > > how custom principals work that would allow us to get around this in
> the
> > > future.  We should think about that so that we have this functionality
> in
> > > the future if it's needed.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Apr 22, 2020, at 11:21, Guozhang Wang wrote:
> > > > Hello Gwen,
> > > >
> > > > The purpose here is for maintaining compatibility old clients, who do
> > not
> > > > have functionality to do re-routing admin requests themselves. New
> > > clients
> > > > can of course do this themselves by detecting who's the controller.
> > > >
> > > >
> > > > Hello Colin / Boyang,
> > > >
> > > > Regarding the usage of the envelope, I'm curious what are the
> potential
> > > > future use cases that would require request forwarding and hence
> > envelope
> > > > would be useful? Originally I thought that the forwarding mechanism
> is
> > > only
> > > > temporary as we need it for the bridge release, and moving forward we
> > > will
> > > > get rid of this to simplify the code base. If we do have valid use
> > cases
> > > in
> > > > the future which makes us believe that request forwarding would
> > actually
> > > be
> > > > a permanent feature retained on the broker side, I'm on board with
> > adding
> > > > the envelope request protocol.
> > > >
> > > >
> > > >
> > > > Guozhang
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Apr 22, 2020 at 8:22 AM Gwen Shapira 
> > wrote:
> > > >
> > > > > Hey Boyang,
> > > > >
> > > > > Sorry if this was already discussed, but I didn't see this as
> > rejected
> > > > > alternative:
> > > > >
> > > > > Until now, we always did client side routing - the client itself
> > found
> > > the
> > > > > controller via metadata and directed requests accordingly. Brokers
> > that
> > > > > were not the controller, rejected those requests.
> > > > >
> > > > > Why did we decide to move to broker side routing? Was the
> client-side
> > > > > option discussed and rejected somewhere and I missed it?
> > > > >
> > > > > Gwen
> > > > >
> > > > > On Fri, Apr 3, 2020, 4:45 PM Boyang Chen <
> reluctanthero...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > I would like to start off the discussion for KIP-590, a follow-up
> > > > > > initiative after KIP-500:
> > > > > >
> > > > > >
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > > > > >
> > > > > > This KIP proposes to migrate existing Zookeeper mutation paths,
> > > including
> > > > > > configuration, security and quota changes, to controller-only by
> > > always
> > > > > > routing these alterations to the controller.
> > > > > >
> > > > > > Let me know your thoughts!
> > > > > >
> > > > > > Best,
> > > > > > Boyang
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-04-21 Thread Sönke Liebau
Hi Boyang,

I think what Tom is referring to is that it is very hard to forward enough
information to the controller to put it into a position to properly
authenticate any request.

While the Default KafkaPrincipal can easily be serialized and sent to the
controller, as previously seen, those are just strings. For the Controller
to properly authenticate a request we'd need to forward the
AuthenticationContext (from which the Principal is built [1]) containing
the SSL/SASL details to the controller, in order for the controller to then
check certificates for validity etc.
And those checks will be very difficult, because what we are effectively
doing here is a man-in-the-middle attack (broadly speaking), as we are
forwarding a request "in the name of" someone else. And most authentication
methods have been built to prevent exactly that.
As soon as we have only the Principal we are trusting someone else to have
properly authenticated that principal, because we do not have all the
information to do that verification ourselves. And if we do that, then I
don't see why we should a

Regarding custom Principals, I don't think too many people do this in
practice, but in theory you can provide your own PrincipalBuilder and use
your own Principal objects that contain as much additional information as
you wish. And since these can basically be any Java object, that makes them
very tough to serialize.
Currently these Principals don't need to be serialized, because they are
created and used within the same JVM; there is no need to forward them to a
different broker.
I wrote a blog post [2] about a scenario that uses a custom Principal a
little while ago; maybe that helps a little.
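
Just to make that concrete, such a custom builder and principal might look
roughly like this (entirely made up, not taken from the blog post):

import java.util.Set;
import org.apache.kafka.common.security.auth.AuthenticationContext;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.common.security.auth.KafkaPrincipalBuilder;

public class GroupAwarePrincipalBuilder implements KafkaPrincipalBuilder {

    // A principal that carries arbitrary extra state the authorizer may rely on.
    public static class GroupAwarePrincipal extends KafkaPrincipal {
        private final Set<String> groups;

        public GroupAwarePrincipal(String name, Set<String> groups) {
            super(KafkaPrincipal.USER_TYPE, name);
            this.groups = groups;
        }

        public Set<String> groups() {
            return groups;
        }
    }

    @Override
    public KafkaPrincipal build(AuthenticationContext context) {
        // A real implementation would derive the name and groups from the
        // SSL/SASL details in the AuthenticationContext; hard-coded here.
        return new GroupAwarePrincipal("example-user", Set.of("example-group"));
    }
}

The groups field (or whatever else the subclass carries) is exactly the part
that has no obvious generic serialization, which is what makes forwarding
such principals tricky.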

Feel free to correct me if I misinterpreted your meaning Tom :)

Best regards,
Sönke

[1] https://imgur.com/a/Gi0cFNH
[2]
https://www.opencore.com/de/blog/2018/3/group-based-authorization-in-kafka/

On Tue, 21 Apr 2020 at 20:33, Boyang Chen 
wrote:

> Hey Tom,
>
> I agree with the claim here. All the brokers should have the same
> authentication power, which means getting the forwarding broker verify the
> client request first is more favorable. This approach avoids sending one
> unnecessary forwarding request if it couldn't pass the authorization in the
> first place.
>
> In the meantime, could you give more context on the custom Kafka principal
> you are referring to? How does that get encoded today, and how server and
> client could both agree on the serialization? As the plain principal is
> only a String, I would like to know more about the security strategy people
> are using, thanks!
>
> Boyang
>
> On Tue, Apr 21, 2020 at 2:24 AM Tom Bentley  wrote:
>
> > Hi Boyang,
> >
> > The answer to my original question about the request principal was that
> the
> > forwarding broker would authorize the request and the controller would
> > trust the request since it was from another broker. AFAIU you added the
> > principal purely for logging purposes. In the "EnvelopeRequest Handling"
> > section the KIP now says "Once that part is done, we shall replace the
> > request context with Principal information embedded inside the
> > EnvelopeRequest to complete the inner request permission check.", which
> > sounds to me like the controller is now authorizing the request (maybe in
> > addition to the forwarding broker) using a principal it's deserialized
> from
> > the EnvelopeRequest. I don't think that works if a custom principal
> builder
> > is returning a subclass of KafkaPrincipal (the Javadoc for KafkaPrincipal
> > describes the contract I'm talking about). Basically the controller would
> > not be able to instantiate the subclass (even if that was included in the
> > envelope) because it wouldn't necessarily know the signature of the
> > constructor. Nor can it use the principal builder itself because it
> doesn't
> > have access to the original AuthenticationContext. Maybe you figure out
> > some way to make it work, otherwise I think the best you can do is to
> > revert to the model you had before where the controller trusts the
> embedded
> > request because it's been received from a broker.
> >
> > Cheers,
> >
> > Tom
> >
> > On Sat, Apr 18, 2020 at 8:56 PM Colin McCabe  wrote:
> >
> > > On Fri, Apr 17, 2020, at 13:11, Ismael Juma wrote:
> > > > Hi Colin,
> > > >
> > > > The read/modify/write is protected by the zk version, right?
> > > >
> > > > Ismael
> > >
> > > No, we don't use the ZK version when doing the write to the config
> > > znodes.  We do for ACLs, I think.
> > >
> > > This is something that we could fix just by using the ZK version, but
> > > there are other race conditions like what if we're deleting a topic
> while
> > > setting this config, etc.  A single writer is a lot easier to reason
> > about.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > >
> > > > On Fri, Apr 17, 2020 at 12:53 PM Colin McCabe 
> > > wrote:
> > > >
> > > > > On Thu, Apr 16, 2020, at 08:51, Ismael Juma wrote:
> > > > > > I 

Re: [ANNOUNCE] Apache Kafka 2.5.0

2020-04-15 Thread Sönke Liebau
Thanks David!!


On Wed, 15 Apr 2020 at 23:07, Bill Bejeck  wrote:

> David,
>
> Thanks for running the release!
>
> -Bill
>
> On Wed, Apr 15, 2020 at 4:45 PM Matthias J. Sax  wrote:
>
> > Thanks for running the release David!
> >
> > -Matthias
> >
> > On 4/15/20 1:15 PM, David Arthur wrote:
> > > The Apache Kafka community is pleased to announce the release for
> Apache
> > > Kafka 2.5.0
> > >
> > > This release includes many new features, including:
> > >
> > > * TLS 1.3 support (1.2 is now the default)
> > > * Co-groups for Kafka Streams
> > > * Incremental rebalance for Kafka Consumer
> > > * New metrics for better operational insight
> > > * Upgrade Zookeeper to 3.5.7
> > > * Deprecate support for Scala 2.11
> > >
> > > All of the changes in this release can be found in the release notes:
> > > https://www.apache.org/dist/kafka/2.5.0/RELEASE_NOTES.html
> > >
> > >
> > > You can download the source and binary release (Scala 2.12 and 2.13)
> > from:
> > > https://kafka.apache.org/downloads#2.5.0
> > >
> > >
> >
> ---
> > >
> > >
> > > Apache Kafka is a distributed streaming platform with four core APIs:
> > >
> > >
> > > ** The Producer API allows an application to publish a stream of records
> > > to one or more Kafka topics.
> > >
> > > ** The Consumer API allows an application to subscribe to one or more
> > > topics and process the stream of records produced to them.
> > >
> > > ** The Streams API allows an application to act as a stream processor,
> > > consuming an input stream from one or more topics and producing an
> > > output stream to one or more output topics, effectively transforming
> the
> > > input streams to output streams.
> > >
> > > ** The Connector API allows building and running reusable producers or
> > > consumers that connect Kafka topics to existing applications or data
> > > systems. For example, a connector to a relational database might
> > > capture every change to a table.
> > >
> > >
> > > With these APIs, Kafka can be used for two broad classes of
> application:
> > >
> > > ** Building real-time streaming data pipelines that reliably get data
> > > between systems or applications.
> > >
> > > ** Building real-time streaming applications that transform or react
> > > to the streams of data.
> > >
> > >
> > > Apache Kafka is in use at large and small companies worldwide,
> including
> > > Capital One, Goldman Sachs, ING, LinkedIn, Netflix, Pinterest,
> Rabobank,
> > > Target, The New York Times, Uber, Yelp, and Zalando, among others.
> > >
> > > A big thank you for the following 108 contributors to this release!
> > >
> > > A. Sophie Blee-Goldman, Adam Bellemare, Alaa Zbair, Alex Kokachev, Alex
> > > Leung, Alex Mironov, Alice, Andrew Olson, Andy Coates, Anna Povzner,
> > Antony
> > > Stubbs, Arvind Thirunarayanan, belugabehr, bill, Bill Bejeck, Bob
> > Barrett,
> > > Boyang Chen, Brian Bushree, Brian Byrne, Bruno Cadonna, Bryan Ji,
> > Chia-Ping
> > > Tsai, Chris Egerton, Chris Pettitt, Chris Stromberger, Colin P. Mccabe,
> > > Colin Patrick McCabe, commandini, Cyrus Vafadari, Dae-Ho Kim, David
> > Arthur,
> > > David Jacot, David Kim, David Mao, dengziming, Dhruvil Shah, Edoardo
> > Comar,
> > > Eduardo Pinto, Fábio Silva, gkomissarov, Grant Henke, Greg Harris,
> Gunnar
> > > Morling, Guozhang Wang, Harsha Laxman, high.lee, highluck, Hossein
> > Torabi,
> > > huxi, huxihx, Ismael Juma, Ivan Yurchenko, Jason Gustafson, jiameixie,
> > John
> > > Roesler, José Armando García Sancio, Jukka Karvanen, Karan Kumar, Kevin
> > Lu,
> > > Konstantine Karantasis, Lee Dongjin, Lev Zemlyanov, Levani Kokhreidze,
> > > Lucas Bradstreet, Manikumar Reddy, Mathias Kub, Matthew Wong, Matthias
> J.
> > > Sax, Michael Gyarmathy, Michael Viamari, Mickael Maison, Mitch,
> > > mmanna-sapfgl, NanerLee, Narek Karapetian, Navinder Pal Singh Brar,
> > > nicolasguyomar, Nigel Liang, NIkhil Bhatia, Nikolay, ning2008wisc,
> Omkar
> > > Mestry, Rajini Sivaram, Randall Hauch, ravowlga123, Raymond Ng, Ron
> > > Dagostino, sainath batthala, Sanjana Kaundinya, Scott, Seungha Lee,
> Simon
> > > Clark, Stanislav Kozlovski, Svend Vanderveken, Sönke Liebau, Ted Yu,
> Tom
> > > Bentley, Tomislav, Tu Tran, Tu V. Tran, uttpal, Vikas Singh, Viktor
> > > Somogyi, vinoth chandar, wcarlson5, Will James, Xin Wang, zzccctv
> > >
> > > We welcome your help and feedback. For more information on how to
> > > report problems, and to get involved, visit the project website at
> > > https://kafka.apache.org/
> > >
> > > Thank you!
> > >
> > >
> > > Regards,
> > > David Arthur
> > >
> >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Preferred Partition Leaders

2020-04-14 Thread Sönke Liebau
Hi Lukasz,

that sounds like a worthwhile addition to me! Just a few thoughts that
occurred to me while reading your mail:
"Preferred Leader" as a term is somewhat taken in the Kafka namespace. It
refers to the concept that the first replica in the list of replicas is
the preferred leader and will be chosen as the leader for that partition if
possible. That is quite similar to what you are proposing - but without the
logic behind it, as currently that preferred leader is chosen mostly at
random.

If your need for this is urgent, you could fairly easily use the
AdminClient today to generate a new partition assignment that takes the rack
ids of the brokers into account, and Kafka would try to honor the resulting
preferred leaders for your partitions. But this would only work
retrospectively, not for new topics; for those a random distribution would
again be chosen. A rough example is sketched below.
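
For illustration, that manual workaround could look roughly like this (topic,
partition and broker ids are of course made up):

import java.util.Arrays;
import java.util.Collections;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.ElectionType;
import org.apache.kafka.common.TopicPartition;

public class PreferredLeaderPlacement {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            TopicPartition tp = new TopicPartition("my-topic", 0);
            // First entry is the preferred leader: brokers 1 and 2 are assumed
            // to be in DC1/DC2, broker 5 in DC3.
            NewPartitionReassignment reassignment =
                    new NewPartitionReassignment(Arrays.asList(1, 2, 5));
            admin.alterPartitionReassignments(
                    Collections.singletonMap(tp, Optional.of(reassignment))).all().get();
            // Fire a preferred leader election afterwards; in real code you
            // would also wait on the returned result.
            admin.electLeaders(ElectionType.PREFERRED, Collections.singleton(tp));
        }
    }
}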

Regarding your idea, I agree that replica and leader assignment is a topic
that is in need of some love. However, by "just" adding the rack id to this
logic we run the risk of making potential future work for things like load
or size based replica assignment to brokers / disks harder. I'm not saying
we need to do it now, but I think we should consider what it might look
like to ensure that your solution can lay to groundwork for later work to
build on.

Best regards,
Sönke




On Tue, 14 Apr 2020 at 23:32, Michael K. Edwards 
wrote:

> #nailedit
>
> On Tue, Apr 14, 2020 at 2:05 PM Jamie  wrote:
>
> > Hi All,
> >
> > There is a KIP which might be of interest to you:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120736982
> -
> > it sounds like you want to blacklist brokers in DC3?
> >
> > Thanks,
> >
> > Jamie
> >
> > -Original Message-
> > From: Michael K. Edwards 
> > To: dev@kafka.apache.org
> > Sent: Tue, 14 Apr 2020 20:50
> > Subject: Re: Preferred Partition Leaders
> >
> > I have clients with a similar need relating to disaster recovery.  (Three
> > replicas per partition within a data center / AWS AZ/region, fourth
> replica
> > elsewhere, ineligible to become the partition leader without manual
> > intervention.)
> >
> > On Tue, Apr 14, 2020 at 12:31 PM Łukasz Antoniak <
> > lukasz.anton...@gmail.com>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > Recently I came across Kafka setup where two data centers are close to
> > each
> > > other, but the company could not find a suitable place for the third
> one.
> > > As a result third DC is little further, lower network throughput, but
> > still
> > > within range of decent network latency, qualifying for stretch cluster.
> > Let
> > > us assume that client applications are being deployed only on two
> > "primary"
> > > DCs. My idea was to minimize network traffic between DC3 and other data
> > > centers (ideally only to replication).
> > >
> > > For Kafka consumer, we can configure rack-awareness, so that consumers
> > will
> > > read data from closest replica (replica.selector.class).
> > > Kafka producers have to send data to partition leaders. There is no way
> > to
> > > tell that we prefer replica leaders to be running in DC1 and DC2. Kafka
> > > will also try to evenly balance leaders across brokers
> > > (auto.leader.rebalance.enable).
> > >
> > > Does it sound like a good feature to make the choice of partition
> leaders
> > > pluggable? Basically, users would be given list of topic-partitions
> with
> > > ISRs and rack they are running, and could reshuffle them according to
> > > custom logic.
> > >
> > > Comments appreciated.
> > >
> > > Kind regards,
> > > Lukasz
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Apache Feathercast

2020-04-14 Thread Sönke Liebau
Thanks Jun!
I'll reach out to Rich about a date.


On Tue, 14 Apr 2020 at 19:11, Jun Rao  wrote:

> Hi, Sonke,
>
> That sounds like a good opportunity for promoting Kafka. Thanks for
> volunteering.
>
> Jun
>
> On Tue, Apr 14, 2020 at 8:08 AM Sönke Liebau
>  wrote:
>
> > Hi all,
> >
> > Rich is trying to reboot the Feathercast series [1] and looking for
> > projects that would be willing to give a quick intro about what they do.
> >
> > I'd be happy to do this for Kafka if no one has any objections.
> >
> > I am sure there are many people eminently more suited to this task than
> me
> > though, so happy to let someone else do the talking as well :)
> >
> > Best regards,
> > Sönke
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/re247e239da04256beaafc325aa7d411b2a8ad3884d90593b1a5bc139%40%3Cdev.community.apache.org%3E
> >
>


Apache Feathercast

2020-04-14 Thread Sönke Liebau
Hi all,

Rich is trying to reboot the Feathercast series [1] and looking for
projects that would be willing to give a quick intro about what they do.

I'd be happy to do this for Kafka if no one has any objections.

I am sure there are many people eminently more suited to this task than me
though, so happy to let someone else do the talking as well :)

Best regards,
Sönke

[1]
https://lists.apache.org/thread.html/re247e239da04256beaafc325aa7d411b2a8ad3884d90593b1a5bc139%40%3Cdev.community.apache.org%3E


Jira Cleanup - Kafka on Windows

2020-04-08 Thread Sönke Liebau
All,

I stumbled across a recent issue about Kafka crashing on Windows the other
day - a problem that keeps coming up in Jira.
So I went and had a look and found 16 unresolved tickets about this - some of
them are definitely duplicates, some of them might be variations.

KAFKA-1194 - The kafka broker cannot delete the old log files after the
configured time
KAFKA-2170 - 10 LogTest cases failed for file.renameTo failed under windows
KAFKA-5377 - Kafka server process crashing due to access violation (caused
by log cleaner)
KAFKA-6059 - Kafka cant delete old log files on windows
KAFKA-6200 - 0015.timeindex: The process cannot access the
file because it is being used by another process.
KAFKA-6203 - Kafka crash when deleting a topic
KAFKA-6406 - Topic deletion fails and kafka shuts down (on windows only)
KAFKA-6689 - Kafka not release .deleted file.
KAFKA-6983 - Error while deleting segments - The process cannot access the
file because it is being used by another process
KAFKA-7020 - Error when deleting topic with access denied exception
KAFKA-7086 - Kafka server process dies after try deleting old log files
under Windows 10
KAFKA-7575 - Error while writing to checkpoint file' Issue
KAFKA-8097 - Kafka broker crashes with java.nio.file.FileSystemException
Exception
KAFKA-8145 - Broker fails with FATAL Shutdown on Windows, log or index
renaming fail
KAFKA-8811 - Can not delete topics in Windows OS
KAFKA-9458 - Kafka crashed in windows environment

KAFKA-1194 seems to have seen the most activity and to have been there first,
so I'd propose to track all work related to file deletion and renames on
Windows under this ticket. Objections?

Also, is anybody aware of any activity still going on around this? Reading
through the comments, it looked for a while like a fix was close, but the
most current state I could find was a comment from Colin [1] saying that the
proposed fix fell short of a proper solution and that we'd need a
Windows-specific implementation of the Log class. But I was unable to find
anybody picking up this piece of work.

Best,
Sönke


[1] https://github.com/apache/kafka/pull/6329#issuecomment-505209147


Re: Shutdown Broker Because Log Dirs Failed

2020-04-07 Thread Sönke Liebau
Hi Qarl,

ideally you'd run Kafka on a Linux host. If all you have is a Windows
machine, then Docker might be an option; Confluent has images available that
you could use. Alternatively, some form of virtual machine running Linux
would work as well.

Best regards,
Sönke

Qarl wrote on Tue., 7 Apr. 2020, 16:02:

> Thank you so much for the response!
>
> If this is unresolved, could you suggest any alternative for me to make
> this work? If that is possible.
>
> Any other solution is highly appreciated! Thank you.
>
> From: Sönke Liebau
> Sent: Tuesday, 7 April, 2020 6:56 PM
> To: dev
> Subject: Re: Shutdown Broker Because Log Dirs Failed
>
> Hi Qarl,
>
> there probably (and sadly) is no good answer to your question. Windows is
> not an officially supported platform to run Kafka on and there is a large
> number of unresolved tickets in Jira on this topic.
> Most notable is probably KAFKA-1194 [1].
>
> I haven't closely followed this, I thought we were getting close to a
> solution a little while ago, but the ticket is still unresolved .. maybe
> someone else knows more on the current state than is documented in the
> ticket?
>
> Best regards,
> Sönke
>
> [1] https://issues.apache.org/jira/browse/KAFKA-1194
>
> On Tue, 7 Apr 2020 at 11:00, Qarl  wrote:
>
> > Hi, my name is Qarl. I am a newbie in software development, and I am
> using
> > Kafka in this software. I have been facing this issue for weeks now. I’m
> > running Kafka on Windows platform. The issue is when I run the Kafka
> > server, and the retention period has reached, it will clear the logs file
> > but there is always problem, such as ‘The process cannot access the file
> > because it is being used by another process.’  How do I solve this? I’m
> > using Kafka in a way where if the Kafka server stops running, our
> software
> > will face a serious issue.
> >
> >
> > Sent from Mail for Windows 10
> >
>
>


Re: Shutdown Broker Because Log Dirs Failed

2020-04-07 Thread Sönke Liebau
Hi Qarl,

there probably (and sadly) is no good answer to your question. Windows is
not an officially supported platform to run Kafka on and there is a large
number of unresolved tickets in Jira on this topic.
Most notable is probably KAFKA-1194 [1].

I haven't closely followed this - I thought we were getting close to a
solution a little while ago, but the ticket is still unresolved. Maybe
someone else knows more about the current state than is documented in the
ticket?

Best regards,
Sönke

[1] https://issues.apache.org/jira/browse/KAFKA-1194

On Tue, 7 Apr 2020 at 11:00, Qarl  wrote:

> Hi, my name is Qarl. I am a newbie in software development, and I am using
> Kafka in this software. I have been facing this issue for weeks now. I’m
> running Kafka on Windows platform. The issue is when I run the Kafka
> server, and the retention period has reached, it will clear the logs file
> but there is always problem, such as ‘The process cannot access the file
> because it is being used by another process.’  How do I solve this? I’m
> using Kafka in a way where if the Kafka server stops running, our software
> will face a serious issue.
>
>
> Sent from Mail for Windows 10
>


Re: [DISCUSS] KIP-590: Redirect Zookeeper Mutation Protocols to The Controller

2020-04-06 Thread Sönke Liebau
Hi Boyang,

thanks for the KIP. Sounds good overall.

@Tom: I thought about your remark a little and think that in principle we
can get away without forwarding the principal at all. Brokers currently
authenticate and authorize requests before performing writes to ZooKeeper -
as long as we don't change that, it shouldn't matter whether the write goes
to ZK or the controller, as long as that request is properly authenticated.
So the broker would simply authenticate and authorize the original request
and then forward it to the controller using its own credentials. And the
controller could simply trust that this is a bona fide request, because it
came from a trusted peer.

I can see two issues here; one is a bit academic, I think.

1. The controller would be unable to write a proper audit log, because it
cannot know who sent the original request.

2. In theory, clusters could use plaintext listeners for inter-broker
traffic because that traffic is on a separate, secure network, or for
similar reasons.
In that case, the forwarded request would be unauthenticated - then again,
so are all other requests between brokers, so nothing lost really.

Overall, I think that sending the principal along with the request
shouldn't be a large issue; it is just two Strings and a boolean.
And the controller could bypass the PrincipalBuilder and just pass the
Principal that was built and sent by the remote broker straight to the
Authorizer. Since PrincipalBuilders are the same on all brokers, it
shouldn't matter who does the processing, I think.
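
Purely for illustration, those fields could look roughly like this in the
request schema (a sketch only, not the actual KIP-590 proposal):

    { "name": "PrincipalType", "type": "string", "versions": "0+" },
    { "name": "PrincipalName", "type": "string", "versions": "0+" },
    { "name": "TokenAuthenticated", "type": "bool", "versions": "0+" }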

Best regards,
Sönke


On Mon, 6 Apr 2020 at 22:30, Boyang Chen  wrote:

> Thanks Tom for the question! I'm not super familiar with the Principal
> stuff, could you elaborate more on the two points you proposed here?
>
> I looked up Admin client and just take `createDelegationToken` API for an
> example, the request data encodes the principal information already, so
> broker should also leverage that information to proxy the request IMHO.
>
> Boyang
>
> On Mon, Apr 6, 2020 at 9:21 AM Tom Bentley  wrote:
>
> > Hi Boyang,
> >
> > Thanks for the KIP!
> >
> > When a broker proxies a request to the controller how does the
> > authenticated principal get propagated? I think a couple of things might
> > complicate this:
> >
> > 1. A PrincipalBuilder might be in use,
> > 2. A Principal does not have to be serializable.
> >
> >
> > Kind regards,
> >
> > Tom
> >
> > On Sat, Apr 4, 2020 at 12:52 AM Boyang Chen 
> > wrote:
> >
> > > Hey all,
> > >
> > > I would like to start off the discussion for KIP-590, a follow-up
> > > initiative after KIP-500:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-590%3A+Redirect+Zookeeper+Mutation+Protocols+to+The+Controller
> > >
> > > This KIP proposes to migrate existing Zookeeper mutation paths,
> including
> > > configuration, security and quota changes, to controller-only by always
> > > routing these alterations to the controller.
> > >
> > > Let me know your thoughts!
> > >
> > > Best,
> > > Boyang
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-585: Conditional SMT

2020-04-02 Thread Sönke Liebau
ies/Map > > String> and what you've suggested has duplicate keys.
> > >
> > > One thing I did briefly consider what the ability to treat conditions
> as
> > > Configurable objects in their own right (opening up the possibility of
> > > people supplying their own Conditions, just like they can supply their
> > own
> > > SMTs). That might be configured something like this:
> > >
> > > transforms.conditionalExtract.condition: not
> > > transforms.conditionalExtract.condition.not.type: Not
> > > transforms.conditionalExtract.condition.not.negated: foo
> > > transforms.conditionalExtract.condition.foo.type:
> HasHeaderWithValue
> > > transforms.conditionalExtract.condition.foo.header: my-header
> > > transforms.conditionalExtract.condition.foo.value: my-value
> > >
> > > I didn't propose that I couldn't see the use cases to justify this kind
> > of
> > > complexity, especially as the common case would surely be matching
> > against
> > > topic name (to be honest I wasn't completely convinced by the need for
> > > "has-header"). In the current version of the KIP that's just
> > >
> > > transforms.conditionalExtract.condition: topic-matches:
> my-prefix-.*
> > >
> > > but using the more flexible scheme that would require something more
> like
> > > this:
> > >
> > > transforms.conditionalExtract.condition: bar
> > > transforms.conditionalExtract.condition.bar.type: TopicMatch
> > > transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
> > >
> > > If people know of use cases which would justify more sophistication,
> I'm
> > > happy to reconsider.
> > >
> > > Thanks again for taking a look!
> > >
> > > Tom
> > >
> > > On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
> > >  wrote:
> > >
> > > > Hi Tom,
> > > >
> > > > sounds useful to me, thanks for the KIP!
> > > > The only thought that I had while reading was that this will probably
> > > raise
> > > > questions about more involved conditions fairly quickly. For example
> > the
> > > > "has-header" will cause an appetite for conditions like
> > > > "this-header-has-that-value".
> > > > This would necessitate two parameters to be passed into the
> condition,
> > > > which I think is not currently included in the KIP. I am not saying
> add
> > > > this now, but might it make sense to discuss a concept of how that
> > might
> > > > look now, to avoid potential changes later on.
> > > >
> > > > Just of the top of my head it might look like:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > > header-name:my-header
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > field-value:my-value
> > > >
> > > >
> > > > Also, while writing that, I noticed that you have a version with and
> > > > without "name" for your transformation in the KIP:
> > > >
> > > > transforms.conditionalExtract.condition.hasMyHeader:
> > has-header:my-header
> > > > and
> > > > transforms.conditionalExtract.condition: has-header:my-header
> > > >
> > > >
> > > > Is this intentional and the name is optional?
> > > >
> > > >
> > > > Best regards,
> > > >
> > > > Sönke
> > > >
> > > >
> > > >
> > > > On Wed, 1 Apr 2020 at 11:12, Tom Bentley 
> wrote:
> > > > >
> > > > > Does anyone have any comments, feedback or thoughts about this?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tom
> > > > >
> > > > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley 
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for
> any
> > > > > > feedback:
> > > > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > > > >
> > > > > > Many thanks,
> > > > > >
> > > > > > Tom
> > > > > >
> > > >
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-585: Conditional SMT

2020-04-02 Thread Sönke Liebau
Extract.condition.foo.value: my-value
>
> I didn't propose that I couldn't see the use cases to justify this kind of
> complexity, especially as the common case would surely be matching against
> topic name (to be honest I wasn't completely convinced by the need for
> "has-header"). In the current version of the KIP that's just
>
> transforms.conditionalExtract.condition: topic-matches: my-prefix-.*
>
> but using the more flexible scheme that would require something more like
> this:
>
> transforms.conditionalExtract.condition: bar
> transforms.conditionalExtract.condition.bar.type: TopicMatch
> transforms.conditionalExtract.condition.bar.pattern: my-prefix-.*
>
> If people know of use cases which would justify more sophistication, I'm
> happy to reconsider.
>
> Thanks again for taking a look!
>
> Tom
>
> On Wed, Apr 1, 2020 at 2:05 PM Sönke Liebau
>  wrote:
>
> > Hi Tom,
> >
> > sounds useful to me, thanks for the KIP!
> > The only thought that I had while reading was that this will probably
> raise
> > questions about more involved conditions fairly quickly. For example the
> > "has-header" will cause an appetite for conditions like
> > "this-header-has-that-value".
> > This would necessitate two parameters to be passed into the condition,
> > which I think is not currently included in the KIP. I am not saying add
> > this now, but might it make sense to discuss a concept of how that might
> > look now, to avoid potential changes later on.
> >
> > Just of the top of my head it might look like:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: type:has-header
> > transforms.conditionalExtract.condition.hasMyHeader:
> header-name:my-header
> > transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value
> >
> >
> > Also, while writing that, I noticed that you have a version with and
> > without "name" for your transformation in the KIP:
> >
> > transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
> > and
> > transforms.conditionalExtract.condition: has-header:my-header
> >
> >
> > Is this intentional and the name is optional?
> >
> >
> > Best regards,
> >
> > Sönke
> >
> >
> >
> > On Wed, 1 Apr 2020 at 11:12, Tom Bentley  wrote:
> > >
> > > Does anyone have any comments, feedback or thoughts about this?
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > I've opened KIP-585 which is intended to provide a mechanism to
> > > > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > > > feedback:
> > > >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> > > >
> > > > Many thanks,
> > > >
> > > > Tom
> > > >
> >
>


Re: [DISCUSS] KIP-585: Conditional SMT

2020-04-01 Thread Sönke Liebau
Hi Tom,

sounds useful to me, thanks for the KIP!
The only thought that I had while reading was that this will probably raise
questions about more involved conditions fairly quickly. For example the
"has-header" will cause an appetite for conditions like
"this-header-has-that-value".
This would necessitate two parameters to be passed into the condition,
which I think is not currently included in the KIP. I am not saying add
this now, but might it make sense to discuss a concept of how that might
look now, to avoid potential changes later on.

Just off the top of my head it might look like:

transforms.conditionalExtract.condition.hasMyHeader: type:has-header
transforms.conditionalExtract.condition.hasMyHeader: header-name:my-header
transforms.conditionalExtract.condition.hasMyHeader: field-value:my-value


Also, while writing that, I noticed that you have a version with and
without "name" for your transformation in the KIP:

transforms.conditionalExtract.condition.hasMyHeader: has-header:my-header
and
transforms.conditionalExtract.condition: has-header:my-header


Is this intentional, and is the name optional?


Best regards,

Sönke



On Wed, 1 Apr 2020 at 11:12, Tom Bentley  wrote:
>
> Does anyone have any comments, feedback or thoughts about this?
>
> Thanks,
>
> Tom
>
> On Tue, Mar 24, 2020 at 11:56 AM Tom Bentley  wrote:
>
> > Hi,
> >
> > I've opened KIP-585 which is intended to provide a mechanism to
> > conditionally apply SMTs in Kafka Connect. I'd be grateful for any
> > feedback:
> >
https://cwiki.apache.org/confluence/display/KAFKA/KIP-585%3A+Conditional+SMT
> >
> > Many thanks,
> >
> > Tom
> >


Jira Cleanup

2020-04-01 Thread Sönke Liebau
Hi all,

I've spent some time with jira and identified a few tickets that might be
worth looking at again / closing if there is no appetite for them.

KAFKA-1906 & KAFKA-3925 - Both these tickets
refer to the default for log.dirs and recommend moving it away from /tmp/.
I've not found many other references to this topic; is there any interest
in discussing this? Otherwise I think we can safely close both tickets.

KAFKA-2333  - Add rename
topic support
KAFKA-976  - Order-Preserving
Mirror Maker Testcase
KAFKA-2493  - Add ability
to fetch only keys in consumer


As usual, I've left comments on the tickets and will close them if there
are no objections in the next few days.

Best,
Sönke


Re: Jira Cleanup

2020-03-19 Thread Sönke Liebau
FYI: just closed those five tickets

On Wed, 11 Mar 2020 at 22:09, Sönke Liebau 
wrote:

> All,
>
> I left a few comments on some old but still open jiras in an attempt to
> clean up a little bit.
>
> Since probably no one would notice these comments I thought I'd quickly
> list them here to give people a chance to check on them:
>
> KAFKA-1265 <https://issues.apache.org/jira/browse/KAFKA-1265> - SBT and
> Gradle create jars without expected Maven files
> KAFKA-1231 <https://issues.apache.org/jira/browse/KAFKA-1231> - Support
> partition shrink (delete partition)
> KAFKA-1336 <https://issues.apache.org/jira/browse/KAFKA-1336> - Create
> partition compaction analyzer
> KAFKA-1440 <https://issues.apache.org/jira/browse/KAFKA-1440> -
> Per-request tracing
> KAFKA-1518 <https://issues.apache.org/jira/browse/KAFKA-1518> -
> KafkaMetricsReporter prevents Kafka from starting if the custom reporter
> throws an exception
>
>
>  I'll wait a few days for objections and then close these issues if none
> are forthcoming.
>
> Best regards,
> Sönke
>
>

-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Jira Cleanup

2020-03-11 Thread Sönke Liebau
All,

I left a few comments on some old but still open jiras in an attempt to
clean up a little bit.

Since probably no one would notice these comments I thought I'd quickly
list them here to give people a chance to check on them:

KAFKA-1265  - SBT and
Gradle create jars without expected Maven files
KAFKA-1231  - Support
partition shrink (delete partition)
KAFKA-1336  - Create
partition compaction analyzer
KAFKA-1440  - Per-request
tracing
KAFKA-1518  -
KafkaMetricsReporter prevents Kafka from starting if the custom reporter
throws an exception


 I'll wait a few days for objections and then close these issues if none
are forthcoming.

Best regards,
Sönke


Re: KAFKA-9308: Request for Review of documentation

2020-02-24 Thread Sönke Liebau
Hi everybody,

as there has not been any response to this request, I figured that maybe
reviewing HTML changes in git changelogs is not that convenient after all.
So I uploaded a version of the page here:
https://kafkadocs.liebau.biz/documentation/#security_ssl to make it easier
to read.

Again, any feedback would be appreciated!

Best regards,
Sönke



On Tue, 11 Feb 2020 at 11:48, Sönke Liebau 
wrote:

> Hi everybody,
>
> I've reworked the SSL part of the documentation a little in order to fix
> (among other things) KAFKA-9308[1] and would love some feedback if someone
> can spare a few minutes.
>
> Pull request: https://github.com/apache/kafka/pull/8009
>
> [1] https://issues.apache.org/jira/browse/KAFKA-9308
>
> Best regards,
> Sönke
>
>
>

-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


KAFKA-9308: Request for Review of documentation

2020-02-11 Thread Sönke Liebau
Hi everybody,

I've reworked the SSL part of the documentation a little in order to fix
(among other things) KAFKA-9308[1] and would love some feedback if someone
can spare a few minutes.

Pull request: https://github.com/apache/kafka/pull/8009

[1] https://issues.apache.org/jira/browse/KAFKA-9308

Best regards,
Sönke


Re: [ANNOUNCE] New Kafka PMC Members: Colin, Vahid and Manikumar

2020-01-16 Thread Sönke Liebau
Great to hear and well deserved! Congratulations and thanks to all of you!

On Thu, 16 Jan 2020 at 07:44, Vahid Hashemian 
wrote:

> Thank you all.
>
> Regards,
> --Vahid
>
> On Wed, Jan 15, 2020 at 9:15 AM Colin McCabe  wrote:
>
> > Thanks, everyone!
> >
> > best,
> > Colin
> >
> > On Wed, Jan 15, 2020, at 07:50, Sean Glover wrote:
> > > Congratulations Colin, Vahid and Manikumar and thank you for all your
> > > excellent work on Apache Kafka!
> > >
> > > On Wed, Jan 15, 2020 at 8:42 AM Ron Dagostino 
> wrote:
> > >
> > > > Congratulations!
> > > >
> > > > > On Jan 15, 2020, at 5:04 AM, Viktor Somogyi-Vass <
> > > > viktorsomo...@gmail.com> wrote:
> > > > >
> > > > > Congrats to you guys, it's a great accomplishment! :)
> > > > >
> > > > >> On Wed, Jan 15, 2020 at 10:20 AM David Jacot  >
> > > > wrote:
> > > > >>
> > > > >> Congrats!
> > > > >>
> > > > >>> On Wed, Jan 15, 2020 at 12:00 AM James Cheng <
> wushuja...@gmail.com
> > >
> > > > wrote:
> > > > >>>
> > > > >>> Congrats Colin, Vahid, and Manikumar!
> > > > >>>
> > > > >>> -James
> > > > >>>
> > > > >>>> On Jan 14, 2020, at 10:59 AM, Tom Bentley 
> > > > wrote:
> > > > >>>>
> > > > >>>> Congratulations!
> > > > >>>>
> > > > >>>> On Tue, Jan 14, 2020 at 6:57 PM Rajini Sivaram <
> > > > >> rajinisiva...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Congratulations Colin, Vahid and Manikumar!
> > > > >>>>>
> > > > >>>>> Regards,
> > > > >>>>> Rajini
> > > > >>>>>
> > > > >>>>> On Tue, Jan 14, 2020 at 6:32 PM Mickael Maison <
> > > > >>> mickael.mai...@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Congrats Colin, Vahid and Manikumar!
> > > > >>>>>>
> > > > >>>>>> On Tue, Jan 14, 2020 at 5:43 PM Ismael Juma <
> ism...@juma.me.uk>
> > > > >> wrote:
> > > > >>>>>>>
> > > > >>>>>>> Congratulations Colin, Vahid and Manikumar!
> > > > >>>>>>>
> > > > >>>>>>> Ismael
> > > > >>>>>>>
> > > > >>>>>>> On Tue, Jan 14, 2020 at 9:30 AM Gwen Shapira <
> > g...@confluent.io>
> > > > >>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi everyone,
> > > > >>>>>>>>
> > > > >>>>>>>> I'm happy to announce that Colin McCabe, Vahid Hashemian and
> > > > >>>>> Manikumar
> > > > >>>>>>>> Reddy are now members of Apache Kafka PMC.
> > > > >>>>>>>>
> > > > >>>>>>>> Colin and Manikumar became committers on Sept 2018 and Vahid
> > on
> > > > Jan
> > > > >>>>>>>> 2019. They all contributed many patches, code reviews and
> > > > >>>>> participated
> > > > >>>>>>>> in many KIP discussions. We appreciate their contributions
> > and are
> > > > >>>>>>>> looking forward to many more to come.
> > > > >>>>>>>>
> > > > >>>>>>>> Congrats Colin, Vahid and Manikumar!
> > > > >>>>>>>>
> > > > >>>>>>>> Gwen, on behalf of Apache Kafka PMC
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
>
> --
>
> Thanks!
> --Vahid
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-551: Expose disk read and write metrics

2020-01-14 Thread Sönke Liebau
+1 (non-binding)

Thanks for creating this!

On Tue, 14 Jan 2020 at 17:36, Mitchell  wrote:

> +1 (non-binding)!
> Very useful kip.
> -mitch
>
> On Tue, Jan 14, 2020 at 10:26 AM Manikumar 
> wrote:
> >
> > +1 (binding).
> >
> > Thanks for the KIP.
> >
> >
> >
> > On Sun, Jan 12, 2020 at 1:23 AM Lucas Bradstreet 
> wrote:
> >
> > > +1 (non binding)
> > >
> > > On Sat, 11 Jan 2020 at 02:32, M. Manna  wrote:
> > >
> > > > Hey Colin,
> > > >
> > > > +1 - Really useful for folks managing a cluster by themselves.
> > > >
> > > > M. MAnna
> > > >
> > > > On Fri, 10 Jan 2020 at 22:35, Jose Garcia Sancio <
> jsan...@confluent.io>
> > > > wrote:
> > > >
> > > > > +1, LGTM.
> > > > >
> > > > > On Fri, Jan 10, 2020 at 2:19 PM Gwen Shapira 
> > > wrote:
> > > > >
> > > > > > +1, thanks for driving this
> > > > > >
> > > > > > On Fri, Jan 10, 2020 at 2:17 PM Colin McCabe  >
> > > > wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'd like to start the vote on KIP-551: Expose disk read and
> write
> > > > > > metrics.
> > > > > > >
> > > > > > > KIP:  https://cwiki.apache.org/confluence/x/sotSC
> > > > > > >
> > > > > > > Discussion thread:
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/cfaac4426455406abe890464a7f4ae23a5c69a39afde66fe6eb3d696%40%3Cdev.kafka.apache.org%3E
> > > > > > >
> > > > > > > cheers,
> > > > > > > Colin
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -Jose
> > > > >
> > > >
> > >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-409: Allow creating under-replicated topics and partitions

2019-12-18 Thread Sönke Liebau
Hi Mickael,

thanks for your response! That all makes perfect sense, and I cannot
give any actual use cases where what I asked about would be useful :)
It was more the idle thought that this might be low-hanging fruit while
changing this anyway, to avoid having to circle back later on, and I
wanted to at least mention it.

I am totally happy either way!

Best regards,
Sönke

On Wed, 18 Dec 2019 at 11:20, Mickael Maison  wrote:
>
> Thanks Sönke for the feedback.
>
> I debated this point quite a bit before deciding to base creation
> around "min.insync.replicas".
>
> For me, the goal of this KIP is to enable administrators to provide
> higher availability. In a 3 node cluster configured for high
> availability (3 replicas, 2 min ISR), by enabling this feature,
> clusters should be fully usable even when 1 broker is down. This
> should cover all "normal" maintenance operations like a rolling
> restart or just the recovery of a broker.
>
> At the moment, when creating a topic/partition, the assumption is that
> the resource will be fully functioning. This KIP does not change this
> assumption. If this is something someone wants, I think it should be
> handled in a different KIP that targets that use case. By relying on
> "min.insync.replicas", we don't break any assumptions the user has and
> this should be fully transparent from the user point of view.
>
> About "min.insync.replicas", one caveat that is not explicit in the
> KIP is that it's currently possible to create topics with less
> replicas than this settings. For that reason, I think the
> implementation will actually rely on min(replicas, min-isr) instead of
> simply min.insync.replicas. I have updated the KIP to explicitly
> mention this point.
>
> I hope that answers your question, let me know.
> Thanks
>
>
> On Mon, Dec 16, 2019 at 4:38 PM Sönke Liebau
>  wrote:
> >
> > Hi Michael,
> >
> > that sounds like a useful addition! I can't help but wonder whether by
> > leaving in the restriction that "min.insync.replicas" has to be
> > satisfied we'll be back here in a years time because someone has a
> > scenario where he or she wants to go below that :)
> > I don't have a strong opinion either way to be honest, just a random
> > thought when reading the KIP.
> >
> > Best regards,
> > Sönke
> >
> > On Thu, 12 Dec 2019 at 22:44, Ryanne Dolan  wrote:
> > >
> > > +1 non-binding, thx
> > >
> > > On Thu, Dec 12, 2019 at 6:09 AM Mickael Maison 
> > > wrote:
> > >
> > > > Bumping this thread, I've not seen any votes or feedback.
> > > >
> > > > On Wed, Nov 13, 2019 at 12:17 PM Mickael Maison
> > > >  wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > I'd like to start a vote on KIP-409: Allow creating under-replicated
> > > > > topics and partitions
> > > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-409%3A+Allow+creating+under-replicated+topics+and+partitions
> > > > >
> > > > > Thanks
> > > >
> >
> >
> >
> > --
> > Sönke Liebau
> > Partner
> > Tel. +49 179 7940878
> > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-409: Allow creating under-replicated topics and partitions

2019-12-16 Thread Sönke Liebau
Hi Mickael,

that sounds like a useful addition! I can't help but wonder whether by
leaving in the restriction that "min.insync.replicas" has to be
satisfied, we'll be back here in a year's time because someone has a
scenario where he or she wants to go below that :)
I don't have a strong opinion either way to be honest, just a random
thought when reading the KIP.

Best regards,
Sönke

On Thu, 12 Dec 2019 at 22:44, Ryanne Dolan  wrote:
>
> +1 non-binding, thx
>
> On Thu, Dec 12, 2019 at 6:09 AM Mickael Maison 
> wrote:
>
> > Bumping this thread, I've not seen any votes or feedback.
> >
> > On Wed, Nov 13, 2019 at 12:17 PM Mickael Maison
> >  wrote:
> > >
> > > Hi all,
> > >
> > > I'd like to start a vote on KIP-409: Allow creating under-replicated
> > > topics and partitions
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-409%3A+Allow+creating+under-replicated+topics+and+partitions
> > >
> > > Thanks
> >



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [jira] [Created] (KAFKA-9292) KIP-551: Expose disk read and write metrics

2019-12-16 Thread Sönke Liebau
Hi Colin,

thanks for the KIP, sounds like a useful addition to me.

I considered suggesting also reading the rchar property to allow
getting an approximation of the cache hit ratio, but after some reading it
appears to be not as simple as that :)

Best regards,
Sönke

On Wed, 11 Dec 2019 at 18:35, Colin McCabe (Jira)  wrote:
>
> Colin McCabe created KAFKA-9292:
> ---
>
>  Summary: KIP-551: Expose disk read and write metrics
>  Key: KAFKA-9292
>  URL: https://issues.apache.org/jira/browse/KAFKA-9292
>  Project: Kafka
>   Issue Type: Improvement
> Reporter: Colin McCabe
> Assignee: Colin McCabe
>
>
> It's often helpful to know how many bytes Kafka is reading and writing from 
> the disk.  The reason is because when disk access is required, there may be 
> some impact on latency and bandwidth.  We currently don't have a metric that 
> measures this directly.  It would be useful to add one.
>
> See 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-551%3A+Expose+disk+read+and+write+metrics
>
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Broker Interceptors

2019-12-03 Thread Sönke Liebau
Hi Thomas,

I think that idea is worth looking at. As you say, if no interceptor is
configured then the performance overhead should be negligible. Basically it
is then up to the user to decide if he wants to take the performance hit.
We should make sure to think about monitoring capabilities like time spent
in the interceptor per record, etc.

The most obvious use case I think is server side schema validation, which
Confluent are also offering as part of their commercial product, but other
ideas come to mind as well.
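
Just to make my thinking a bit more concrete, something along these lines is
roughly what I'd picture - this is a purely hypothetical sketch, no such
interface exists in the broker today and all names are made up by me:

    // Hypothetical sketch only - nothing like this exists in Kafka today.
    // A broker-side hook that is a no-op unless an implementation is configured,
    // so the user opts into whatever overhead their implementation adds.
    public interface BrokerRecordInterceptor {
        // Called for the records of a produce request before they are appended
        // to the log. The broker could expose a metric for the time spent in
        // this call so operators can monitor the overhead they opted into.
        void onProduce(String topic, int partition, java.nio.ByteBuffer recordBatch);
    }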

Best regards,
Sönke

Thomas Aley  schrieb am Di., 3. Dez. 2019, 10:45:

> Hi M. Manna,
>
> Thank you for your feedback, any and all thoughts on this are appreciated
> from the community.
>
> I think it is important to distinguish that there are two parts to this.
> One would be a server side interceptor framework and the other would be
> the interceptor implementations themselves.
>
> The idea would be that the Interceptor framework manifests as a plug point
> in the request/response paths that by itself has negligible performance
> impact as without an interceptor registered in the framework it is
> essentially a no-op. This way the out-the-box behavior of the Kafka broker
> remains essentially unchanged, it is only if the cluster administrator
> registers an interceptor into the framework that the path of a record is
> intercepted. This is much like the already accepted and implemented client
> interceptors - the capability exists and it is an opt-in feature.
>
> As with the client interceptors and indeed interception in general, the
> interceptor implementations need to be thoughtfully crafted to ensure
> minimal performance impact. Yes the interceptor framework could tap into
> nearly everything but would only be tapping into the subset of APIs that
> the user wishes to intercept for their use case.
>
> Tom Aley
> thomas.a...@ibm.com
>
>
>
> From:   "M. Manna" 
> To: Kafka Users 
> Cc: dev@kafka.apache.org
> Date:   02/12/2019 11:31
> Subject:[EXTERNAL] Re: Broker Interceptors
>
>
>
> Hi Tom,
>
> On Mon, 2 Dec 2019 at 09:41, Thomas Aley  wrote:
>
> > Hi Kafka community,
> >
> > I am hoping to get some feedback and thoughts about broker interceptors.
> >
> > KIP-42 Added Producer and Consumer interceptors which have provided
> Kafka
> > users the ability to collect client side metrics and trace the path of
> > individual messages end-to-end.
> >
> > This KIP also mentioned "Adding message interceptor on the broker makes
> a
> > lot of sense, and will add more detail to monitoring. However, the
> > proposal is to do it later in a separate KIP".
> >
> > One of the motivations for leading with client interceptors was to gain
> > experience and see how useable they are before tackling the server side
> > implementation which would ultimately "allow us to have a more
> > complete/detailed message monitoring".
> >
> > Broker interceptors could also provide more value than just more
> complete
> > and detailed monitoring such as server side schema validation, so I am
> > curious to learn if anyone in the community has progressed this work;
> has
> > ideas about other potential server side interceptor uses or has actually
> > implemented something similar.
> >
>
>  I personally feel that the cost here is the impact on performance. If I
> am
> right, this interceptor is going to tap into nearly everything. If you
> have
> strong guarantee (min.in.sync.replicas = N-1) then this may incur some
> delay (and let's not forget inter broker comms protection by TLS config).
> This may not be desirable for some systems. That said, it would be good to
> know what others think about this.
>
> Thanks,
>
> >
> > Regards,
> >
> > Tom Aley
> > thomas.a...@ibm.com
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
> >
> >
>
>
>
> Unless stated otherwise above:
> IBM United Kingdom Limited - Registered in England and Wales with number
> 741598.
> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
>
>


Handling deserialization exceptions in Kafka consumer

2019-12-02 Thread Sönke Liebau
Hi all,

I was recently discussing deserialization errors and how they can be
handled in Kafka consumers.

I believe the current code still throws an exception if deserialization
fails, which stops consumption unless you seek past that record.

This creates an issue for a JDBC sink connector that uses Avro data for
example, as this is effectively dead in the water when hitting a "poison
pill" record.

I've dug around for a little while and found the following issues on this:
https://issues.apache.org/jira/browse/KAFKA-4740
https://issues.apache.org/jira/browse/KAFKA-5858

https://issues.apache.org/jira/browse/KAFKA-5211 mentions that this
behavior was actually changed for a little while at some time in the past,
but that change was snuck in without a KIP.

There are also a few tickets specifically around Connect, but those refer
more to the wording of the error message than the actual functionality.

--

Is there a "gospel" way of handling this that people have found works for
them?

Or if not, would we want to extend the Java consumer to allow configuring
behavior for this case?

At the very least it would be quite easy to add an option to simply ignore
deserialization errors; thinking a bit further, we might also allow
specifying a pluggable error handler that could be used to send malformed
records to a dead letter queue or something similar.
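
To make the second idea a bit more tangible, something like the following
wrapper is roughly what I have in mind. This is only a sketch on top of the
existing Deserializer interface; the wrapper and handler names are made up:

    import java.util.Map;
    import org.apache.kafka.common.serialization.Deserializer;

    // Sketch only: turns deserialization failures into a callback instead of an
    // exception, so the caller can e.g. skip the record or route the raw bytes
    // to a dead letter topic. All names here are illustrative.
    public class ErrorTolerantDeserializer<T> implements Deserializer<T> {

        public interface FailedRecordHandler {
            // Called with the raw bytes that could not be deserialized.
            void onFailure(String topic, byte[] data, Exception cause);
        }

        private final Deserializer<T> delegate;
        private final FailedRecordHandler handler;

        public ErrorTolerantDeserializer(Deserializer<T> delegate, FailedRecordHandler handler) {
            this.delegate = delegate;
            this.handler = handler;
        }

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            delegate.configure(configs, isKey);
        }

        @Override
        public T deserialize(String topic, byte[] data) {
            try {
                return delegate.deserialize(topic, data);
            } catch (Exception e) {
                handler.onFailure(topic, data, e);
                return null; // the caller needs to be prepared for null values
            }
        }

        @Override
        public void close() {
            delegate.close();
        }
    }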

Best regards,
Sönke


Re: Preliminary blog post about the Apache Kafka 2.4.0 release

2019-11-26 Thread Sönke Liebau
Hi Manikumar,

looks great, thanks for the effort!

Like Ismael, I think it would make sense to mention the ZK upgrade to 3.5. I
know a few people who are waiting for this because of the SSL support, so
there are probably more of them out there :)

Best regards,
Sönke


On Mon, 25 Nov 2019 at 20:11, Ismael Juma  wrote:

> Manikumar,
>
> One thing I had previously missed, should we mention the upgrade to ZK 3.5?
>
> Ismael
>
> On Thu, Nov 14, 2019 at 10:41 AM Manikumar 
> wrote:
>
> > Hi all,
> >
> > I've prepared a preliminary blog post about the upcoming Apache Kafka
> 2.4.0
> > release.
> > Please take a look and let me know if you want to add/modify details.
> > Thanks to all who contributed to this blog post.
> >
> >
> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache1
> >
> > Thanks,
> > Manikumar
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-09 Thread Sönke Liebau
Hi Andrew,

thanks for your feedback!
I am interested though, why are you doubtful about getting a committer to
volunteer an opinion? Shouldn't this be in their interest as well?

I'll just continue along for now and start building a very rough poc
implementation based on what's in the KIP so far to flesh out more details
and add them to the KIP as I go along.

Best regards,
Sönke

On Wed, 7 Aug 2019 at 18:18, Andrew Schofield 
wrote:

> Hi,
> I think this is a useful KIP and it looks good in principle. While it can
> all be done using
> interceptors, if the brokers do not know anything about it, you need to
> maintain the
> mapping from topics to key ids somewhere external. I'd prefer the way
> you've done it.
>
> I'm not sure whether you'll manage to interest any committers in
> volunteering an
> opinion, and you'll need that before you can get the KIP accepted into
> Kafka.
>
> Thanks,
> Andrew Schofield (IBM)
>
> On 06/08/2019, 15:46, "Sönke Liebau" 
> wrote:
>
> Hi,
>
> I have so far received pretty much no comments on the technical details
> outlined in the KIP. While I am happy to continue with my own ideas of
> how
> to implement this, I would much prefer to at least get a very broad
> "looks
> good in principle, but still lots to flesh out" from a few people
> before I
> but more work into this.
>
> Best regards,
> Sönke
>
>
>
>
> On Tue, 21 May 2019 at 14:15, Sönke Liebau  >
> wrote:
>
> > Hi everybody,
> >
> > I'd like to rekindle the discussion around KIP-317.
> > I have reworked the KIP a little bit in order to design everything
> as a
> > pluggable implementation. During the course of that work I've also
> decided
> > to rename the KIP, as encryption will only be transparent in some
> cases. It
> > is now called "Add end to end data encryption functionality to Apache
> > Kafka" [1].
> >
> > I'd very much appreciate it if you could give the KIP a quick read.
> This
> > is not at this point a fully fleshed out design, as I would like to
> agree
> > on the underlying structure that I came up with first, before
> spending time
> > on details.
> >
> > TL/DR is:
> > Create three pluggable classes:
> > KeyManager runs on the broker and manages which keys to use, key
> rollover
> > etc
> > KeyProvider runs on the client and retrieves keys based on what the
> > KeyManager tells it
> > EncryptionEngine runs on the client andhandles the actual encryption
> > First idea of control flow between these components can be seen at
> [2]
> >
> > Please let me know any thoughts or concerns that you may have!
> >
> > Best regards,
> > Sönke
> >
> > [1]
> >
> https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-317%253A%2BAdd%2Bend-to-end%2Bdata%2Bencryption%2Bfunctionality%2Bto%2BApache%2BKafkadata=02%7C01%7C%7Cc858aa722cc9434ba98d08d71a7cd547%7C84df9e7fe9f640afb435%7C1%7C0%7C637006995760557724sdata=GwcvmfILdjTZBxOseHR4IjUY0oMG3%2BKEjFNHo3pJlvc%3Dreserved=0
> > [2]
> >
> https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdownload%2Fattachments%2F85479936%2Fkafka_e2e-encryption_control-flow.png%3Fversion%3D1%26modificationDate%3D1558439227551%26api%3Dv2data=02%7C01%7C%7Cc858aa722cc9434ba98d08d71a7cd547%7C84df9e7fe9f640afb435%7C1%7C0%7C637006995760557724sdata=FcMoNEliLn48OZfWca1TCQv%2BiIlRNqJNQvU52UfkbEs%3Dreserved=0
> >
> >
> >
> > On Fri, 10 Aug 2018 at 14:05, Sönke Liebau <
> soenke.lie...@opencore.com>
> > wrote:
> >
> >> Hi Viktor,
> >>
> >> thanks for your input! We could accommodate magic headers by
> removing any
> >> known fixed bytes pre-encryption, sticking them in a header field
> and
> >> prepending them after decryption. However, I am not sure whether
> this is
> >> actually necessary, as most modern (AES for sure) algorithms are
> considered
> >> to be resistant to known-plaintext types of attack. Even if the
> entire
> >> plaintext is known to the attacker he still needs to brute-force
> the key -
> >> which may take a while.
> >>
> >> Something different to consider in this context are compression
> >> sidechannel attacks like CRIME or BREACH, which may be relevant
> depending
>

Re: [VOTE] KIP-499 - Unify connection name flag for command line tool

2019-08-09 Thread Sönke Liebau
+1 (non-binding)



On Fri, 9 Aug 2019 at 04:45, Harsha Chintalapani  wrote:

> +1  (binding). much needed!!
>
>
> On Thu, Aug 08, 2019 at 6:43 PM, Gwen Shapira  wrote:
>
> > +1 (binding) THANK YOU. It would be +100 if I could.
> >
> > On Thu, Aug 8, 2019 at 6:37 PM Mitchell  wrote:
> >
> > Hello Dev,
> > After the discussion I would like to start the vote for KIP-499
> >
> > The following command line tools will have the `--bootstrap-server`
> > command line argument added: kafka-console-producer.sh,
> > kafka-consumer-groups.sh, kafka-consumer-perf-test.sh,
> > kafka-verifiable-consumer.sh, kafka-verifiable-producer.sh
> >
> > Thanks,
> > -Mitch
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-08 Thread Sönke Liebau
Thanks for your feedback both of you!

I've commented inline below.


On Thu, 8 Aug 2019 at 08:38, Jörn Franke  wrote:

> If you are doing batch encryption then you are more similar to a scenario
> of file encryption. The more frequent the messages are you are closer to
> the ssl/https scenarios. You may learn from those protocols on how they
> handle keys, how long they keep them etc. to implement your E2e solution .
>
> > Am 08.08.2019 um 08:11 schrieb Maulin Vasavada <
> maulin.vasav...@gmail.com>:
> >
> > Hi Sönke Liebau
> > <
> https://www.mail-archive.com/search?l=dev@kafka.apache.org=from:%22S%C3%B6nke+Liebau%22
> >
> >
> > Thanks for the great detailed documentation. However, I feel by leaving
> the
> > KMS outside of Kafka might simplify the whole thing to a great extent. If
> > the broker is not going to touch the encrypted messages, why would we put
> > any dependency of KMS interfaces on the Broker. We have experimented
> doing
> > end-to-end message encryption and we used topic level keys and message
> > encryption with serializer wrapper which encrypts each message before
> > serializing. The serializer wrapper have to integrate with required KMS
> we
> > use internally and that was all.
>
My idea behind having the broker manage topic keys was to keep the option
of actually making the encryption transparent to the clients. This way you
could configure a topic as encrypted on the broker, and on startup the
broker would push everything the client needs to know to encrypt messages -
while still being unable to decrypt messages itself.

However, this is only one possible scenario. Another valid scenario is of
course that you want to configure clients directly with keys, which I hope
my proposal also covers, as everything is pluggable. And in this case the
broker would not need a dependency on the KMS, as it doesn't need to handle
keys.

Basically by making this pluggable I hope to be able to cover a wide
variety of use cases, the two described in the KIP are just the ones that
I'd implement initially.



> >
> > However one key observation we had was - if we could do encryption at
> > 'batch' level instead of 'per-message' it can perform much better
> > (depending upon batch sizing). We didn't experiment with that though.
>

I agree, batch encryption would make this perform much better, but it has
downsides as well. I am unsure of the security implications of larger vs.
smaller payloads, to be honest, but will investigate this.
In addition, however, we do not want to decrypt the batch on the broker, so
it would be handed to consumers as a batch as well, which has the same
implications as end-to-end compression, such as more complicated offset
committing for consumers. I have not looked into that in a long time and
it may not even be an issue anymore. I'll do some digging here as well.
Bottom line: I agree, but I think we should offer both modes of operation.


> >
> > Thanks
> > Maulin
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-504 - Add new Java Authorizer Interface

2019-08-07 Thread Sönke Liebau
Hi Rajini,

looks great and addresses a few gripes I had in the past, thanks for that!

One idea that I had while reading, though I am not sure if this is taking
"being flexible" a step too far: would it make sense to make the severity
at which an authorization decision is logged pluggable/configurable? We
have a few customers that have different regulatory requirements for
auditing access, depending on the type of data that is in the topics. For
some topics they might actually need to log every access, for some just the
denied ones, and for some no one cares at all.
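
To illustrate what I mean, roughly something like this - a completely
made-up sketch, the interface and names are not part of KIP-504, it is just
meant to show the idea of a per-resource audit level:

    // Hypothetical sketch, not part of KIP-504 - decides how "loudly" a single
    // authorization decision should be logged for a given resource.
    public interface AuditLogLevelPolicy {
        enum Level { NONE, DENIED_ONLY, ALL }

        // e.g. ALL for topics with regulated data, DENIED_ONLY as the default,
        // NONE for topics nobody needs an audit trail for.
        Level levelFor(String resourceType, String resourceName);
    }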

Best regards,
Sönke



On Wed, 7 Aug 2019 at 12:28, Ron Dagostino  wrote:

> Looks great, Rajini — a detailed and complete KIP with a great
> backwards-compatibility plan.  Nothing came to mind aside from how easy it
> was to read and understand.  Thanks for writing it so clearly.
>
> Ron
>
> > On Aug 6, 2019, at 5:31 PM, Rajini Sivaram 
> wrote:
> >
> > Hi all,
> >
> > I have created a KIP to replace the Scala Authorizer API with a new Java
> > API:
> >
> >   -
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-504+-+Add+new+Java+Authorizer+Interface
> >
> > This is replacement for KIP-50 which was accepted but never merged. Apart
> > from moving to a Java API consistent with other pluggable interfaces in
> the
> > broker, KIP-504 also attempts to address known limitations in the
> > authorizer. If you have come across other limitations that you would like
> > to see addressed in the new API, please raise these on the discussion
> > thread so that we can consider those too. All suggestions and feedback
> are
> > welcome.
> >
> > Thank you,
> >
> > Rajini
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-08-06 Thread Sönke Liebau
Hi,

I have so far received pretty much no comments on the technical details
outlined in the KIP. While I am happy to continue with my own ideas of how
to implement this, I would much prefer to at least get a very broad "looks
good in principle, but still lots to flesh out" from a few people before I
but more work into this.

Best regards,
Sönke




On Tue, 21 May 2019 at 14:15, Sönke Liebau 
wrote:

> Hi everybody,
>
> I'd like to rekindle the discussion around KIP-317.
> I have reworked the KIP a little bit in order to design everything as a
> pluggable implementation. During the course of that work I've also decided
> to rename the KIP, as encryption will only be transparent in some cases. It
> is now called "Add end to end data encryption functionality to Apache
> Kafka" [1].
>
> I'd very much appreciate it if you could give the KIP a quick read. This
> is not at this point a fully fleshed out design, as I would like to agree
> on the underlying structure that I came up with first, before spending time
> on details.
>
> TL/DR is:
> Create three pluggable classes:
> KeyManager runs on the broker and manages which keys to use, key rollover
> etc
> KeyProvider runs on the client and retrieves keys based on what the
> KeyManager tells it
> EncryptionEngine runs on the client andhandles the actual encryption
> First idea of control flow between these components can be seen at [2]
>
> Please let me know any thoughts or concerns that you may have!
>
> Best regards,
> Sönke
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> [2]
> https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1=1558439227551=v2
>
>
>
> On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
> wrote:
>
>> Hi Viktor,
>>
>> thanks for your input! We could accommodate magic headers by removing any
>> known fixed bytes pre-encryption, sticking them in a header field and
>> prepending them after decryption. However, I am not sure whether this is
>> actually necessary, as most modern (AES for sure) algorithms are considered
>> to be resistant to known-plaintext types of attack. Even if the entire
>> plaintext is known to the attacker he still needs to brute-force the key -
>> which may take a while.
>>
>> Something different to consider in this context are compression
>> sidechannel attacks like CRIME or BREACH, which may be relevant depending
>> on what type of data is being sent through Kafka. Both these attacks depend
>> on the encrypted record containing a combination of secret and user
>> controlled data.
>> For example if Kafka was used to forward data that the user entered on a
>> website along with a secret API key that the website adds to a back-end
>> server and the user can obtain the Kafka messages, these attacks would
>> become relevant. Not much we can do about that except disallow encryption
>> when compression is enabled (TLS chose this approach in version 1.3)
>>
>> I agree with you, that we definitely need to clearly document any risks
>> and how much security can reasonably be expected in any given scenario. We
>> might even consider logging a warning message when sending data that is
>> compressed and encrypted.
>>
>> On a different note, I've started amending the KIP to make key management
>> and distribution pluggable, should hopefully be able to publish sometime
>> Monday.
>>
>> Best regards,
>> Sönke
>>
>>
>> On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi > > wrote:
>>
>>> Hi Sönke,
>>>
>>> Compressing before encrypting has its dangers as well. Suppose you have a
>>> known compression format which adds a magic header and you're using a
>>> block
>>> cipher with a small enough block, then it becomes much easier to figure
>>> out
>>> the encryption key. For instance you can look at Snappy's stream
>>> identifier:
>>> https://github.com/google/snappy/blob/master/framing_format.txt
>>> . Based on this you should only use block ciphers where block sizes are
>>> much larger then 6 bytes. AES for instance should be good with its 128
>>> bits
>>> = 16 bytes but even this isn't entirely secure as the first 6 bytes
>>> already
>>> leaked some information - and it depends on the cypher that how much it
>>> is.
>>> Also if we suppose that an adversary accesses a broker and takes all the
>>> data, they'll have much easier job to decrypt it as they'll have much
>>> more

Jira Cleanup

2019-07-19 Thread Sönke Liebau
All,

I left a few comments on some old but still open jiras in an attempt to
clean up a little bit.

Since probably no one would notice these comments I thought I'd quickly
list them here to give people a chance to check on them:

KAFKA-822 : Reassignment
of partitions needs a cleanup
KAFKA-1016 : Broker
should limit purgatory size
KAFKA-1099 :
StopReplicaRequest
and StopReplicaResponse should also carry the replica ids
KAFKA- : Broker
prematurely accepts TopicMetadataRequests on startup
KAFKA-1234:  All
kafka-run-class.sh to source in user config file (to set env vars like
KAFKA_OPTS)

 I'll wait a few days for objections and then close these issues if none
are forthcoming.

Best regards,
Sönke


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2019-05-21 Thread Sönke Liebau
Hi everybody,

I'd like to rekindle the discussion around KIP-317.
I have reworked the KIP a little bit in order to design everything as a
pluggable implementation. During the course of that work I've also decided
to rename the KIP, as encryption will only be transparent in some cases. It
is now called "Add end to end data encryption functionality to Apache
Kafka" [1].

I'd very much appreciate it if you could give the KIP a quick read. This is
not at this point a fully fleshed out design, as I would like to agree on
the underlying structure that I came up with first, before spending time on
details.

TL/DR is:
Create three pluggable classes:
KeyManager runs on the broker and manages which keys to use, key rollover
etc
KeyProvider runs on the client and retrieves keys based on what the
KeyManager tells it
EncryptionEngine runs on the client and handles the actual encryption
First idea of control flow between these components can be seen at [2]
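
To give a first impression of the shape these three classes could take, here
is a very rough sketch - method names and signatures are placeholders I made
up for illustration, nothing here is fleshed out or final:

    // Rough placeholder sketch of the three pluggable classes. All names and
    // signatures are preliminary and only illustrate the split of responsibilities.

    // Runs on the broker: decides which key a topic should use and when to roll it over.
    public interface KeyManager {
        // Returns an identifier (not the key material itself) of the key that
        // should currently be used for the given topic.
        String currentKeyId(String topic);
    }

    // Runs on the client: resolves a key id handed out by the KeyManager into
    // actual key material, for example by talking to an external KMS.
    public interface KeyProvider {
        byte[] keyForId(String keyId);
    }

    // Runs on the client: performs the actual encryption and decryption of the
    // message payload with the key material obtained from the KeyProvider.
    public interface EncryptionEngine {
        byte[] encrypt(byte[] plaintext, byte[] key);
        byte[] decrypt(byte[] ciphertext, byte[] key);
    }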

Please let me know any thoughts or concerns that you may have!

Best regards,
Sönke

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
[2]
https://cwiki.apache.org/confluence/download/attachments/85479936/kafka_e2e-encryption_control-flow.png?version=1=1558439227551=v2



On Fri, 10 Aug 2018 at 14:05, Sönke Liebau 
wrote:

> Hi Viktor,
>
> thanks for your input! We could accommodate magic headers by removing any
> known fixed bytes pre-encryption, sticking them in a header field and
> prepending them after decryption. However, I am not sure whether this is
> actually necessary, as most modern (AES for sure) algorithms are considered
> to be resistant to known-plaintext types of attack. Even if the entire
> plaintext is known to the attacker he still needs to brute-force the key -
> which may take a while.
>
> Something different to consider in this context are compression
> sidechannel attacks like CRIME or BREACH, which may be relevant depending
> on what type of data is being sent through Kafka. Both these attacks depend
> on the encrypted record containing a combination of secret and user
> controlled data.
> For example if Kafka was used to forward data that the user entered on a
> website along with a secret API key that the website adds to a back-end
> server and the user can obtain the Kafka messages, these attacks would
> become relevant. Not much we can do about that except disallow encryption
> when compression is enabled (TLS chose this approach in version 1.3)
>
> I agree with you, that we definitely need to clearly document any risks
> and how much security can reasonably be expected in any given scenario. We
> might even consider logging a warning message when sending data that is
> compressed and encrypted.
>
> On a different note, I've started amending the KIP to make key management
> and distribution pluggable, should hopefully be able to publish sometime
> Monday.
>
> Best regards,
> Sönke
>
>
> On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi 
> wrote:
>
>> Hi Sönke,
>>
>> Compressing before encrypting has its dangers as well. Suppose you have a
>> known compression format which adds a magic header and you're using a
>> block
>> cipher with a small enough block, then it becomes much easier to figure
>> out
>> the encryption key. For instance you can look at Snappy's stream
>> identifier:
>> https://github.com/google/snappy/blob/master/framing_format.txt
>> . Based on this you should only use block ciphers where block sizes are
>> much larger then 6 bytes. AES for instance should be good with its 128
>> bits
>> = 16 bytes but even this isn't entirely secure as the first 6 bytes
>> already
>> leaked some information - and it depends on the cypher that how much it
>> is.
>> Also if we suppose that an adversary accesses a broker and takes all the
>> data, they'll have much easier job to decrypt it as they'll have much more
>> examples.
>> So overall we should make sure to define and document the compatible
>> encryptions with the supported compression methods and the level of
>> security they provide to make sure the users are fully aware of the
>> security implications.
>>
>> Cheers,
>> Viktor
>>
>> On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau
>>  wrote:
>>
>> > Hi Stephane,
>> >
>> > thanks for pointing out the broken pictures, I fixed those.
>> >
>> > Regarding encrypting before or after batching the messages, you are
>> > correct, I had not thought of compression and how this changes things.
>> > Encrypted data does not really encrypt well. My reasoning at the time
>> > of writing was that if we en

Re: Cleaning up command line tools argument parsing a little

2019-05-10 Thread Sönke Liebau
Hi Viktor,

I'll admit I've only had an extremely brief look at your code and jira, but
it sounds absolutely awesome! Plus it will give us a chance at a fresh
start without worrying about breaking existing code using the tools. Of
course... at the risk of code duplication.

I'd be absolutely in favor of creating something like you have proposed!

Best regards,
Sönke

On Wed, May 8, 2019 at 11:38 AM Viktor Somogyi-Vass 
wrote:

> Just to add: I wasn't too pushy by raising a KIP for this as so far I had
> the experience that the community is fine with them or at least people
> learned to live with the current tools but if there's a need I'd be happy
> to jump back working on it or helping you if you have time :)
>
> On Wed, May 8, 2019 at 11:35 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com>
> wrote:
>
> > Hi Sönke,
> >
> > In KAFKA-1774 I have created a Kafka Shell java tool that unfortunately
> > didn't get much attention so far from the creators of the jira. It works
> > similarly to what Colin mentioned, like "kafka.sh topics create -n
> my-topic
> > -p 3 -r 3" or has long names like "kafka.sh topics create --name my-topic
> > --partitions 3 --replicas 3". The bootstrap server everywhere defaults to
> > :9092 or reads up the configs from a path so in many scenarios you can
> omit
> > it, also it uses the admin client of course and all in all provides a
> much
> > better experience than the current tools. It has interactive mode and
> help
> > too. Wanted to implement "code completion" too but for that I'd have to
> > exercise the code a little bit more :).
> > It is currently based on a quite old trunk but if you think it's
> > interesting I can rebase it and we can continue with raising a KIP.
> > https://github.com/viktorsomogyi/kafka/tree/kafka_shell
> >
> > Viktor
> >
> > On Tue, May 7, 2019 at 11:10 AM Sönke Liebau
> >  wrote:
> >
> >> Hi Colin,
> >>
> >> thanks!
> >> While I personally don't like the current command line tools I
> >> realistically think we'll be stuck with them for a while yet, so a
> cleanup
> >> might make sense.
> >> So I'll start looking into that.
> >>
> >> Regarding a central entrypoint, that would indeed be brilliant and I'd
> >> love
> >> to work on that, but I currently have enough other open things to look
> at,
> >> so I won't draw that one to myself as well for now.
> >>
> >> But I'll keep it in mind for when some time frees up :)
> >>
> >> Best regards,
> >> Sönke
> >>
> >>
> >>
> >> Colin McCabe  schrieb am Di., 7. Mai 2019, 00:56:
> >>
> >> > On Mon, May 6, 2019, at 10:21, Sönke Liebau wrote:
> >> > > Hi Colin,
> >> > >
> >> > > it was my intention to keep the structure of the commands mostly
> >> intact
> >> > > while doing the refactoring - if that is possible, have not really
> >> > checked
> >> > > yet to be honest.
> >> > >
> >> > > But what I wanted to try and do is recreate the current parsing with
> >> > > argparse as much as possible. And in the process simply adding
> >> synonyms,
> >> > > for example make the kafka-console-producer understand a
> >> > > bootstrap-parameter in addition to broker-list.
> >> > > There is a bit of custom logic about which parameters go together
> >> etc. in
> >> > > the current classes, so output may look different here and there,
> but
> >> in
> >> > > principle I do believe that it should be possible to recreate the
> >> current
> >> > > structure.
> >> >
> >> > Sounds like a good idea.  Thanks for the clarification.
> >> >
> >> > >
> >> > > If there is an appetite for a new, hadoop-like entrypoint anyway,
> then
> >> > all
> >> > > of this might be "wasted" effort, or rather effort better spent
> >> though,
> >> > you
> >> > > are right.
> >> >
> >> > I don't think anyone is working on a new entry point right now -- or
> if
> >> > they are, they haven't said anything yet :)
> >> >
> >> > I just wanted to mention it as a possible approach in case you wanted
> to
> >> > do a bigger project.
> >> >
> >> > best,
> >> > Colin
> >> >
> >> > >
> >> > > Best 

Re: Cleaning up command line tools argument parsing a little

2019-05-07 Thread Sönke Liebau
Hi Colin,

thanks!
While I personally don't like the current command line tools I
realistically think we'll be stuck with them for a while yet, so a cleanup
might make sense.
So I'll start looking into that.

Regarding a central entrypoint, that would indeed be brilliant and I'd love
to work on that, but I currently have enough other open things to look at,
so I won't take that one on as well for now.

But I'll keep it in mind for when some time frees up :)

Best regards,
Sönke



Colin McCabe  schrieb am Di., 7. Mai 2019, 00:56:

> On Mon, May 6, 2019, at 10:21, Sönke Liebau wrote:
> > Hi Colin,
> >
> > it was my intention to keep the structure of the commands mostly intact
> > while doing the refactoring - if that is possible, have not really
> checked
> > yet to be honest.
> >
> > But what I wanted to try and do is recreate the current parsing with
> > argparse as much as possible. And in the process simply adding synonyms,
> > for example make the kafka-console-producer understand a
> > bootstrap-parameter in addition to broker-list.
> > There is a bit of custom logic about which parameters go together etc. in
> > the current classes, so output may look different here and there, but in
> > principle I do believe that it should be possible to recreate the current
> > structure.
>
> Sounds like a good idea.  Thanks for the clarification.
>
> >
> > If there is an appetite for a new, hadoop-like entrypoint anyway, then
> all
> > of this might be "wasted" effort, or rather effort better spent though,
> you
> > are right.
>
> I don't think anyone is working on a new entry point right now -- or if
> they are, they haven't said anything yet :)
>
> I just wanted to mention it as a possible approach in case you wanted to
> do a bigger project.
>
> best,
> Colin
>
> >
> > Best regards,
> > Sönke
> >
> >
> >
> > On Mon, May 6, 2019 at 7:13 PM Colin McCabe  wrote:
> >
> > > Hi Sönke,
> > >
> > > #2 is a bit tough because people have come to rely on the way the
> commands
> > > are structured right now.
> > >
> > > If we want to make big changes, it might be easier just to create a
> > > separate tool and deprecate the old one(s).  One thing we've talked
> about
> > > doing in the past is creating a single entry point for all the tool
> > > functionality, kind of like hadoop did with the "hadoop" command  Or
> git
> > > with the "git" command, etc.  Then we could deprecate the standalone
> > > commands and remove them after enough time had passed-- kind of like
> the
> > > old consumer.
> > >
> > > On the other hand, a more incremental change would be standardizing
> flags
> > > a bit.  So for example, at least setting it up so that there is a
> standard
> > > way of supplying bootstrap brokers, etc.  We could keep the old flags
> > > around for a while as variants to ease the transition.
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Sun, May 5, 2019, at 00:54, Sönke Liebau wrote:
> > > > Hi Colin,
> > > >
> > > > I totally agree! Especially the differently named bootstrap server
> > > options
> > > > have been annoying me a long time.
> > > >
> > > > I'd propose a two-step approach:
> > > > 1. Add new default options objects similar to CommandLineUtils and
> > > > CommandDefaultOptions (based on argparse4j) but in the clients
> project,
> > > as
> > > > this is referenced by all command line tools as far as I can tell
> > > > 2. Refactor tools one by one to use these new helper classes (and
> thus
> > > > argparse) and add standardized synonyms for parameters as necessary
> > > >
> > > > I think for step 1 we can get away with no KIP, as this doesn't
> change
> > > any
> > > > public interfaces or behavior.
> > > > Step 2 probably needs a KIP as we are adding new parameters? We can
> pick
> > > up
> > > > KIP-14 again for that I think. A lot of work has been done on that
> > > already.
> > > >
> > > > Does that sound useful to everybody?
> > > >
> > > > Best regards,
> > > > Sönke
> > > >
> > > >
> > > > On Thu, Apr 18, 2019 at 1:44 AM Colin McCabe 
> wrote:
> > > >
> > > > > If we are going to standardize on one argument parsing library, it
> > > should
> > &g

Re: Cleaning up command line tools argument parsing a little

2019-05-06 Thread Sönke Liebau
Hi Colin,

it was my intention to keep the structure of the commands mostly intact
while doing the refactoring - if that is possible, have not really checked
yet to be honest.

But what I wanted to try to do is recreate the current parsing with
argparse as much as possible, and in the process simply add synonyms,
for example making the kafka-console-producer understand a
bootstrap-server parameter in addition to broker-list.
There is a bit of custom logic about which parameters go together etc. in
the current classes, so output may look different here and there, but in
principle I do believe that it should be possible to recreate the current
structure.
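
To make the synonym idea a bit more concrete, below is a rough sketch of what
this could look like with argparse4j (class and option names are made up for
illustration, this is not the actual console producer code). The main point
is that one destination can accept several option strings, so --broker-list
and --bootstrap-server simply become synonyms:

import net.sourceforge.argparse4j.ArgumentParsers;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.ArgumentParserException;
import net.sourceforge.argparse4j.inf.Namespace;

public class ConsoleProducerOptionsSketch {
    public static void main(String[] args) {
        ArgumentParser parser = ArgumentParsers.newArgumentParser("kafka-console-producer")
                .defaultHelp(true)
                .description("Reads data from standard input and publishes it to Kafka.");
        // Two option strings, one destination: old and new spelling are synonyms.
        parser.addArgument("--broker-list", "--bootstrap-server")
                .dest("bootstrapServer")
                .required(true)
                .help("Comma-separated list of host:port pairs to connect to.");
        parser.addArgument("--topic")
                .required(true)
                .help("The topic to produce to.");
        try {
            Namespace options = parser.parseArgs(args);
            System.out.println("Connecting to " + options.getString("bootstrapServer"));
        } catch (ArgumentParserException e) {
            parser.handleError(e);
            System.exit(1);
        }
    }
}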

If there is an appetite for a new, hadoop-like entrypoint anyway, then all
of this might be "wasted" effort, or rather effort better spent though, you
are right.

Best regards,
Sönke



On Mon, May 6, 2019 at 7:13 PM Colin McCabe  wrote:

> Hi Sönke,
>
> #2 is a bit tough because people have come to rely on the way the commands
> are structured right now.
>
> If we want to make big changes, it might be easier just to create a
> separate tool and deprecate the old one(s).  One thing we've talked about
> doing in the past is creating a single entry point for all the tool
> functionality, kind of like hadoop did with the "hadoop" command  Or git
> with the "git" command, etc.  Then we could deprecate the standalone
> commands and remove them after enough time had passed-- kind of like the
> old consumer.
>
> On the other hand, a more incremental change would be standardizing flags
> a bit.  So for example, at least setting it up so that there is a standard
> way of supplying bootstrap brokers, etc.  We could keep the old flags
> around for a while as variants to ease the transition.
>
> best,
> Colin
>
>
> On Sun, May 5, 2019, at 00:54, Sönke Liebau wrote:
> > Hi Colin,
> >
> > I totally agree! Especially the differently named bootstrap server
> options
> > have been annoying me a long time.
> >
> > I'd propose a two-step approach:
> > 1. Add new default options objects similar to CommandLineUtils and
> > CommandDefaultOptions (based on argparse4j) but in the clients project,
> as
> > this is referenced by all command line tools as far as I can tell
> > 2. Refactor tools one by one to use these new helper classes (and thus
> > argparse) and add standardized synonyms for parameters as necessary
> >
> > I think for step 1 we can get away with no KIP, as this doesn't change
> any
> > public interfaces or behavior.
> > Step 2 probably needs a KIP as we are adding new parameters? We can pick
> up
> > KIP-14 again for that I think. A lot of work has been done on that
> already.
> >
> > Does that sound useful to everybody?
> >
> > Best regards,
> > Sönke
> >
> >
> > On Thu, Apr 18, 2019 at 1:44 AM Colin McCabe  wrote:
> >
> > > If we are going to standardize on one argument parsing library, it
> should
> > > certainly be argparse4j, I think.
> > >  argparse4j is simply a better argument parsing library with support
> for
> > > more features.  One example is mutually exclusive options.  argparse4j
> > > supports this with MutuallyExclusiveGroup.  jopt doesn't support this,
> so
> > > when it is needed, we have to add extra code to manually check that
> > > mutually exclusive options are not set.
> > >
> > > argparse4j also has subcommands.  If you want something like "git add"
> > > with some set of flags, and "git remove" with another, you can do this
> with
> > > argparse4j, but not with jopt.  This would be very helpful for
> clearing up
> > > confusion in a lot of our shell scripts which have accumulated dozens
> of
> > > arguments, most of which are only relevant to a very specific
> operation.
> > > But you just can't do it with jopt.
> > >
> > > Just to give an example, argparse4j with subcommands would allow you to
> > > run something like ./kafka-topics.sh list --help and get just options
> that
> > > were relevant for listing topics, not the full dozens of options that
> might
> > > relate to adding topics, removing them, etc.
> > >
> > > To be honest, though, what would help users the most is standardizing
> the
> > > option flags across tools.  We should have a standard way of specifying
> > > bootstrap brokers, for example.  (We can continue to support the old
> > > synonyms for a while, of course.)
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Wed, Apr 17, 2019, at 08:56

Re: Cleaning up command line tools argument parsing a little

2019-05-05 Thread Sönke Liebau
Hi Colin,

I totally agree! Especially the differently named bootstrap server options
have been annoying me a long time.

I'd propose a two-step approach:
1. Add new default options objects similar to CommandLineUtils and
CommandDefaultOptions (based on argparse4j) but in the clients project, as
this is referenced by all command line tools as far as I can tell
2. Refactor tools one by one to use these new helper classes (and thus
argparse) and add standardized synonyms for parameters as necessary

I think for step 1 we can get away with no KIP, as this doesn't change any
public interfaces or behavior.
Step 2 probably needs a KIP as we are adding new parameters? We can pick up
KIP-14 again for that I think. A lot of work has been done on that already.
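
Just to illustrate what I have in mind for step 1, here is a very rough
sketch of such a helper. All names are made up (this is not the existing
CommandLineUtils/CommandDefaultOptions from core); it only shows how an
argparse4j based base class in the clients project could give every tool a
uniform --bootstrap-server, --help and --version:

import net.sourceforge.argparse4j.ArgumentParsers;
import net.sourceforge.argparse4j.impl.Arguments;
import net.sourceforge.argparse4j.inf.ArgumentParser;
import net.sourceforge.argparse4j.inf.ArgumentParserException;
import net.sourceforge.argparse4j.inf.Namespace;

public abstract class DefaultToolOptions {
    protected final ArgumentParser parser;
    protected Namespace options;

    protected DefaultToolOptions(String toolName, String description) {
        parser = ArgumentParsers.newArgumentParser(toolName)
                .defaultHelp(true)                 // uniform --help for every tool
                .description(description);
        parser.addArgument("--bootstrap-server")   // one standard connection option
                .dest("bootstrapServer")
                .help("Comma-separated list of host:port pairs to connect to.");
        parser.addArgument("--version")
                .action(Arguments.storeTrue())
                .help("Print version information and exit.");
    }

    // Subclasses add their tool-specific arguments to 'parser', then call parse().
    public void parse(String[] args) {
        try {
            options = parser.parseArgs(args);
        } catch (ArgumentParserException e) {
            parser.handleError(e);
            System.exit(1);
        }
    }

    public String bootstrapServer() {
        return options.getString("bootstrapServer");
    }

    public boolean versionRequested() {
        return Boolean.TRUE.equals(options.getBoolean("version"));
    }
}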

Does that sound useful to everybody?

Best regards,
Sönke


On Thu, Apr 18, 2019 at 1:44 AM Colin McCabe  wrote:

> If we are going to standardize on one argument parsing library, it should
> certainly be argparse4j, I think.
>  argparse4j is simply a better argument parsing library with support for
> more features.  One example is mutually exclusive options.  argparse4j
> supports this with MutuallyExclusiveGroup.  jopt doesn't support this, so
> when it is needed, we have to add extra code to manually check that
> mutually exclusive options are not set.
>
> argparse4j also has subcommands.  If you want something like "git add"
> with some set of flags, and "git remove" with another, you can do this with
> argparse4j, but not with jopt.  This would be very helpful for clearing up
> confusion in a lot of our shell scripts which have accumulated dozens of
> arguments, most of which are only relevant to a very specific operation.
> But you just can't do it with jopt.
>
> Just to give an example, argparse4j with subcommands would allow you to
> run something like ./kafka-topics.sh list --help and get just options that
> were relevant for listing topics, not the full dozens of options that might
> relate to adding topics, removing them, etc.
>
> To be honest, though, what would help users the most is standardizing the
> option flags across tools.  We should have a standard way of specifying
> bootstrap brokers, for example.  (We can continue to support the old
> synonyms for a while, of course.)
>
> best,
> Colin
>
>
> On Wed, Apr 17, 2019, at 08:56, Guozhang Wang wrote:
> > I took another look at the PR itself and I think it would be great to
> have
> > this cleanup too -- I cannot remember at the beginning why we gradually
> > moved to different mechanism (argparse4j) for different cmds, if there's
> no
> > rationales behind it we should just make them consistent.
> >
> > Thanks for driving this!
> >
> > Guozhang
> >
> > On Wed, Apr 17, 2019 at 7:19 AM Ryanne Dolan 
> wrote:
> >
> > > Sönke, I'd find this very helpful. It's annoying to keep track of which
> > > commands use which form -- I always seem to guess wrong.
> > >
> > > Though I don't think there is any reason to deprecate existing forms,
> e.g.
> > > consumer.config vs consumer-config. I think it's perfectly reasonable
> to
> > > have multiple spellings of the same arguments. I don't really see a
> > > downside to keeping the aliases around indefinitely.
> > >
> > > Ryanne
> > >
> > >
> > >
> > >
> > > On Wed, Apr 17, 2019, 7:07 AM Sönke Liebau
> > >  wrote:
> > >
> > > > Hi everybody,
> > > >
> > > > Jason and I were recently discussing command line argument parsing on
> > > > KAFKA-8131 (or rather the related pull request) [1].
> > > >
> > > > Command line tools and their arguments are somewhat diverse at the
> > > moment.
> > > > Most of the tools use joptsimple for argument parsing, some newer
> java
> > > > tools use argparse4j instead and some tools use nothing at all.
> > > > I've looked for a reason as to why there are two libraries being
> used,
> > > but
> > > > couldn't really find anything. Paolo brought up the same question on
> the
> > > > mailing list a while back [7], but got no response either.
> > > > Does anybody know why this is the case?
> > > >
> > > > This results in no central place to add universal parameters like
> help
> > > and
> > > > version, as well as the help output looking different between some
> of the
> > > > tools.
> > > > Also, there are a number of parameters that should be renamed to
> adhere
> > > to
> > > > defaults.
> > > >
> > > > Ther

Re: Cleaning up command line tools argument parsing a little

2019-04-17 Thread Sönke Liebau
I actually have a theory about how that came about.

All classes that use argparse4j are situated in the tools and connect
projects, which don't have a dependency on core. But that's where all the
CommandLine stuff that uses joptsimple is located. So gaining access to
that (not joptsimple itself, but all the helper classes) would have meant
adding a dependency on core to those projects - that may have triggered the
search for something else that ended up with argparse4j.

Not sure if that was what happened, but so far it's the only reason I could
come up with.

On Wed, Apr 17, 2019 at 5:55 PM Guozhang Wang  wrote:

> I took another look at the PR itself and I think it would be great to have
> this cleanup too -- I cannot remember at the beginning why we gradually
> moved to different mechanism (argparse4j) for different cmds, if there's no
> rationales behind it we should just make them consistent.
>
> Thanks for driving this!
>
> Guozhang
>
> On Wed, Apr 17, 2019 at 7:19 AM Ryanne Dolan 
> wrote:
>
> > Sönke, I'd find this very helpful. It's annoying to keep track of which
> > commands use which form -- I always seem to guess wrong.
> >
> > Though I don't think there is any reason to deprecate existing forms,
> e.g.
> > consumer.config vs consumer-config. I think it's perfectly reasonable to
> > have multiple spellings of the same arguments. I don't really see a
> > downside to keeping the aliases around indefinitely.
> >
> > Ryanne
> >
> >
> >
> >
> > On Wed, Apr 17, 2019, 7:07 AM Sönke Liebau
> >  wrote:
> >
> > > Hi everybody,
> > >
> > > Jason and I were recently discussing command line argument parsing on
> > > KAFKA-8131 (or rather the related pull request) [1].
> > >
> > > Command line tools and their arguments are somewhat diverse at the
> > moment.
> > > Most of the tools use joptsimple for argument parsing, some newer java
> > > tools use argparse4j instead and some tools use nothing at all.
> > > I've looked for a reason as to why there are two libraries being used,
> > but
> > > couldn't really find anything. Paolo brought up the same question on
> the
> > > mailing list a while back [7], but got no response either.
> > > Does anybody know why this is the case?
> > >
> > > This results in no central place to add universal parameters like help
> > and
> > > version, as well as the help output looking different between some of
> the
> > > tools.
> > > Also, there are a number of parameters that should be renamed to adhere
> > to
> > > defaults.
> > >
> > > There have been a few discussions and initiatives around this in the
> > past.
> > > Just off the top of my head (and a 5 minute jira search) there are:
> > > - KIP-14 [2]
> > > - KAFKA-2111 [3]
> > > - KIP-316 [4]
> > > - KAFKA-1292 [5]
> > > - KAFKA-3530 [6]
> > > - and probably many more
> > >
> > > Would people generally be in favor of revisiting this topic?
> > >
> > > What I'd propose to do is:
> > > - comb through jira and KIPs, clean up old stuff and create a new
> umbrella
> > > issue to track this  (maybe reuse KIP-4 as well)
> > > - agree on one library for parsing command line arguments (don't care
> > which
> > > one, but two is one too many I think)
> > > - refactor tools to use one library and default way of argument parsing
> > > with central help and version parameter
> > > - add aliases for options that should be renamed according to KIP-4
> (and
> > > maybe others) so that both new and old work for a while, deprecate old
> > > parameters for a cycle or two and then remove them
> > >
> > > I'll shut up now and see if people would consider this useful or have
> any
> > > other input :)
> > >
> > > Best regards,
> > > Sönke
> > >
> > > [1] https://github.com/apache/kafka/pull/6481#discussion_r273773003
> > > <https://issues.apache.org/jira/browse/KAFKA-8131>
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-14+-+Tools+Standardization
> > > [3] https://issues.apache.org/jira/browse/KAFKA-2111
> > > [4]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
> > > [5] https://issues.apache.org/jira/browse/KAFKA-1292
> > > [6] https://issues.apache.org/jira/browse/KAFKA-3530
> > > [7]
> > >
> > >
> >
> https://sematext.com/opensee/m/Kafka/uyzND10ObP01p77VS?subj=From+Scala+to+Java+based+tools+joptsimple+vs+argparse4j
> > >
> >
>
>
> --
> -- Guozhang
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Cleaning up command line tools argument parsing a little

2019-04-17 Thread Sönke Liebau
Hi everybody,

Jason and I were recently discussing command line argument parsing on
KAFKA-8131 (or rather the related pull request) [1].

Command line tools and their arguments are somewhat diverse at the moment.
Most of the tools use joptsimple for argument parsing, some newer java
tools use argparse4j instead and some tools use nothing at all.
I've looked for a reason as to why there are two libraries being used, but
couldn't really find anything. Paolo brought up the same question on the
mailing list a while back [7], but got no response either.
Does anybody know why this is the case?

This results in no central place to add universal parameters like help and
version, as well as the help output looking different between some of the
tools.
Also, there are a number of parameters that should be renamed to adhere to
defaults.

There have been a few discussions and initiatives around this in the past.
Just off the top of my head (and a 5 minute jira search) there are:
- KIP-14 [2]
- KAFKA-2111 [3]
- KIP-316 [4]
- KAFKA-1292 [5]
- KAFKA-3530 [6]
- and probably many more

Would people generally be in favor of revisiting this topic?

What I'd propose to do is:
- comb through jira and KIPs, clean up old stuff and create a new umbrella
issue to track this (maybe reuse KIP-4 as well)
- agree on one library for parsing command line arguments (don't care which
one, but two is one too many I think)
- refactor tools to use one library and default way of argument parsing
with central help and version parameter
- add aliases for options that should be renamed according to KIP-4 (and
maybe others) so that both new and old work for a while, deprecate old
parameters for a cycle or two and then remove them

I'll shut up now and see if people would consider this useful or have any
other input :)

Best regards,
Sönke

[1] https://github.com/apache/kafka/pull/6481#discussion_r273773003

[2]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-14+-+Tools+Standardization
[3] https://issues.apache.org/jira/browse/KAFKA-2111
[4]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
[5] https://issues.apache.org/jira/browse/KAFKA-1292
[6] https://issues.apache.org/jira/browse/KAFKA-3530
[7]
https://sematext.com/opensee/m/Kafka/uyzND10ObP01p77VS?subj=From+Scala+to+Java+based+tools+joptsimple+vs+argparse4j


Re: Proposal to Auto Close Inactive Tickets

2019-04-12 Thread Sönke Liebau
Hi Addison,

in general, I am totally in favor of closing unnecessary tickets! However,
over the past few weeks, I have spent quite a bit of time looking at old
tickets and evaluating whether those should be closed or are still
relevant. For a surprising number of these tickets, I've found that there
might actually still be something useful to do (the majority was easily
closeable though).

I totally agree that a cleanup makes sense, but I am a bit hesitant about
doing it automatically, even if no one feels responsible for the ticket
anymore, there may still be merit to it.

So personally I'd prefer a concerted cleanup effort to an automated
solution - but that is just my opinion :)

We have just discussed the jira workflow in the Apache Training project as
well and agreed on a workflow that has an initial "triage" state instead of
moving tickets directly to "open" - which should serve as an initial check of
whether the ticket is "valid" or something better suited to the mailing list,
not an issue, ...
Something similar might be an option to help keep jira "clean" after an
initial cleanup effort.

Best regards,
Sönke

Am Do., 11. Apr. 2019, 20:55 hat Addison Huddy 
geschrieben:

> Hi Kafka Developers,
>
> The Apache Kafka JIRA currently has 2138 open JIRA tickets. As Charlie
> Munger  once said,
> “Simplicity has a way of improving performance through enabling us to
> better understand what we are doing.”
>
> What are everyone’s thoughts on adopting what the k8s community is doing
> and auto close any ticket that has not seen any updates for 90 days.
>
>
> https://github.com/kubernetes/community/blob/master/contributors/devel/automation.md
>
> Prow will close pull-requests that don't have human activity in the last 90 
> days. It will warn about this process 60 days before closing the 
> pull-request, and warn again 30 days later. One way to prevent this from 
> happening is to add the lifecycle/frozen label on the pull-request.
>
> If we were to adopt this practice, we could reduce our open ticket count
> to 553, a 74% decrease.
> project = KAFKA AND resolution = Unresolved AND updated >= "-90d" ORDER BY
> created DESC
>
> So how might this work?
>
>- a bot, let’s call it Bender, would ping the ticket reporter after 30
>days of inactivity
>- After 60 days, Bender would again ping the reporter warning them
>that the ticket will be closed due to inactivity
>- After 90 days of inactivity, bender would resolve the ticket with
>the status Auto Closed and a comment that the ticket was resolved due to
>inactivity.
>- Bender would ignore all tickets with the label bender-ignore
>
>
> [image: image.png]
>
> Let me know what you think?
>
> \ah
>
>


Re: Kafka Jenkins not showing recent builds in history

2019-04-11 Thread Sönke Liebau
Quick update: everything seems to be back to normal.

On Mon, Apr 8, 2019 at 9:57 AM Sönke Liebau 
wrote:

> I've opened an INFRA ticket about this [1]. Apparently, it is a known
> issue that is solved by a restart of Jenkins. One will be scheduled in the
> near future, I'll update the list once the restart has happened.
>
> Best regards,
> Sönke
>
> [1] https://issues.apache.org/jira/browse/INFRA-18161
>
> On Wed, Apr 3, 2019 at 10:41 PM Sönke Liebau 
> wrote:
>
>> Hi everybody,
>>
>> I looked through recent Jenkins builds for a while today and it somehow
>> looks off to me.
>>
>> Both jobs [1] [2] don't show any builds that are more recent than March
>> 19th in the "build history".
>> Only the "last failed" and "last unsuccessful" permalinks show recent
>> dates. Build pages can be accessed by changing the build id in the link
>> though.
>>
>> That seems weird to me, I would have expected builds to show up in the
>> history, no matter if they were successful or not.
>>
>> Can someone shed some light on this for me? I am probably missing
>> something obvious.
>>
>> Best regards,
>> Sönke
>>
>>
>>
>> [1] https://builds.apache.org/job/kafka-pr-jdk11-scala2.12
>> [2] https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/
>>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2019-04-08 Thread Sönke Liebau
Hi Colin,

quick summary up front: I totally agree, and always have! I think we
misunderstood each other a little :)
I was never really opposed to the idea of restricting which ACL features can
be used, I was just opposed to doing it specifically for this change, while
not introducing real versioning. But when I suggested a few things around
how to implement versioning back in January [1] and got no replies on those
suggestions I figured no one was really interested in that, so I fell back
to the minimal idea of not doing anything about it. I am absolutely happy
to pick this up.

I've done some more digging through the code and feel plenty stupid at the
moment, because there actually already is an ACL version [2]. Then again,
good news, we can just use this. Currently loading ACLs from Zookeeper
fails if the hard-coded version in the authorizer differs from what is
stored in Zookeeper [3].
Following your idea of using the inter-broker protocol, we probably need to
create a mapping of IBP -> ACL version somewhere, so we would have
something like ACL_2 -> KAFKA_2_3 which defines that this version requires
the IBP to be at least the stated version. When we make a change that
requires a new ACL version, we simply add a new mapping to the appropriate
API version and change the hard-coded version value in the authorizer.
One potential issue that I see with this is that I am unsure if the
Authorizer will actually have access to the configured IBP version when it
is running from the kafka-acls.sh tool. I'll look for a way, but I think
this might actually not be possible, in "legacy" mode, when writing ACLs
directly to Zookeeper.
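
Just to sketch what I mean by that mapping (all names are hypothetical, none
of this exists in the code base today): the authorizer would refuse to
persist an ACL version that the configured inter-broker protocol version does
not cover yet, roughly along these lines:

import java.util.HashMap;
import java.util.Map;

public final class AclVersionGate {
    // ACL version -> minimum inter-broker protocol version that understands it.
    private static final Map<Integer, String> MIN_IBP_FOR_ACL_VERSION = new HashMap<>();
    static {
        MIN_IBP_FOR_ACL_VERSION.put(1, "0.10.0"); // assumed baseline for the current format
        MIN_IBP_FOR_ACL_VERSION.put(2, "2.3");    // e.g. a format that adds IP ranges/subnets
    }

    public static void checkWritable(int aclVersion, String interBrokerProtocolVersion) {
        String required = MIN_IBP_FOR_ACL_VERSION.get(aclVersion);
        if (required == null || compare(interBrokerProtocolVersion, required) < 0) {
            throw new IllegalStateException("ACL version " + aclVersion
                    + " requires inter.broker.protocol.version >= " + required);
        }
    }

    // Simplified numeric comparison; real IBP strings can carry "-IVn" suffixes,
    // which are stripped here.
    private static int compare(String a, String b) {
        String[] x = a.split("-")[0].split("\\.");
        String[] y = b.split("-")[0].split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }
}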

The biggest takeaway that I have from this is that we will probably need
to change the zk path after all. If we do not do this, older brokers using
the ACLAuthorizer will simply fail to start if someone creates an ACL with
a newer version. And I am not sure we can actually prohibit them from doing
that, as mentioned above.

Maybe an alternative would be to keep the current logic as is, i.e. fail on
presence of an ACL version != current version and add a "migrate" command
to the acl command line tool - at the cost of not being able to downgrade
again, unless we make the migrate tool bi-directional.

I am unsure of the best way to be honest, will do some more looking through
the code and thinking. But thought I'd share this in case you have some
thoughts.

Best regards,
Sönke

[1]
https://lists.apache.org/thread.html/4936d5205852d0db19653d17aa9821d4ba30f077aee528bb90955ce4@%3Cdev.kafka.apache.org%3E
[2]
https://github.com/apache/kafka/blob/b4fa87cc51f0a7d640dc6ae2cc8f89f006aae652/core/src/main/scala/kafka/security/auth/Acl.scala#L37
[3]
https://github.com/apache/kafka/blob/b4fa87cc51f0a7d640dc6ae2cc8f89f006aae652/core/src/main/scala/kafka/security/auth/Acl.scala#L66


On Fri, Apr 5, 2019 at 8:09 AM Colin McCabe  wrote:

> On Thu, Apr 4, 2019, at 01:52, Sönke Liebau wrote:
> > Hi Colin,
> >
> > I agree, we need to have a way of failing incorrect ranges server-side,
> > I'll amend the KIP and look into that. I think INVALID_REQUEST should fit
> > fine, afaik we can send a message along with that code, so that could
> > explain the actual reason.
>
> Hi Sonke,
>
> Sounds good.
>
> >
> > Regarding prohibiting these ACLs from being created before the
> inter-broker
> > protocol is updated, I am a bit hesitant about this for two reasons.
> >
> > 1. I can't help but feel that we are mixing things in with each other
> here
> > that don't really belong together. The broker protocol and ACL versions
> and
> > potential ACL versioning are to my mind only loosely linked, because
> broker
> > and authorizer are usually updated at the same time. But if we look at
> > Sentry, Ranger or the Confluent LDAP authorizer I am not sure that this
> > will hold true forever.
>
> I strongly doubt that anyone actually wants to update the authorizer
> plugins on a separate schedule than the brokers. I worked on Hadoop for a
> while, so I have some perspective on this. Even in an upgrade scenario, you
> have to bounce the broker process at some point, and at the point, it's not
> clear that you gain anything except a headache by bringing it up again with
> the old authorizer code and the new broker code.
>
> > Also, this doesn't address the issue of potential future ACL changes that
> > might actually be incompatible with existing ACLs. If we go down this
> road
> > I think we should go all the way and come up with a full ACL
> > versioning-scheme. I've written down a few thoughts/suggestions/questions
> > on that in this thread.
>
> I think an ACL versioning scheme is a good idea, but mainly just so that
> we can provide clean error messages when someone with old software tries to
> access ACLs in th

Re: Kafka Jenkins not showing recent builds in history

2019-04-08 Thread Sönke Liebau
I've opened an INFRA ticket about this [1]. Apparently, it is a known issue
that is solved by a restart of Jenkins. One will be scheduled in the near
future, I'll update the list once the restart has happened.

Best regards,
Sönke

[1] https://issues.apache.org/jira/browse/INFRA-18161

On Wed, Apr 3, 2019 at 10:41 PM Sönke Liebau 
wrote:

> Hi everybody,
>
> I looked through recent Jenkins builds for a while today and it somehow
> looks off to me.
>
> Both jobs [1] [2] don't show any builds that are more recent than March
> 19th in the "build history".
> Only the "last failed" and "last unsuccessful" permalinks show recent
> dates. Build pages can be accessed by changing the build id in the link
> though.
>
> That seems weird to me, I would have expected builds to show up in the
> history, no matter if they were successful or not.
>
> Can someone shed some light on this for me? I am probably missing
> something obvious.
>
> Best regards,
> Sönke
>
>
>
> [1] https://builds.apache.org/job/kafka-pr-jdk11-scala2.12
> [2] https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2019-04-04 Thread Sönke Liebau
Hi Colin,

I agree, we need to have a way of failing incorrect ranges server-side,
I'll amend the KIP and look into that. I think INVALID_REQUEST should fit
fine, afaik we can send a message along with that code, so that could
explain the actual reason.
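
To give an idea of what I mean by failing incorrect ranges server-side, here
is a rough sketch of such a validation (plain Java, not actual broker code,
and only covering IPv4 addresses and IPv4 CIDR notation - the real check
would also need to handle IPv6 and whatever range syntax we end up with):

final class HostFieldValidator {

    // Accepts "*", a plain IPv4 address, or IPv4 CIDR notation like 192.168.1.0/24.
    static boolean isValid(String host) {
        if (host.equals("*")) {
            return true;                       // wildcard stays legal
        }
        String address = host;
        int slash = host.indexOf('/');
        if (slash >= 0) {                      // CIDR notation
            address = host.substring(0, slash);
            int prefix;
            try {
                prefix = Integer.parseInt(host.substring(slash + 1));
            } catch (NumberFormatException e) {
                return false;
            }
            if (prefix < 0 || prefix > 32) {
                return false;
            }
        }
        String[] octets = address.split("\\.", -1);
        if (octets.length != 4) {
            return false;
        }
        for (String octet : octets) {
            try {
                int value = Integer.parseInt(octet);
                if (value < 0 || value > 255) {
                    return false;
                }
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }
}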

Regarding prohibiting these ACLs from being created before the inter-broker
protocol is updated, I am a bit hesitant about this for two reasons.

1. I can't help but feel that we are mixing things in with each other here
that don't really belong together. The broker protocol and ACL versions and
potential ACL versioning are to my mind only loosely linked, because broker
and authorizer are usually updated at the same time. But if we look at
Sentry, Ranger or the Confluent LDAP authorizer I am not sure that this
will hold true forever.
Also, this doesn't address the issue of potential future ACL changes that
might actually be incompatible with existing ACLs.  If we go down this road
I think we should go all the way and come up with a full ACL
versioning-scheme. I've written down a few thoughts/suggestions/questions
on that in this thread.

2. I agree that not preventing this creates a security risk, though only in a
very limited scenario.
The same scenario applied when we introduced prefixed-ACLs in version 2 and
back then it was not addressed as far as I can tell. The only reference I
can find is one sentence in the KIP that states the risk.
For people using the Confluent LDAP authorizer (or any other authorizer
that is based on the SimpleACLAuthorizer) there is not much we can do, as
once the cluster is updated these ACLs could be created, but the
Authorizers would only honor them after they themselves are upgraded (which
will probably be a rolling restart as well, hence have the same issue).

I am really not trying to duck the work here, but just addressing this
specific change feels like it misses the larger issue to me.

Best regards,
Sönke

On Wed, Apr 3, 2019 at 4:36 PM Colin McCabe  wrote:

> Hi Sönke,
>
> Maybe a reasonable design here would be to not allow creating ACLs based
> on ip ranges and subnets unless the inter-broker protocol setting has been
> upgraded.  If an upgrade is done correctly, the IBP should not be upgraded
> until all the brokers have been upgraded, so there shouldn't be older
> brokers in the cluster erroneously giving access to things they shouldn't.
> In that case, perhaps we can hold off on introducing an ACL versioning
> scheme for now.
>
> Another thing that is important here is having some way of rejecting
> malformed ip address ranges in the CreateAcls call.  This is probably not
> too difficult, but it should be spelled out.  We could use INVALID_REQUEST
> as the error code for this situation, or maybe create a new one to be more
> specific.
>
> best,
> Colin
>
>
> On Wed, Apr 3, 2019, at 04:58, Sönke Liebau wrote:
> > All,
> >
> > as this thread has now been dormant for about three months again I am
> > willing to consider the attempt at looking at a larger versioning scheme
> > for ACLs as failed.
> >
> > I am away for a long weekend tomorrow and will start a [VOTE] thread on
> > implementing this as is on Monday, as I personally consider the security
> > implications of these ACLs in a mixed version cluster quite minimal and
> > addressable via the release notes.
> >
> > Best,
> > Sönke
> >
> > On Sat, Mar 16, 2019 at 1:32 PM Sönke Liebau  >
> > wrote:
> >
> > > Just a quick bump, as this has been quiet for a while again.
> > >
> > > On Tue, Jan 8, 2019 at 12:44 PM Sönke Liebau <
> soenke.lie...@opencore.com>
> > > wrote:
> > >
> > >> Hi Colin,
> > >>
> > >> thanks for your response!
> > >>
> > >> in theory we could get away without any additional path changes I
> > >> think.. I am still somewhat unsure about the best way of addressing
> > >> this. I'll outline my current idea and concerns that I still have,
> > >> maybe you have some thoughts on it.
> > >>
> > >> ACLs are currently stored in two places in ZK: /kafka-acl and
> > >> /kafka-acl-extended based on whether they make use of prefixes or not.
> > >> The reasoning[1] for this is not fundamentally changed by anything we
> > >> are discussing here, so I think that split will need to remain.
> > >>
> > >> ACLs are then stored in the form of a json array:
> > >> [zk: 127.0.0.1:2181(CONNECTED) 9] get /kafka-acl/Topic/*
> > >>
> > >>
> {"version":1,"acls":[{"principal":"User:sliebau","permissionType":"Allow","operation":"Read","

Kafka Jenkins not showing recent builds in history

2019-04-03 Thread Sönke Liebau
Hi everybody,

I looked through recent Jenkins builds for a while today and it somehow
looks off to me.

Both jobs [1] [2] don't show any builds that are more recent than March
19th in the "build history".
Only the "last failed" and "last unsuccessful" permalinks show recent
dates. Build pages can be accessed by changing the build id in the link
though.

That seems weird to me, I would have expected builds to show up in the
history, no matter whether they were successful or not.

Can someone shed some light on this for me? I am probably missing something
obvious.

Best regards,
Sönke



[1] https://builds.apache.org/job/kafka-pr-jdk11-scala2.12
[2] https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/


Re: [VOTE] KIP-443: Return to default segment.ms and segment.index.bytes in Streams repartition topics

2019-04-03 Thread Sönke Liebau
Hi Guozhang,

I've left a comment on the merged pull request, but I am not sure you'll get
notified about that since the PR was already merged, so I'll write here as
well.
Setting this to -1 needs to be reflected in the test
shouldAddInternalTopicConfigForRepartitionTopics
as well, as this currently checks for Long.MAX_VALUE [1] and consistently
fails.

Best regards,
Sönke

[1]
https://github.com/apache/kafka/blob/213466b3d4fd21b332c0b6882fea36cf1affef1c/streams/src/test/java/org/apache/kafka/streams/processor/internals/InternalTopologyBuilderTest.java#L647


On Wed, Apr 3, 2019 at 2:07 AM Guozhang Wang  wrote:

> Hello folks,
>
> I'm closing this voting thread now, thanks to all who have provided your
> feedbacks!
>
> Here's a quick tally:
>
> Binding +1: 4 (Damian, Bill, Manikumar, Guozhang)
> Non-binding +1: (John, Mickael).
>
>
> Guozhang
>
> On Fri, Mar 29, 2019 at 11:32 AM Guozhang Wang  wrote:
>
> > Ah I see, my bad :) Yes that was the documented value in `TopicConfig`,
> > and I agree we should just change that as well.
> >
> > Will update the KIP.
> >
> >
> >
> > On Fri, Mar 29, 2019 at 11:27 AM Mickael Maison <
> mickael.mai...@gmail.com>
> > wrote:
> >
> >> Hi Guozhang,
> >>
> >> I know the KIP is about segments configuration but I'm talking about
> >> retention.ms which is also explicitly set on repartition topics
> >>
> >>
> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RepartitionTopicConfig.java#L39
> >> Streams is setting it to Long.MAX_VALUE, but -1 is the "documented"
> >> way to disable the time limit. That's why I said "for consistency" as
> >> in practice it's not going to change anything.
> >>
> >> On Fri, Mar 29, 2019 at 5:09 PM Guozhang Wang 
> wrote:
> >> >
> >> > Hello Mickael,
> >> >
> >> > segment.ms default value in TopicConfig is 7 days, I think this is a
> >> > sufficient default value. Do you have any motivations to set it to -1?
> >> >
> >> >
> >> > Guozhang
> >> >
> >> > On Fri, Mar 29, 2019 at 9:42 AM Mickael Maison <
> >> mickael.mai...@gmail.com>
> >> > wrote:
> >> >
> >> > > +1 (non binding)
> >> > > For consistency, should we also set retention.ms to -1 instead of
> >> > > Long.MAX_VALUE?
> >> > >
> >> > > On Fri, Mar 29, 2019 at 3:59 PM Manikumar <
> manikumar.re...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > +1 (binding)
> >> > > >
> >> > > > Thanks for the KIP.
> >> > > >
> >> > > > On Fri, Mar 29, 2019 at 9:04 PM Damian Guy 
> >> wrote:
> >> > > >
> >> > > > > +1
> >> > > > >
> >> > > > > On Fri, 29 Mar 2019 at 01:59, John Roesler 
> >> wrote:
> >> > > > >
> >> > > > > > +1 (nonbinding) from me.
> >> > > > > >
> >> > > > > > On Thu, Mar 28, 2019 at 7:08 PM Guozhang Wang <
> >> wangg...@gmail.com>
> >> > > > > wrote:
> >> > > > > >
> >> > > > > > > Hello folks,
> >> > > > > > >
> >> > > > > > > I'd like to directly start a voting thread on this simple
> KIP
> >> to
> >> > > change
> >> > > > > > the
> >> > > > > > > default override values for repartition topics:
> >> > > > > > >
> >> > > > > > >
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-443%3A+Return+to+default+segment.ms+and+segment.index.bytes+in+Streams+repartition+topics
> >> > > > > > >
> >> > > > > > > The related PR can be found here as well:
> >> > > > > > > https://github.com/apache/kafka/pull/6511
> >> > > > > > >
> >> > > > > > > If you have any thoughts or feedbacks, they are more than
> >> welcomed
> >> > > as
> >> > > > > > well.
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > -- Guozhang
> >> > > > > > >
> >> > > > > >
> >> > > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > -- Guozhang
> >>
> >
> >
> > --
> > -- Guozhang
> >
>
>
> --
> -- Guozhang
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2019-04-03 Thread Sönke Liebau
All,

as this thread has now been dormant for about three months again, I am
willing to consider the attempt at looking at a larger versioning scheme
for ACLs as failed.

I am away for a long weekend tomorrow and will start a [VOTE] thread on
implementing this as is on Monday, as I personally consider the security
implications of these ACLs in a mixed version cluster quite minimal and
addressable via the release notes.

Best,
Sönke

On Sat, Mar 16, 2019 at 1:32 PM Sönke Liebau 
wrote:

> Just a quick bump, as this has been quiet for a while again.
>
> On Tue, Jan 8, 2019 at 12:44 PM Sönke Liebau 
> wrote:
>
>> Hi Colin,
>>
>> thanks for your response!
>>
>> in theory we could get away without any additional path changes I
>> think.. I am still somewhat unsure about the best way of addressing
>> this. I'll outline my current idea and concerns that I still have,
>> maybe you have some thoughts on it.
>>
>> ACLs are currently stored in two places in ZK: /kafka-acl and
>> /kafka-acl-extended based on whether they make use of prefixes or not.
>> The reasoning[1] for this is not fundamentally changed by anything we
>> are discussing here, so I think that split will need to remain.
>>
>> ACLs are then stored in the form of a json array:
>> [zk: 127.0.0.1:2181(CONNECTED) 9] get /kafka-acl/Topic/*
>>
>> {"version":1,"acls":[{"principal":"User:sliebau","permissionType":"Allow","operation":"Read","host":"*"},{"principal":"User:sliebau","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Read","host":"*"}]}
>>
>> What we could do is add a version property to the individual ACL
>> elements like so:
>> [
>>   {
>> "principal": "User:sliebau",
>> "permissionType": "Allow",
>> "operation": "Read",
>> "host": "*",
>> "acl_version": "1"
>>   }
>> ]
>>
>> We define the current state of ACLs as version 0 and the Authorizer
>> will default a missing "acl_version" element to this value for
>> backwards compatibility. So there should hopefully be no need to
>> migrate existing ACLs (concerns notwithstanding, see later).
>>
>> Additionally the authorizer will get a max_supported_acl_version
>> setting which will cause it to ignore any ACLs larger than what is set
>> here, hence allowing for controlled upgrading similar to the process
>> using inter broker protocol version. If this happens we should
>> probably log a warning in case this was unintentional. Maybe even have
>> a setting that controls whether startup is even possible when not all
>> ACLs are in effect.
>>
>> As I mentioned I have a few concerns, question marks still outstanding on
>> this:
>> - This approach would necessitate being backwards compatible with all
>> earlier versions of ACLs unless we also add a min_acl_version setting
>> - which would put the topic of ACL migrations back on the agenda.
>> - Do we need to touch the wire protocol for the admin client for this?
>> In theory I think not, as the authorizer would write ACLs in the most
>> current (unless forced down by max_acl_version) version it knows, but
>> this takes any control over this away from the user.
>> - This adds json parsing logic to the Authorizer, as it would have to
>> check the version first, look up the proper ACL schema for that
>> version and then re-parse the ACL string with that schema - should not
>> be a real issue if the initial parsing is robust, but strictly
>> speaking we are parsing something that we don't know the schema for
>> which might create issues with updates down the line.
>>
>> Beyond the practical concerns outlined above there are also some
>> broader things maybe worth thinking about. The long term goal is to
>> move away from Zookeeper and other data like consumer group offsets
>> has already been moved into Kafka topics - is that something that we'd
>> want to consider for ACLs as well? With the current storage model we'd
>> need more than one topic for this to cleanly separate resources and
>> prefixed ACLs - if we consider 

Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-25 Thread Sönke Liebau
Hi Colin,

that is definitely a good option and will cover 90% of all use cases
(probably more).

However strictly speaking it only addresses one half of the issue unless I
am mistaken. The internal behavior of the KafkaConsumer (which partition
the fetcher gets data from next and which buffered data is returned on the
next poll) is not affected by this. So records will only "jump the queue"
once they leave the KafkaConsumer; until then they will have to queue fairly
just like the rest of the messages.
Again, this will be sufficient in most cases, but if you want high priority
messages to actually jump to the front of the queue you would probably want
to combine both approaches and have a consumer for high prio topics and one
for the rest, both feeding into the same prioritized queue.
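
A rough sketch of that combined approach is below. Topic names, configs and
the priority rule are made up, and it leaves out error handling as well as
the pausing of partitions that would be needed if the queue grows too large:

import java.time.Duration;
import java.util.Collections;
import java.util.Comparator;
import java.util.Properties;
import java.util.concurrent.PriorityBlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PrioritizedConsumption {

    public static void main(String[] args) {
        // Records from the "control" topic sort ahead of everything else.
        Comparator<ConsumerRecord<String, String>> controlFirst =
                Comparator.comparingInt(r -> r.topic().equals("control") ? 0 : 1);
        PriorityBlockingQueue<ConsumerRecord<String, String>> queue =
                new PriorityBlockingQueue<>(1024, controlFirst);

        // Worker thread: control records always come out of the queue first.
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    ConsumerRecord<String, String> record = queue.take();
                    System.out.printf("processing %s-%d@%d%n",
                            record.topic(), record.partition(), record.offset());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.start();

        // Poll both consumers from one thread (a KafkaConsumer is not thread-safe).
        KafkaConsumer<String, String> controlConsumer = newConsumer("prio-demo-control", "control");
        KafkaConsumer<String, String> dataConsumer = newConsumer("prio-demo-data", "data");
        while (true) {
            for (ConsumerRecord<String, String> r : controlConsumer.poll(Duration.ofMillis(50))) {
                queue.put(r);
            }
            for (ConsumerRecord<String, String> r : dataConsumer.poll(Duration.ofMillis(50))) {
                queue.put(r);
            }
        }
    }

    private static KafkaConsumer<String, String> newConsumer(String groupId, String topic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList(topic));
        return consumer;
    }
}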

Best regards,
Sönke

On Mon, Mar 25, 2019 at 5:43 AM Colin McCabe  wrote:

> On Sat, Mar 23, 2019, at 18:41, nathank...@gmail.com wrote:
> >
> >
> > On 2019/01/28 02:26:31, n...@afshartous.com wrote:
> > > Hi Sönke,
> > >
> > > Thanks for taking the time to review.  I’ve put KIP-349 into
> hibernation.
> > >
> > > Thanks also to everyone who participated in the discussion.
> > >
> > > Best regards,
> > > --
> > >   Nick
> > >
> > > > On Jan 25, 2019, at 5:51 AM, Sönke Liebau <
> soenke.lie...@opencore.com.INVALID> wrote:
> > > >
> > > > a bit late to the party, sorry. I recently spent some time looking
> > > > into this / a similar issue [1].
> > > > After some investigation and playing around with settings I think
> that
> > > > the benefit that could be gained from this is somewhat limited and
> > > > probably outweighed by the implementation effort.
> > > >
> > > > The consumer internal are already geared towards treating partitions
> > > > fairly so that no partition has to wait an undue amount of time and
> > > > this can be further tuned for latency over throughput. Additionally,
> > > > if this is a large issue for someone, there is always the option of
> > > > having a dedicated consumer reading only from the control topic,
> which
> > > > would mean that messages from that topic are received "immediately".
> > > > For a Kafka Streams job it would probably make sense to create two
> > > > input streams and then merging those as a first step.
> > > >
> > > > I think with these knobs a fairly large amount of flexibility can be
> > > > achieved so that there is no urgent need to implement priorities.
> > > >
> > > > So my personal preference would be to set this KIP to dormant for
> now.
> > >
> > >
> > >
> > >
> > >
> > >
> > Hello Nick,
> >
> > I'm extremely new to Kafka, but I was attempting to set up a per-topic
> > priority application, and ended up finding this thread. I'm having
> > difficulty seeing how one can implement it with pause/resume. Would you
> > elaborate?
> >
> > Since those operations are per-partition, and when you stop a
> > partition, it attempts to re-balance, I would need to stop all
> > partitions. Even then, it would try to finish the current transactions
> > instead of immediately putting it on hold and processing other topics.
>
> Hi nathankski,
>
> Calling pause() on a partition doesn't trigger a re-balance or try to
> finish the current transactions.  It just means that you won't get more
> records for that partition until you call resume() on it.
>
> >
> > It also looks like in order to determine if I had received messages
> > from the pri-1 topic, I would need to loop through all records, and
> > ignore those that weren't pri-1 until a poll failed to retrieve any,
> > which seems like it would screw up the other topics.
>
> One way to do this would be to have two threads.  The first thread calls
> poll() on the Kafka consumer.  It puts the records it retrieves into a
> PriorityBlockingQueue.  Records from pri-1 have the priority within the
> queue.
>
> The second thread retrieves records from the queue.  pri-1 records will
> always be pulled out of the PriorityBlockingQueue ahead of any other
> records, so they will be processed first.
>
> If the priority queue gets too big, you pause partitions until thread 2
> can clear the backlog.  The low-priority partition is paused first.
>
> best,
> Colin
>
> >
> > Thank you,
> >
> > Nathan
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-03-24 Thread Sönke Liebau
Hi Nathan,

I have a couple of remarks/questions about your mail, if I may.

First of all, the javadoc for the pause operation of KafkaConsumer states:
"Suspend fetching from the requested partitions. Future calls to
poll(Duration)
<https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#poll-java.time.Duration->
 will not return any records from these partitions until they have been
resumed using resume(Collection)
<https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#resume-java.util.Collection->.
Note that this method does not affect partition subscription. In
particular, it does not cause a group rebalance when automatic assignment
is used." [1]
You mentioned that "those operations" cause a rebalance; can you perhaps
elaborate on that some more?
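
Just so we are talking about the same thing, this is roughly how I would
expect pause/resume to be used - a minimal sketch that assumes an
already-subscribed consumer, with a made-up topic name:

import java.time.Duration;
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

final class PausePriorityExample {
    // Poll once while the low-priority topic's partitions are paused, then
    // resume them. Per the javadoc quoted above, the subscription stays
    // untouched and no rebalance is triggered in between.
    static ConsumerRecords<String, String> pollPreferringOthers(
            KafkaConsumer<String, String> consumer, String lowPriorityTopic) {
        Set<TopicPartition> lowPrio = consumer.assignment().stream()
                .filter(tp -> tp.topic().equals(lowPriorityTopic))
                .collect(Collectors.toSet());
        consumer.pause(lowPrio);
        try {
            return consumer.poll(Duration.ofMillis(100));
        } finally {
            consumer.resume(lowPrio);
        }
    }
}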

Second, you state that "it would try to finish the current transactions",
which confuses me a little as well, since the consumer is not really aware
of transactions in a meaningful way. Or does "transaction" in this case
refer to your last call to poll()?

Have you looked into splitting your subscription across two consumers, one
for high priority topics, one for low(er) priority topics? Unless you are
looking for a dynamic, multi-tier priority system across many topics, that
might be your best bet. This works quite well for scenarios where you have
one topic that acts as a control plane (think start,stop processing type of
messages) and a second topic contains the actual data.

Best regards,
Sönke






[1]
https://kafka.apache.org/20/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html#pause-java.util.Collection-


On Sun, Mar 24, 2019 at 2:41 AM nathank...@gmail.com 
wrote:

>
>
> On 2019/01/28 02:26:31, n...@afshartous.com wrote:
> > Hi Sönke,
> >
> > Thanks for taking the time to review.  I’ve put KIP-349 into
> hibernation.
> >
> > Thanks also to everyone who participated in the discussion.
> >
> > Best regards,
> > --
> >   Nick
> >
> > > On Jan 25, 2019, at 5:51 AM, Sönke Liebau 
> > > 
> wrote:
> > >
> > > a bit late to the party, sorry. I recently spent some time looking
> > > into this / a similar issue [1].
> > > After some investigation and playing around with settings I think that
> > > the benefit that could be gained from this is somewhat limited and
> > > probably outweighed by the implementation effort.
> > >
> > > The consumer internal are already geared towards treating partitions
> > > fairly so that no partition has to wait an undue amount of time and
> > > this can be further tuned for latency over throughput. Additionally,
> > > if this is a large issue for someone, there is always the option of
> > > having a dedicated consumer reading only from the control topic, which
> > > would mean that messages from that topic are received "immediately".
> > > For a Kafka Streams job it would probably make sense to create two
> > > input streams and then merging those as a first step.
> > >
> > > I think with these knobs a fairly large amount of flexibility can be
> > > achieved so that there is no urgent need to implement priorities.
> > >
> > > So my personal preference would be to set this KIP to dormant for now.
> >
> >
> >
> >
> >
> >
> Hello Nick,
>
> I'm extremely new to Kafka, but I was attempting to set up a per-topic
> priority application, and ended up finding this thread. I'm having
> difficulty seeing how one can implement it with pause/resume. Would you
> elaborate?
>
> Since those operations are per-partition, and when you stop a partition,
> it attempts to re-balance, I would need to stop all partitions. Even then,
> it would try to finish the current transactions instead of immediately
> putting it on hold and processing other topics.
>
> It also looks like in order to determine if I had received messages from
> the pri-1 topic, I would need to loop through all records, and ignore those
> that weren't pri-1 until a poll failed to retrieve any, which seems like it
> would screw up the other topics.
>
> Thank you,
>
> Nathan
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2019-03-16 Thread Sönke Liebau
Just a quick bump, as this has been quiet for a while again.

On Tue, Jan 8, 2019 at 12:44 PM Sönke Liebau 
wrote:

> Hi Colin,
>
> thanks for your response!
>
> in theory we could get away without any additional path changes I
> think.. I am still somewhat unsure about the best way of addressing
> this. I'll outline my current idea and concerns that I still have,
> maybe you have some thoughts on it.
>
> ACLs are currently stored in two places in ZK: /kafka-acl and
> /kafka-acl-extended based on whether they make use of prefixes or not.
> The reasoning[1] for this is not fundamentally changed by anything we
> are discussing here, so I think that split will need to remain.
>
> ACLs are then stored in the form of a json array:
> [zk: 127.0.0.1:2181(CONNECTED) 9] get /kafka-acl/Topic/*
>
> {"version":1,"acls":[{"principal":"User:sliebau","permissionType":"Allow","operation":"Read","host":"*"},{"principal":"User:sliebau","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Read","host":"*"}]}
>
> What we could do is add a version property to the individual ACL
> elements like so:
> [
>   {
> "principal": "User:sliebau",
> "permissionType": "Allow",
> "operation": "Read",
> "host": "*",
> "acl_version": "1"
>   }
> ]
>
> We define the current state of ACLs as version 0 and the Authorizer
> will default a missing "acl_version" element to this value for
> backwards compatibility. So there should hopefully be no need to
> migrate existing ACLs (concerns notwithstanding, see later).
>
> Additionally the authorizer will get a max_supported_acl_version
> setting which will cause it to ignore any ACLs larger than what is set
> here, hence allowing for controlled upgrading similar to the process
> using inter broker protocol version. If this happens we should
> probably log a warning in case this was unintentional. Maybe even have
> a setting that controls whether startup is even possible when not all
> ACLs are in effect.
>
> As I mentioned I have a few concerns, question marks still outstanding on
> this:
> - This approach would necessitate being backwards compatible with all
> earlier versions of ACLs unless we also add a min_acl_version setting
> - which would put the topic of ACL migrations back on the agenda.
> - Do we need to touch the wire protocol for the admin client for this?
> In theory I think not, as the authorizer would write ACLs in the most
> current (unless forced down by max_acl_version) version it knows, but
> this takes any control over this away from the user.
> - This adds json parsing logic to the Authorizer, as it would have to
> check the version first, look up the proper ACL schema for that
> version and then re-parse the ACL string with that schema - should not
> be a real issue if the initial parsing is robust, but strictly
> speaking we are parsing something that we don't know the schema for
> which might create issues with updates down the line.
>
> Beyond the practical concerns outlined above there are also some
> broader things maybe worth thinking about. The long term goal is to
> move away from Zookeeper and other data like consumer group offsets
> has already been moved into Kafka topics - is that something that we'd
> want to consider for ACLs as well? With the current storage model we'd
> need more than one topic for this to cleanly separate resources and
> prefixed ACLs - if we consider pursuing this option it might be a
> chance for a "larger" change to the format which introduces versioning
> and allows storing everything in one compacted topic.
>
> Any thoughts on this?
>
> Best regards,
> Sönke
>
>
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-290%3A+Support+for+Prefixed+ACLs
>
>
> On Sat, Dec 22, 2018 at 5:51 AM Colin McCabe  wrote:
> >
> > Hi Sönke,
> >
> > One path forward would be to forbid the new ACL types from being created
> until the inter-broker protocol had been upgraded.  We'd also have to
> figure out how the new ACLs were stored in ZooKeeper.  There are a bunch of
> proposals in this thread that could work for that-- I really 

Re: Apache Kafka Memory Leakage???

2019-03-06 Thread Sönke Liebau
Hi Syed,

no worries, glad I could help!

If it helps in any way, I still think that this may just be a case of a
misunderstood metric. The documentation for the value from your screenshot
is a bit weird. If this is a paid subscription for SnapLogic, it might be
worthwhile reaching out to them for clarification.

Best regards and good luck,
Sönke

On Wed, Mar 6, 2019 at 10:29 AM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> Sonke,
>   Thanks so much again.
>   As per your suggestion, I created an executable jar file for kafka
> consumer that runs infinitely.
>   And I have used VisualVM tool to capture the memory stats.
>   Below are the three memory stats I have captured:
>   Stats1: https://i.imgur.com/JfnmNRX.png
>   Stats2: https://i.imgur.com/8F1xn8o.png
>   Stats3: https://i.imgur.com/ba98uhN.png
>
>   And I see that the heap memory goes to about 60 or 70MB, and then falls
> back, and the pattern continues.  Max heap consumed at an instant is about
> 70MB.
>
>   Maybe there is a problem in my platform on updating the stats.  I shall
> convey it to my team.
>
>   thanks so much for the help.
> Thanks,
>
>
>
> On Tue, Mar 5, 2019 at 8:02 PM Sönke Liebau
>  wrote:
>
>> My questions about input parameters are mostly targeted at finding out
>> whether your code enters the conditional block calling commit at the
>> bottom
>> of your code. Endless loop with 100ms poll duration is fine for everything
>> else.
>>
>> Regarding the screenshot that seem to be a snaplogic gui again. Didn't you
>> say you ran the code without snaplogic for your last test?
>> Can you please run that code in isolation either directly from your IDE or
>> compile it to a jar file and then do "java -jar ... " then run jconsole
>> and
>> attach to the process to capture heap size data.
>> Just click on the memory tab and let that run for an hour or so. I am
>> still
>> a bit mistrustful of that snaplogic memory allocated metric to be honest.
>>
>> Best,
>> Sönke
>>
>> On Tue, Mar 5, 2019 at 12:32 PM Syed Mudassir Ahmed <
>> syed.mudas...@gaianconsultants.com> wrote:
>>
>> > I already shared the code snippet earlier.  What do you mean by input
>> > params?  I am just running the consumer in an infinite loop polling for
>> new
>> > messages.Polling time is 100 millisecs.  If the topic is empty, the
>> > memory consumption is gradually increasing over the time and reaching to
>> > about 4GB in about 48 to 64 hours though no messages in the topic.
>> >
>> > Coming to the image, I have uploaded it to imgur.  Please find it here.
>> > https://i.imgur.com/dtAyded.png
>> >
>> > Let me know if anything else is needed.
>> >
>> > Just to make it fast, can we have a one-one call via zoom meeting.  I am
>> > just requesting as I can share all the details and my screen as well.
>> > After that, we can see what to do next.  If the call is not possible
>> then
>> > its ok but it would be great to have it.
>> > Thanks,
>> >
>> >
>> >
>> > On Tue, Mar 5, 2019 at 4:29 PM Sönke Liebau
>> >  wrote:
>> >
>> >> Hi Syed,
>> >>
>> >> the next step would be for someone else to be able to reproduce the
>> issue.
>> >>
>> >> If you could give me the values for the input parameters that your code
>> >> runs with (as per my last mail) then I am happy to run this again and
>> take
>> >> another look at memory consumption.
>> >> I'll then upload the exact code I used to github, maybe you can run
>> that
>> >> in
>> >> your environment and provide a jconsole screenshot of memory
>> consumption
>> >> over time, then we can compare those patterns.
>> >>
>> >> Also, could you please upload the image from your last mail to imgur
>> or a
>> >> similar service, it seems to have been lost in the mailing list.
>> >>
>> >> Best regards,
>> >> Sönke
>> >>
>> >> On Tue, Mar 5, 2019 at 11:48 AM Syed Mudassir Ahmed <
>> >> syed.mudas...@gaianconsultants.com> wrote:
>> >>
>> >> > Sonke,
>> >> >   I am not blaming apache-kafka for the tickets raised by our
>> customers.
>> >> > I am saying there could be an issue in kafka-clients library causing
>> >> > resource/memory leak.  If that issue is resolved, I can resolve my
>> >> tickets
>> >> > as well aut

Re: Apache Kafka Memory Leakage???

2019-03-05 Thread Sönke Liebau
My questions about input parameters are mostly targeted at finding out
whether your code enters the conditional block calling commit at the bottom
of your code. Endless loop with 100ms poll duration is fine for everything
else.

Regarding the screenshot: that seems to be a SnapLogic GUI again. Didn't you
say you ran the code without SnapLogic for your last test?
Can you please run that code in isolation, either directly from your IDE or
compiled into a jar file started with "java -jar ...", and then run jconsole and
attach it to the process to capture heap size data?
Just click on the memory tab and let that run for an hour or so. I am still
a bit mistrustful of that SnapLogic "memory allocated" metric, to be honest.
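
In case it helps, a minimal self-contained consumer along those lines could look roughly like this (broker address, topic and group id are placeholders, not your actual setup) - run it with "java -jar" and attach jconsole to the resulting process:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class StandaloneConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "memory-test");             // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic"));      // placeholder
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    // Replace any real processing with a simple println to rule out non-Kafka code.
                    System.out.println("value read: <" + record.value() + ">");
                }
            }
        }
    }
}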

Best,
Sönke

On Tue, Mar 5, 2019 at 12:32 PM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> I already shared the code snippet earlier.  What do you mean by input
> params?  I am just running the consumer in an infinite loop polling for new
> messages.Polling time is 100 millisecs.  If the topic is empty, the
> memory consumption is gradually increasing over the time and reaching to
> about 4GB in about 48 to 64 hours though no messages in the topic.
>
> Coming to the image, I have uploaded it to imgur.  Please find it here.
> https://i.imgur.com/dtAyded.png
>
> Let me know if anything else is needed.
>
> Just to make it fast, can we have a one-one call via zoom meeting.  I am
> just requesting as I can share all the details and my screen as well.
> After that, we can see what to do next.  If the call is not possible then
> its ok but it would be great to have it.
> Thanks,
>
>
>
> On Tue, Mar 5, 2019 at 4:29 PM Sönke Liebau
>  wrote:
>
>> Hi Syed,
>>
>> the next step would be for someone else to be able to reproduce the issue.
>>
>> If you could give me the values for the input parameters that your code
>> runs with (as per my last mail) then I am happy to run this again and take
>> another look at memory consumption.
>> I'll then upload the exact code I used to github, maybe you can run that
>> in
>> your environment and provide a jconsole screenshot of memory consumption
>> over time, then we can compare those patterns.
>>
>> Also, could you please upload the image from your last mail to imgur or a
>> similar service, it seems to have been lost in the mailing list.
>>
>> Best regards,
>> Sönke
>>
>> On Tue, Mar 5, 2019 at 11:48 AM Syed Mudassir Ahmed <
>> syed.mudas...@gaianconsultants.com> wrote:
>>
>> > Sonke,
>> >   I am not blaming apache-kafka for the tickets raised by our customers.
>> > I am saying there could be an issue in kafka-clients library causing
>> > resource/memory leak.  If that issue is resolved, I can resolve my
>> tickets
>> > as well automatically.  I don't find any issue with the snaplogic code.
>> >   Since I am in touch with developers of kafka-clients thru this email,
>> I
>> > am looking forward to contribute as much as I can to betterize the
>> > kafka-clients library.
>> >   What the steps next to confirm its a bug in kafka-clients?  And if
>> its a
>> > bug whats the process to get it resolved?
>> >
>> > Thanks,
>> >
>> >
>> >
>> > On Tue, Mar 5, 2019 at 2:43 PM Sönke Liebau
>> >  wrote:
>> >
>> >> Hi Syed,
>> >>
>> >> Apache Kafka is an open source software that comes as is without any
>> >> support attached to it. It may well be that this is a bug in the Kafka
>> >> client library, though tbh I doubt that from what my tests have shown
>> and
>> >> since I think someone else would have noticed this as well.
>> >> Even if this is a bug though, there is no obligation on anyone to fix
>> >> this. Any bugs your customer raised with you are between you and them
>> and
>> >> nothing to do with Apache Kafka.
>> >>
>> >> While I am happy to assist you with this I, like most people on this
>> >> list, do this in my spare time as well, which means that my time to
>> spend
>> >> on this is limited.
>> >>
>> >> That being said, could you please host the image externally somewhere
>> >> (imgur or something similar), it doesn't appear to have gone through
>> the
>> >> list.
>> >>
>> >> What input parameters are you using for isSuggest, messageCount and
>> >> synccommit when you run the code?
>> >>
>> >> Best regards,
>> >> Sönke
>> >>
>> >>
>> >&

Re: Apache Kafka Memory Leakage???

2019-03-05 Thread Sönke Liebau
Hi Syed,

the next step would be for someone else to be able to reproduce the issue.

If you could give me the values for the input parameters that your code
runs with (as per my last mail) then I am happy to run this again and take
another look at memory consumption.
I'll then upload the exact code I used to github, maybe you can run that in
your environment and provide a jconsole screenshot of memory consumption
over time, then we can compare those patterns.

Also, could you please upload the image from your last mail to imgur or a
similar service, it seems to have been lost in the mailing list.

Best regards,
Sönke

On Tue, Mar 5, 2019 at 11:48 AM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> Sonke,
>   I am not blaming apache-kafka for the tickets raised by our customers.
> I am saying there could be an issue in kafka-clients library causing
> resource/memory leak.  If that issue is resolved, I can resolve my tickets
> as well automatically.  I don't find any issue with the snaplogic code.
>   Since I am in touch with developers of kafka-clients thru this email, I
> am looking forward to contribute as much as I can to betterize the
> kafka-clients library.
>   What the steps next to confirm its a bug in kafka-clients?  And if its a
> bug whats the process to get it resolved?
>
> Thanks,
>
>
>
> On Tue, Mar 5, 2019 at 2:43 PM Sönke Liebau
>  wrote:
>
>> Hi Syed,
>>
>> Apache Kafka is an open source software that comes as is without any
>> support attached to it. It may well be that this is a bug in the Kafka
>> client library, though tbh I doubt that from what my tests have shown and
>> since I think someone else would have noticed this as well.
>> Even if this is a bug though, there is no obligation on anyone to fix
>> this. Any bugs your customer raised with you are between you and them and
>> nothing to do with Apache Kafka.
>>
>> While I am happy to assist you with this I, like most people on this
>> list, do this in my spare time as well, which means that my time to spend
>> on this is limited.
>>
>> That being said, could you please host the image externally somewhere
>> (imgur or something similar), it doesn't appear to have gone through the
>> list.
>>
>> What input parameters are you using for isSuggest, messageCount and
>> synccommit when you run the code?
>>
>> Best regards,
>> Sönke
>>
>>
>>
>>
>> On Tue, Mar 5, 2019 at 9:14 AM Syed Mudassir Ahmed <
>> syed.mudas...@gaianconsultants.com> wrote:
>>
>>> Sonke,
>>>   This issue seems serious.  Customers raised bug with our product.  And
>>> I suspect the bug is in apache-kafka clients library.
>>>   I executed the kafka reader without any snaplogic-specific code.
>>> There were hardly about twenty messages in the topics.  The code consumed
>>> about 300MB of memory in about 2 hours.
>>>   Please find attached the screenshot.
>>>   Can we pls get on a call and arrive at the conclusion?  I still argue
>>> its a bug in the kafka-clients library.
>>>
>>> Thanks,
>>>
>>>
>>>
>>> On Mon, Mar 4, 2019 at 8:33 PM Sönke Liebau
>>>  wrote:
>>>
>>>> Hi Syed,
>>>>
>>>> and you are sure that this memory is actually allocated? I still have
>>>> my reservations about that metric to be honest. Is there any way to connect
>>>> to the process with for example jconsole and having a look at memory
>>>> consumption in there?
>>>> Or alternatively, since the code you have sent is not relying on
>>>> SnapLogic anymore, can you just run it as a standalone application and
>>>> check memory consumption?
>>>>
>>>> That code looks very similar to what I ran (without knowing your input
>>>> parameters for issuggest et. al of course) and for me memory consumption
>>>> stayed between 120mb and 200mb.
>>>>
>>>> Best regards,
>>>> Sönke
>>>>
>>>>
>>>> On Mon, Mar 4, 2019 at 1:44 PM Syed Mudassir Ahmed <
>>>> syed.mudas...@gaianconsultants.com> wrote:
>>>>
>>>>> Sonke,
>>>>>   thanks again.
>>>>>   Yes, I replaced the non-kafka code from our end with a simple Sysout
>>>>> statement as follows:
>>>>>
>>>>> do {
>>>>> ConsumerRecords records = 
>>>>> consumer.poll(Duration.of(timeout, ChronoUnit.MILLIS));
>>>>>  

Re: Apache Kafka Memory Leakage???

2019-03-05 Thread Sönke Liebau
Hi Syed,

Apache Kafka is open source software that comes as-is, without any
support attached to it. It may well be that this is a bug in the Kafka
client library, though tbh I doubt that from what my tests have shown and
since I think someone else would have noticed this as well.
Even if this is a bug though, there is no obligation on anyone to fix it.
Any bugs your customer raised with you are between you and them and have nothing
to do with Apache Kafka.

While I am happy to assist you with this I, like most people on this list,
do this in my spare time as well, which means that my time to spend on this
is limited.

That being said, could you please host the image externally somewhere
(imgur or something similar), it doesn't appear to have gone through the
list.

What input parameters are you using for isSuggest, messageCount and
synccommit when you run the code?

Best regards,
Sönke




On Tue, Mar 5, 2019 at 9:14 AM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> Sonke,
>   This issue seems serious.  Customers raised bug with our product.  And I
> suspect the bug is in apache-kafka clients library.
>   I executed the kafka reader without any snaplogic-specific code.  There
> were hardly about twenty messages in the topics.  The code consumed about
> 300MB of memory in about 2 hours.
>   Please find attached the screenshot.
>   Can we pls get on a call and arrive at the conclusion?  I still argue
> its a bug in the kafka-clients library.
>
> Thanks,
>
>
>
> On Mon, Mar 4, 2019 at 8:33 PM Sönke Liebau
>  wrote:
>
>> Hi Syed,
>>
>> and you are sure that this memory is actually allocated? I still have my
>> reservations about that metric to be honest. Is there any way to connect to
>> the process with for example jconsole and having a look at memory
>> consumption in there?
>> Or alternatively, since the code you have sent is not relying on
>> SnapLogic anymore, can you just run it as a standalone application and
>> check memory consumption?
>>
>> That code looks very similar to what I ran (without knowing your input
>> parameters for issuggest et. al of course) and for me memory consumption
>> stayed between 120mb and 200mb.
>>
>> Best regards,
>> Sönke
>>
>>
>> On Mon, Mar 4, 2019 at 1:44 PM Syed Mudassir Ahmed <
>> syed.mudas...@gaianconsultants.com> wrote:
>>
>>> Sonke,
>>>   thanks again.
>>>   Yes, I replaced the non-kafka code from our end with a simple Sysout
>>> statement as follows:
>>>
>>> do {
>>> ConsumerRecords records = 
>>> consumer.poll(Duration.of(timeout, ChronoUnit.MILLIS));
>>> for (final ConsumerRecord record : records) {
>>> if (!infiniteLoop && !oneTimeMode) {
>>> --msgCount;
>>> if (msgCount < 0) {
>>> break;
>>> }
>>> }
>>> Debugger.doPrint("value read:<" + record.value() + ">");
>>> /*outputViews.write(new BinaryOutput() {
>>> @Override
>>> public Document getHeader() {
>>> return generateHeader(record, oldHeader);
>>> }
>>>
>>> @Override
>>> public void write(WritableByteChannel writeChannel) throws 
>>> IOException {
>>> try (OutputStream os = 
>>> Channels.newOutputStream(writeChannel)) {
>>> os.write(record.value());
>>> }
>>> }
>>> });*/
>>> //The offset to commit should be the next offset of the current one,
>>> // according to the API
>>> offsets.put(new TopicPartition(record.topic(), record.partition()),
>>> new OffsetAndMetadata(record.offset() + 1));
>>> //In suggest mode, we should not change the current offset
>>> if (isSyncCommit && isSuggest) {
>>> commitOffset(offsets);
>>> offsets.clear();
>>> }
>>> }
>>> } while ((msgCount > 0 || infiniteLoop) && isRunning.get());
>>>
>>>
>>> *Note: *Debugger is a wrapper class that just writes the given string to a 
>>> local file using PrintStream's println() method.
>>>
>>> And I don't see any diff in the metrics.  I still see the huge amount of
>>> memory allocated.
>>>
>>> See the image attached.
>>>
>>>
>>> Thanks,
>>>
>>>
>>>

Re: Apache Kafka Memory Leakage???

2019-03-04 Thread Sönke Liebau
Hi Syed,

and you are sure that this memory is actually allocated? I still have my
reservations about that metric, to be honest. Is there any way to connect to
the process with, for example, jconsole and have a look at memory
consumption in there?
Or alternatively, since the code you have sent is not relying on SnapLogic
anymore, can you just run it as a standalone application and check memory
consumption?

That code looks very similar to what I ran (without knowing your input
parameters for issuggest et. al of course) and for me memory consumption
stayed between 120mb and 200mb.

Best regards,
Sönke


On Mon, Mar 4, 2019 at 1:44 PM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> Sonke,
>   thanks again.
>   Yes, I replaced the non-kafka code from our end with a simple Sysout
> statement as follows:
>
> do {
> ConsumerRecords records = 
> consumer.poll(Duration.of(timeout, ChronoUnit.MILLIS));
> for (final ConsumerRecord record : records) {
> if (!infiniteLoop && !oneTimeMode) {
> --msgCount;
> if (msgCount < 0) {
> break;
> }
> }
> Debugger.doPrint("value read:<" + record.value() + ">");
> /*outputViews.write(new BinaryOutput() {
> @Override
> public Document getHeader() {
> return generateHeader(record, oldHeader);
> }
>
> @Override
> public void write(WritableByteChannel writeChannel) throws 
> IOException {
> try (OutputStream os = 
> Channels.newOutputStream(writeChannel)) {
> os.write(record.value());
> }
> }
> });*/
> //The offset to commit should be the next offset of the current one,
> // according to the API
> offsets.put(new TopicPartition(record.topic(), record.partition()),
> new OffsetAndMetadata(record.offset() + 1));
> //In suggest mode, we should not change the current offset
> if (isSyncCommit && isSuggest) {
> commitOffset(offsets);
> offsets.clear();
> }
> }
> } while ((msgCount > 0 || infiniteLoop) && isRunning.get());
>
>
> *Note: *Debugger is a wrapper class that just writes the given string to a 
> local file using PrintStream's println() method.
>
> And I don't see any diff in the metrics.  I still see the huge amount of
> memory allocated.
>
> See the image attached.
>
>
> Thanks,
>
>
>
> On Mon, Mar 4, 2019 at 5:17 PM Sönke Liebau
>  wrote:
>
>> Hi Syed,
>>
>> let's keep it on the list for now so that everybody can participate :)
>>
>> The different .poll() method was just an unrelated observation, the
>> main points of my mail were the question about whether this is the
>> correct metric you are looking at and replacing the payload of your
>> code with a println statement to remove non-Kafka code from your
>> program and make sure that the leak is not in there. Have you tried
>> that?
>>
>> Best regards,
>> Sönke
>>
>> On Mon, Mar 4, 2019 at 7:21 AM Syed Mudassir Ahmed
>>  wrote:
>> >
>> > Sonke,
>> >   Thanks so much for the reply.  I used the new version of
>> poll(Duration) method.  Still, I see memory issue.
>> >   Is there a way we can get on a one-one call and discuss this pls?
>> Let me know your availability.  I can share zoom meeting link.
>> >
>> > Thanks,
>> >
>> >
>> >
>> > On Sat, Mar 2, 2019 at 2:15 AM Sönke Liebau 
>> > 
>> wrote:
>> >>
>> >> Hi Syed,
>> >>
>> >> from your screenshot I assume that you are using SnapLogic to run your
>> >> code (full disclosure: I do not have the faintest idea of this
>> >> product!). I've just had a look at the docs and am a bit confused by
>> >> their explanation of the metric that you point out in your image
>> >> "Memory Allocated". The docs say: "The Memory Allocated reflects the
>> >> number of bytes that were allocated by the Snap.  Note that this
>> >> number does not reflect the amount of memory that was freed and it is
>> >> not the peak memory usage of the Snap.  So, it is not necessarily a
>> >> metric that can be used to estimate the required size of a Snaplex
>> >> node.  Rather, the number provides an insight into how much memory had
>> >> to be allocated to process all of the documents.  For example, if the
>> >> total allocate

Re: Apache Kafka Memory Leakage???

2019-03-04 Thread Sönke Liebau
Hi Syed,

let's keep it on the list for now so that everybody can participate :)

The different .poll() method was just an unrelated observation. The
main points of my mail were the question whether this is the
correct metric to look at, and the suggestion to replace the payload of your
code with a println statement to remove non-Kafka code from your
program and make sure that the leak is not in there. Have you tried
that?

Best regards,
Sönke

On Mon, Mar 4, 2019 at 7:21 AM Syed Mudassir Ahmed
 wrote:
>
> Sonke,
>   Thanks so much for the reply.  I used the new version of poll(Duration) 
> method.  Still, I see memory issue.
>   Is there a way we can get on a one-one call and discuss this pls?  Let me 
> know your availability.  I can share zoom meeting link.
>
> Thanks,
>
>
>
> On Sat, Mar 2, 2019 at 2:15 AM Sönke Liebau 
>  wrote:
>>
>> Hi Syed,
>>
>> from your screenshot I assume that you are using SnapLogic to run your
>> code (full disclosure: I do not have the faintest idea of this
>> product!). I've just had a look at the docs and am a bit confused by
>> their explanation of the metric that you point out in your image
>> "Memory Allocated". The docs say: "The Memory Allocated reflects the
>> number of bytes that were allocated by the Snap.  Note that this
>> number does not reflect the amount of memory that was freed and it is
>> not the peak memory usage of the Snap.  So, it is not necessarily a
>> metric that can be used to estimate the required size of a Snaplex
>> node.  Rather, the number provides an insight into how much memory had
>> to be allocated to process all of the documents.  For example, if the
>> total allocated was 5MB and the Snap processed 32 documents, then the
>> Snap allocated roughly 164KB per document.  When combined with the
>> other statistics, this number can help to identify the potential
>> causes of performance issues."
>> The part about not reflecting memory that was freed makes me somewhat
>> doubtful whether this actually reflects how much memory the process
>> currently holds.  Can you give some more insight there?
>>
>> Apart from that, I just ran your code somewhat modified to make it
>> work without dependencies for 2 hours and saw no unusual memory
>> consumption, just a regular garbage collection sawtooth pattern. That
>> being said, I had to replace your actual processing with a simple
>> println, so if there is a memory leak in there I would of course not
>> have noticed.
>> I've uploaded the code I ran [1] for reference. For further analysis,
>> maybe you could run something similar with just a println or noop and
>> see if the symptoms persist, to localize the leak (if it exists).
>>
>> Also, two random observations on your code:
>>
>> KafkaConsumer.poll(Long timeout) is deprecated, you should consider
>> using the overloaded version with a Duration parameter instead.
>>
>> The comment at [2] seems to contradict the following code, as the
>> offsets are only changed when in suggest mode. But as I have no idea
>> what suggest mode even is or all this means this observation may be
>> miles of point :)
>>
>> I hope that helps a little.
>>
>> Best regards,
>> Sönke
>>
>> [1] https://gist.github.com/soenkeliebau/e77e8665a1e7e49ade9ec27a6696e983
>> [2] 
>> https://gist.github.com/soenkeliebau/e77e8665a1e7e49ade9ec27a6696e983#file-memoryleak-java-L86
>>
>>
>> On Fri, Mar 1, 2019 at 7:35 AM Syed Mudassir Ahmed
>>  wrote:
>> >
>> >
>> > Thanks,
>> >
>> >
>> >
>> > -- Forwarded message -
>> > From: Syed Mudassir Ahmed 
>> > Date: Tue, Feb 26, 2019 at 12:40 PM
>> > Subject: Apache Kafka Memory Leakage???
>> > To: 
>> > Cc: Syed Mudassir Ahmed 
>> >
>> >
>> > Hi Team,
>> >   I have a java application based out of latest Apache Kafka version 2.1.1.
>> >   I have a consumer application that runs infinitely to consume messages 
>> > whenever produced.
>> >   Sometimes there are no messages produced for hours.  Still, I see that 
>> > the memory allocated to consumer program is drastically increasing.
>> >   My code is as follows:
>> >
>> > AtomicBoolean isRunning = new AtomicBoolean(true);
>> >
>> > Properties kafkaProperties = new Properties();
>> >
>> > kafkaProperties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, 
>> > brokers);
>> >
>> > kafkaProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupID);
>>

Re: Apache Kafka Memory Leakage???

2019-03-01 Thread Sönke Liebau
erride
> public void write(WritableByteChannel writeChannel) 
> throws IOException {
> try (OutputStream os = 
> Channels.newOutputStream(writeChannel)) {
> os.write(record.value());
> }
> }
> });
> //The offset to commit should be the next offset of the 
> current one,
> // according to the API
> offsets.put(new TopicPartition(record.topic(), 
> record.partition()),
> new OffsetAndMetadata(record.offset() + 1));
> //In suggest mode, we should not change the current offset
> if (isSyncCommit && isSuggest) {
> commitOffset(offsets);
> offsets.clear();
> }
> }
>  } while ((msgCount > 0 || infiniteLoop) && isRunning.get());
>
>
> See the screenshot below.  In about nineteen hours, it just consumed 5 
> messages but the memory allocated is 1.6GB.
>
>
> Any clues on how to get rid of memory issue?  Anything I need to do in the 
> program or is it a bug in the kafka library?
>
> Please rever ASAP.
>
>
> Thanks,
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Speeding up integration tests

2019-02-27 Thread Sönke Liebau
Hi,

while I am also extremely annoyed at times by the amount of coffee I
have to drink before tests finish, I think the argument about flaky
tests is valid! The current setup has the benefit that every test case
runs on a pristine cluster. If we changed this, we'd need to go through
all tests and ensure that topic names are different, which could
probably be abstracted by including a timestamp in the name or something
like that (see the sketch below), but it adds another potential source of failures.
Add to this the fact that "JUnit runs tests using a deterministic, but
unpredictable order" and the water gets even muddier. Potentially this
means that adding a test case changes the order in which existing
test cases are executed, which in turn might mean that all of a
sudden something breaks that you didn't even touch.
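
As a rough illustration of that abstraction (a purely hypothetical helper, not existing test-harness code), something like this could generate a unique topic name per test:

import java.util.concurrent.ThreadLocalRandom;

public class TestTopicNames {
    // Timestamp plus a random suffix keeps names unique even if two tests start in the same millisecond.
    public static String uniqueTopic(String prefix) {
        return prefix + "-" + System.currentTimeMillis() + "-" + ThreadLocalRandom.current().nextInt(10_000);
    }
}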

Best regards,
Sönke


On Wed, Feb 27, 2019 at 2:36 PM Stanislav Kozlovski
 wrote:
>
> Hey Viktor,
>
> I am all up for the idea of speeding up the tests. Running the
> `:core:integrationTest` command takes an absurd amount of time as is and is
> continuously going to go up if we don't do anything about it.
> Having said that, I am very scared that your proposal might significantly
> increase the test flakiness of current and future tests - test flakiness is
> a huge problem we're battling. We don't get green PR builds too often - it
> is very common that one or two flaky tests fail in each PR.
> We have also found it hard to get a green build for the 2.2 release (
> https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/).
>
> On Wed, Feb 27, 2019 at 11:09 AM Viktor Somogyi-Vass <
> viktorsomo...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > I've been observing lately that unit tests usually take 2.5 hours to run
> > and a very big portion of these are the core tests where a new cluster is
> > spun up for every test. This takes most of the time. I ran a test
> > (TopicCommandWithAdminClient with 38 test inside) through the profiler and
> > it shows for instance that running the whole class itself took 10 minutes
> > and 37 seconds where the useful time was 5 minutes 18 seconds. That's a
> > 100% overhead. Without profiler the whole class takes 7 minutes and 48
> > seconds, so the useful time would be between 3-4 minutes. This is a bigger
> > test though, most of them won't take this much.
> > There are 74 classes that implement KafkaServerTestHarness and just running
> > :core:integrationTest takes almost 2 hours.
> >
> > I think we could greatly speed up these integration tests by just creating
> > the cluster once per class and perform the tests on separate methods. I
> > know that this a little bit contradicts to the principle that tests should
> > be independent but it seems like recreating clusters for each is a very
> > expensive operation. Also if the tests are acting on different resources
> > (different topics, etc.) then it might not hurt their independence. There
> > might be cases of course where this is not possible but I think there could
> > be a lot where it is.
> >
> > In the optimal case we could cut the testing time back by approximately an
> > hour. This would save resources and give quicker feedback for PR builds.
> >
> > What are your thoughts?
> > Has anyone thought about this or were there any attempts made?
> >
> > Best,
> > Viktor
> >
>
>
> --
> Best,
> Stanislav



--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSSION] KIP-422: Add support for user/client configuration in the Kafka Admin Client

2019-02-07 Thread Sönke Liebau
Hi Yaodong,

thanks for the KIP!

If I understand your intentions correctly then this KIP would only
address a fairly specific use case, namely SASL-PLAIN with users
defined in Zookeeper. For all other authentication mechanisms like
SSL, SASL-GSSAPI or SASL-PLAIN with users defined in jaas files I
don't see how the AdminClient could directly create new users.
Is this correct, or am I missing something?

Best regards,
Sönke

On Thu, Feb 7, 2019 at 2:47 PM Stanislav Kozlovski
 wrote:
>
> This KIP seems to duplicate some of the functionality proposed in KIP-248
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-248+-+Create+New+ConfigCommand+That+Uses+The+New+AdminClient>.
> KIP-248 has been stuck in a vote thread since July 2018.
>
> Viktor, do you plan on working on the KIP?
>
> On Thu, Feb 7, 2019 at 1:27 PM Stanislav Kozlovski 
> wrote:
>
> > Hey there Yaodong, thanks for the KIP!
> >
> > I'm not too familiar with the user/client configurations we currently
> > allow, is there a KIP describing the initial feature? If there is, it would
> > be useful to include in KIP-422.
> >
> > I also didn't see any authorization in the PR, have we thought about
> > needing to authorize the alter/describe requests per the user/client?
> >
> > Thanks,
> > Stanislav
> >
> > On Fri, Jan 25, 2019 at 5:47 PM Yaodong Yang 
> > wrote:
> >
> >> Hi folks,
> >>
> >> I've published KIP-422 which is about adding support for user/client
> >> configurations in the Kafka Admin Client.
> >>
> >> Basically the story here is to allow KafkaAdminClient to configure the
> >> user
> >> or client configurations for users, instead of requiring users to directly
> >> talk to ZK.
> >>
> >> The link for this KIP is
> >> following:
> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97555704
> >>
> >> I'd be happy to receive some feedback about the KIP I published.
> >>
> >> --
> >> Best,
> >> Yaodong Yang
> >>
> >
> >
> > --
> > Best,
> > Stanislav
> >
>
>
> --
> Best,
> Stanislav



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Some jira cleanup

2019-01-28 Thread Sönke Liebau
All,

I left a few comments on some old but still open jiras in an attempt to
clean up a little bit.

Since probably no one would notice these comments I thought I'd quickly
list them here to give people a chance to check on them:

KAFKA-217: Client test suite
KAFKA-517: Ensure that we escape the metric names if they include user strings
KAFKA-659: Support request pipelining in the network server
KAFKA-817: Implement a zookeeper path-based controlled shutdown tool
KAFKA-859: support basic auth protection of mx4j console
KAFKA-1015: documentation for inbuilt offset management
KAFKA-1021: Write a tool to check replica lag for individual topic partitions


I'll wait a few days for objections and then close these issues.

Also, as a heads-up, I'll try to spend some time on this more or less
regularly in the future. Is the approach of notifying everybody on the
mailing list fine for everyone?

Best regards,
Sönke


Re: [VOTE] KIP-349 Priorities for Source Topics

2019-01-25 Thread Sönke Liebau
Hi Nick,

a bit late to the party, sorry. I recently spent some time looking
into this / a similar issue [1].
After some investigation and playing around with settings I think that
the benefit that could be gained from this is somewhat limited and
probably outweighed by the implementation effort.

The consumer internal are already geared towards treating partitions
fairly so that no partition has to wait an undue amount of time and
this can be further tuned for latency over throughput. Additionally,
if this is a large issue for someone, there is always the option of
having a dedicated consumer reading only from the control topic, which
would mean that messages from that topic are received "immediately".
For a Kafka Streams job it would probably make sense to create two
input streams and then merge those as a first step.
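
A rough sketch of what that could look like (topic names, types and serdes are placeholders; the usual StreamsConfig plumbing is omitted):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class MergeExample {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> control = builder.stream("control-topic"); // placeholder
        KStream<String, String> data = builder.stream("data-topic");       // placeholder
        // Merging both inputs up front means control messages enter the topology
        // without waiting behind the bulk data topic.
        KStream<String, String> merged = control.merge(data);
        merged.foreach((key, value) -> System.out.println(key + " -> " + value));
        // builder.build() would then be passed to KafkaStreams together with the usual config.
    }
}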

I think with these knobs a fairly large amount of flexibility can be
achieved so that there is no urgent need to implement priorities.

So my personal preference would be to set this KIP to dormant for now.

Best regards,
Sönke

[1] 
https://lists.apache.org/thread.html/07b5a9f1232ea139e01c477481a9d77208d8e8b2e55d8608a0271417@%3Cdev.kafka.apache.org%3E

On Fri, Jan 25, 2019 at 1:21 AM  wrote:
>
>
> Hi Colin,
>
> > On Jan 24, 2019, at 12:14 PM, Colin McCabe  wrote:
> >
> > Users almost always like the idea of new features, whatever they are.  But 
> > that doesn't mean that the feature would necessarily work well or be 
> > necessary.
>
> Yes, though we should certainly consider the responses on the user list as 
> input (Subject: Prioritized Topics for Kafka).
>
> > If you still want to pursue this, then I suggest gathering a set of 
> > use-cases that can't be addressed through the means we discussed here 
> > previously.  So, something that can't effectively be addressed through 
> > using the pause and resume API.
>
> We’ve discussed this point before.  I accept you point that a user could 
> implement this behavior with pause and resume.  This KIP is about creating a 
> higher-level API to make it easier to do so.
>
> > Then come up with a concrete proposal that addresses all the questions we 
> > have, including about starvation, incremental fetch requests, and so on.
>
> To me it seems like there’s only one outstanding issue here (incremental 
> fetch), and we could just pick one of the options.  Starvation is by design.  
> I’m not sure what “and so on” references.
>
> > This could be a lot of work.  If you're looking for a way to make more 
> > contributions, I'd recommend getting started with something easier.
>
>
> Yes it does.  And after 6 months of (sometimes circular) discussion I’d like 
> to either move towards a vote or set the status of this KIP to dormant until 
> if and when someone else picks up it up.
>
> Does anybody else have input on either having a vote or setting the KIP 
> dormant ?
>
> Cheers,
> --
>   Nick
>
>
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: Delayed messages with skewed data when subscribing to many partitions

2019-01-11 Thread Sönke Liebau
After some further research I think I answered at least part of this myself.

KIP-74 [1] states the following about ordering of partitions in the
fetch request:
"The solution is to reorder partitions in fetch request in round-robin
fashion to continue fetching from first empty partition received or to
perform random shuffle of partitions before each request."

This explains the delay in my listing until data is read from the
topics with only one partition. Initially both these topics are
fetched last, and they then move forward in every subsequent fetch
request until at some point they are among the first 50 (assuming
default settings for max.partition.fetch.bytes and fetch.max.bytes, and
that all partitions contain enough data to satisfy
max.partition.fetch.bytes) and receive data. In my test scenario this
takes 24 fetch requests. To better illustrate this we can set
fetch.max.bytes = max.partition.fetch.bytes, which causes every fetch
request to only contain data from one partition.

The consumer logs the order at debug level, so we can check progress
in the output:

2019-01-11 14:10:52 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[aaa-33, aaa-34, aaa-35, ... , zzz-90, zzz-24, mmm-0, 000-0] to broker
localhost:9092 (id: 0 rack: null)
2019-01-11 14:10:52 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[aaa-34, aaa-35, aaa-36, ... , zzz-24, mmm-0, 000-0, aaa-33] to broker
localhost:9092 (id: 0 rack: null)
2019-01-11 14:10:53 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[aaa-35, aaa-36, aaa-37, ... , mmm-0, 000-0, aaa-33, aaa-34] to broker
localhost:9092 (id: 0 rack: null)
...
2019-01-11 14:12:58 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[zzz-90, zzz-24, mmm-0, 000-0, ... , zzz-88, zzz-22, zzz-89, zzz-23]
to broker localhost:9092 (id: 0 rack: null)
2019-01-11 14:12:58 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[zzz-24, mmm-0, 000-0, aaa-33, ... , zzz-22, zzz-89, zzz-23, zzz-90]
to broker localhost:9092 (id: 0 rack: null)
2019-01-11 14:12:58 DEBUG Fetcher:195 - [Consumer clientId=consumer-1,
groupId=cg1547212251445] Sending READ_UNCOMMITTED fetch for partitions
[mmm-0, 000-0, aaa-33, aaa-34, ... , zzz-89, zzz-23, zzz-90, zzz-24]
to broker localhost:9092 (id: 0 rack: null)
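
For reference, the debug output above comes from the consumer's internal Fetcher class; assuming a log4j-based logging setup (logger name taken from the class shown in the output), it can be switched on with a single line in log4j.properties:

log4j.logger.org.apache.kafka.clients.consumer.internals.Fetcher=DEBUG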

So I'll withdraw my suggestions around code improvements, as this is
already being handled well. The question around best practice for
handling something like this remains though. If anybody has any
suggestions I'd love to hear them!

Best regards,
Sönke

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+Add+Fetch+Response+Size+Limit+in+Bytes

On Wed, Jan 9, 2019 at 4:35 PM Sönke Liebau  wrote:
>
> Hi all,
>
> we've just had a case where we suspect that messages get delayed from
> being consumed under certain circumstances. I don't necessarily think
> this is a bug, hence have not opened a jira yet but wanted to discuss
> here - there's probably a best practice that I just don't know about.
>
> The scenario is having one consumer that is subscribed to a large
> number of partitions, some of which are very busy and some of which
> only receive messages sporadically. When the consumer now sends a
> fetchrequest for all subscribed partitions the broker starts filling
> these partition by partition while honoring two parameters:
>
> max.partition.fetch.bytes - controls the maximum size of the data that
> is returned for one individual partition - default: 1 * 1024 * 1024 =
> 1048576 bytes
> fetch.max.bytes - controls the overall maximum size of data that is
> returned for the entire fetchrequest - default: 50 * 1024 * 1024 =
> 52428800 bytes
>
> So by default a fetchresponse can contain data from a maximum of 50
> partitions, which creates the possibility of "freezing out" partitions
> if there are a lot of busy partitions in the subscriptions.
> I've created a small test for this to illustrate my concern:
>
> Topics:
> 000 - 1 partition - 1 message
> aaa - 100 partitions - 10 Mill. messages
> bbb - 1000 partitions - 50 Mill. messages
> mmm - 1 partition - 1 message
> zzz - 100 partitions - 10 Mill messages
>
> When I consume from these with default settings and simply print the
> time I first receive a message from a topic I get the following:
> Got first record from topic aaa after 747 ms
> Got first record from topic bbb after 2764 ms
> Got first record from topic zzz after 15068 ms
> Got first record from topic 000 after 16588 ms
> Got first record from topic mmm after 16588 ms
>
> So as we can

Delayed messages with skewed data when subscribing to many partitions

2019-01-09 Thread Sönke Liebau
Hi all,

we've just had a case where we suspect that messages get delayed from
being consumed under certain circumstances. I don't necessarily think
this is a bug, hence have not opened a jira yet but wanted to discuss
here - there's probably a best practice that I just don't know about.

The scenario is having one consumer that is subscribed to a large
number of partitions, some of which are very busy and some of which
only receive messages sporadically. When the consumer now sends a
fetch request for all subscribed partitions, the broker starts filling
the response partition by partition while honoring two parameters:

max.partition.fetch.bytes - controls the maximum size of the data that
is returned for one individual partition - default: 1 * 1024 * 1024 =
1048576 bytes
fetch.max.bytes - controls the overall maximum size of data that is
returned for the entire fetchrequest - default: 50 * 1024 * 1024 =
52428800 bytes

So by default a fetch response can contain data from a maximum of 50
partitions, which creates the possibility of "freezing out" partitions
if there are a lot of busy partitions in the subscription.
I've created a small test for this to illustrate my concern:

Topics:
000 - 1 partition - 1 message
aaa - 100 partitions - 10 Mill. messages
bbb - 1000 partitions - 50 Mill. messages
mmm - 1 partition - 1 message
zzz - 100 partitions - 10 Mill messages

When I consume from these with default settings and simply print the
time I first receive a message from a topic I get the following:
Got first record from topic aaa after 747 ms
Got first record from topic bbb after 2764 ms
Got first record from topic zzz after 15068 ms
Got first record from topic 000 after 16588 ms
Got first record from topic mmm after 16588 ms
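
A simplified sketch of that kind of measurement (broker address and group id are placeholders; details differ from the code actually used):

import java.time.Duration;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class FirstRecordTimer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fetch-order-test");         // placeholder
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Map<String, Long> firstSeen = new HashMap<>();
        long start = System.currentTimeMillis();
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Arrays.asList("000", "aaa", "bbb", "mmm", "zzz"));
            // Stop once a record from every topic has been seen, printing the elapsed time per topic.
            while (firstSeen.size() < 5) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    firstSeen.computeIfAbsent(record.topic(), t -> {
                        long elapsed = System.currentTimeMillis() - start;
                        System.out.println("Got first record from topic " + t + " after " + elapsed + " ms");
                        return elapsed;
                    });
                }
            }
        }
    }
}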

So as we can see the topics with only one partition get stuck behind
the larger topics with data to be read. I am unsure in what order the
broker iterates over the partitions, but I've always seen the same
general order in the output, so there seems to be some factor
influencing this.
One potential fix that I identified was to reduce the
max.partition.fetch.bytes parameter, so that more partitions can be
included in a fetch response. If I rerun the test with a value of 1024
I get:

Got first record from topic aaa after 5446 ms
Got first record from topic bbb after 5469 ms
Got first record from topic zzz after 5744 ms
Got first record from topic mmm after 5762 ms
Got first record from topic 000 after 5762 ms

This looks much better, but I have doubts whether it is the actual
solution, as it could lead to an increase in the number of fetch
requests that are sent when only a few partitions have new
data:
5 partitions with 10 MB of new data each would fit in 10 requests with
default settings, but would need 10240 requests with my adjusted settings.
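
For reference, the two fetch-related consumer settings used in the second test run would look roughly like this (everything else left at its defaults; shown as a fragment, not a complete program):

// defaults: max.partition.fetch.bytes = 1048576 (1 MiB), fetch.max.bytes = 52428800 (50 MiB)
Properties props = new Properties();
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "1024");  // per-partition cap used in the second run
props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "52428800");        // overall cap per fetch response (default)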

This topic is currently also being discussed in the thread on KIP-349
[1] but consensus seems to be that there is no real need for a feature
like this.

Are there common patterns to get around this? The obvious solution
would be scaling the load across more consumers of course, either by
adding them to the consumer group or by splitting the topics over
consumers, but that sort of just makes it a question of scale until it
may happen again.

Would it potentially be worthwhile looking into code changes to
improve handling for these edge cases?
Keeping track of the last time partitions returned data for a consumer
group and prioritizing "oldest" partitions for example. This would
need memory on the broker though which might turn out to be quite a
lot since it would scale with partition count and consumer groups.
Alternatively some sort of feedback to the consumer could be added
about partitions that were not checked due to the limits, but that
would need a wire protocol change.
Perhaps a little consumer side logic that starts fragmenting fetch
requests if it notices that responses always have data from the
maximum number of partitions.

Best regards,
Sönke

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-349%3A+Priorities+for+Source+Topics


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2019-01-08 Thread Sönke Liebau
Hi Colin,

thanks for your response!

In theory we could get away without any additional path changes, I
think. I am still somewhat unsure about the best way of addressing
this, so I'll outline my current idea and the concerns that I still have;
maybe you have some thoughts on it.

ACLs are currently stored in two places in ZK: /kafka-acl and
/kafka-acl-extended based on whether they make use of prefixes or not.
The reasoning[1] for this is not fundamentally changed by anything we
are discussing here, so I think that split will need to remain.

ACLs are then stored in the form of a json array:
[zk: 127.0.0.1:2181(CONNECTED) 9] get /kafka-acl/Topic/*
{"version":1,"acls":[{"principal":"User:sliebau","permissionType":"Allow","operation":"Read","host":"*"},{"principal":"User:sliebau","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Describe","host":"*"},{"principal":"User:sliebau2","permissionType":"Allow","operation":"Read","host":"*"}]}

What we could do is add a version property to the individual ACL
elements like so:
[
  {
"principal": "User:sliebau",
"permissionType": "Allow",
"operation": "Read",
"host": "*",
"acl_version": "1"
  }
]

We define the current state of ACLs as version 0 and the Authorizer
will default a missing "acl_version" element to this value for
backwards compatibility. So there should hopefully be no need to
migrate existing ACLs (concerns notwithstanding, see later).

Additionally the authorizer will get a max_supported_acl_version
setting which will cause it to ignore any ACLs with a version higher than what is set
here, hence allowing for controlled upgrading similar to the process
using the inter-broker protocol version. If this happens we should
probably log a warning in case this was unintentional. Maybe even have
a setting that controls whether startup is even possible when not all
ACLs are in effect.
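
To make that handling a bit more concrete, here is a rough sketch of the version check (field and config names as proposed above; Jackson is only used for illustration, this is not existing authorizer code):

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class AclVersionFilter {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    // A missing acl_version field is treated as version 0 for backwards compatibility;
    // ACLs with a higher version than the broker supports are ignored (and should be logged).
    static boolean isUsable(String aclJson, int maxSupportedAclVersion) throws java.io.IOException {
        JsonNode acl = MAPPER.readTree(aclJson);
        int version = acl.path("acl_version").asInt(0);
        return version <= maxSupportedAclVersion;
    }
}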

As I mentioned I have a few concerns, question marks still outstanding on this:
- This approach would necessitate being backwards compatible with all
earlier versions of ACLs unless we also add a min_acl_version setting
- which would put the topic of ACL migrations back on the agenda.
- Do we need to touch the wire protocol for the admin client for this?
In theory I think not, as the authorizer would write ACLs in the most
current (unless forced down by max_acl_version) version it knows, but
this takes any control over this away from the user.
- This adds json parsing logic to the Authorizer, as it would have to
check the version first, look up the proper ACL schema for that
version and then re-parse the ACL string with that schema - should not
be a real issue if the initial parsing is robust, but strictly
speaking we are parsing something that we don't know the schema for
which might create issues with updates down the line.

Beyond the practical concerns outlined above there are also some
broader things maybe worth thinking about. The long term goal is to
move away from Zookeeper and other data like consumer group offsets
has already been moved into Kafka topics - is that something that we'd
want to consider for ACLs as well? With the current storage model we'd
need more than one topic for this to cleanly separate resources and
prefixed ACLs - if we consider pursuing this option it might be a
chance for a "larger" change to the format which introduces versioning
and allows storing everything in one compacted topic.

Any thoughts on this?

Best regards,
Sönke



[1] 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-290%3A+Support+for+Prefixed+ACLs


On Sat, Dec 22, 2018 at 5:51 AM Colin McCabe  wrote:
>
> Hi Sönke,
>
> One path forward would be to forbid the new ACL types from being created 
> until the inter-broker protocol had been upgraded.  We'd also have to figure 
> out how the new ACLs were stored in ZooKeeper.  There are a bunch of 
> proposals in this thread that could work for that-- I really hope we don't 
> keep changing the ZK path each time there is a version bump.
>
> best,
> Colin
>
>
> On Thu, Nov 29, 2018, at 14:25, Sönke Liebau wrote:
> > This has been dormant for a while now, can I interest anybody in chiming in
> > here?
> >
> > I think we need to come up with an idea of how to handle changes to ACLs
> > going forward, i.e. some sort of versioning scheme. Not necessarily what I
> > proposed in my previous mail, but something.
> > Currently this 

Re: [VOTE] KIP-382 MirrorMaker 2.0

2018-12-21 Thread Sönke Liebau
+1 (non-binding)

Thanks for your effort Ryanne!

On Fri, Dec 21, 2018 at 2:23 AM Srinivas Reddy
 wrote:
>
> +1 (non binding)
>
> Thank you Ryan for the KIP, let me know if you need support in implementing
> it.
>
> -
> Srinivas
>
> - Typed on tiny keys. pls ignore typos.{mobile app}
>
>
> On Fri, 21 Dec, 2018, 08:26 Ryanne Dolan 
> > Thanks for the votes so far!
> >
> > Due to recent discussions, I've removed the high-level REST API from the
> > KIP.
> >
> > On Thu, Dec 20, 2018 at 12:42 PM Paul Davidson 
> > wrote:
> >
> > > +1
> > >
> > > Would be great to see the community build on the basic approach we took
> > > with Mirus. Thanks Ryanne.
> > >
> > > On Thu, Dec 20, 2018 at 9:01 AM Andrew Psaltis  > >
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Really looking forward to this and to helping in any way I can. Thanks
> > > for
> > > > kicking this off Ryanne.
> > > >
> > > > On Thu, Dec 20, 2018 at 10:18 PM Andrew Otto 
> > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > This looks like a huge project! Wikimedia would be very excited to
> > have
> > > > > this. Thanks!
> > > > >
> > > > > On Thu, Dec 20, 2018 at 9:52 AM Ryanne Dolan 
> > > > > wrote:
> > > > >
> > > > > > Hey y'all, please vote to adopt KIP-382 by replying +1 to this
> > > thread.
> > > > > >
> > > > > > For your reference, here are the highlights of the proposal:
> > > > > >
> > > > > > - Leverages the Kafka Connect framework and ecosystem.
> > > > > > - Includes both source and sink connectors.
> > > > > > - Includes a high-level driver that manages connectors in a
> > dedicated
> > > > > > cluster.
> > > > > > - High-level REST API abstracts over connectors between multiple
> > > Kafka
> > > > > > clusters.
> > > > > > - Detects new topics, partitions.
> > > > > > - Automatically syncs topic configuration between clusters.
> > > > > > - Manages downstream topic ACL.
> > > > > > - Supports "active/active" cluster pairs, as well as any number of
> > > > active
> > > > > > clusters.
> > > > > > - Supports cross-data center replication, aggregation, and other
> > > > complex
> > > > > > topologies.
> > > > > > - Provides new metrics including end-to-end replication latency
> > > across
> > > > > > multiple data centers/clusters.
> > > > > > - Emits offsets required to migrate consumers between clusters.
> > > > > > - Tooling for offset translation.
> > > > > > - MirrorMaker-compatible legacy mode.
> > > > > >
> > > > > > Thanks, and happy holidays!
> > > > > > Ryanne
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Paul Davidson
> > > Principal Engineer, Ajna Team
> > > Big Data & Monitoring
> > >
> >



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-21 Thread Sönke Liebau
Hi Ryanne,

just to briefly check in, am I understanding your mail correctly, that
you want to pick up the "multi-cluster/herder/worker features" in a
different KIP at some time? If yes, please feel free to let me know if
I can provide any help on that front. Otherwise, I am also happy to
draft a proposal as basis for discussion.

Best regards,
Sönke

On Fri, Dec 21, 2018 at 1:11 AM Ryanne Dolan  wrote:
>
> Jun, let's leave the REST API out of the KIP then.
>
> I have been arguing that Connect wouldn't benefit from the 
> multi-cluster/herder/worker features we need in MM2, and that the effort 
> would result in a needlessly complex Connect REST API. But certainly two 
> separate APIs is inherently more complex than a single API. If we can add 
> these features to Connect itself without breaking things, I'm onboard. I have 
> some ideas on this front, but that's for another KIP :)
>
> The REST API is non-essential for a MirrorMaker replacement, and I can easily 
> divorce that from the high-level driver. We still want to support running MM 
> without an existing Connect cluster, but we don't really need a REST API to 
> do that. Legacy MirrorMaker doesn't have a REST API after all. For 
> organizations that want on-the-fly configuration of their replication flows, 
> there's Connect.
>
> This has been brought up by nearly everyone, so I'm happy to oblige.
>
> Ryanne
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-20 Thread Sönke Liebau
> > > > >
> > > > > > > >  would like to see support for this to be done by hops,
> as
> > > well
> > > > > > [...]
> > > > > > > This then allows ring (hops = number of brokers in the
> ring),
> > > > mesh
> > > > > > > (every
> > > > > > > cluster interconnected so hop=1), or even a tree (more fine
> > > > grained
> > > > > > > setup)
> > > > > > > cluster topology.
> > > > > > >
> > > > > > > That's a good idea, though we can do this at the topic
> level
> > > > > without
> > > > > > > tagging individual records. A max.hop of 1 would mean
> > > "A.topic1"
> > > > is
> > > > > > > allowed, but not "B.A.topic1". I think the default behavior
> > > would
> > > > > > need
> > > > > > > to
> > > > > > > be max.hops = 1 to avoid unexpectedly creating a bunch of
> > > > > D.C.B.A...
> > > > > > > topics
> > > > > > > when you create a fully-connected mesh topology.
> > > > > > >
> > > > > > > Looking ahead a bit, I can imagine an external tool
> computing
> > > the
> > > > > > > spanning
> > > > > > > tree of topics among a set of clusters based on
> inter-cluster
> > > > > > > replication
> > > > > > > lag, and setting up MM2 accordingly. But that's probably
> > > outside
> > > > > the
> > > > > > > scope
> > > > > > > of this KIP :)
> > > > > > >
> > > > > > > >  ...standalone MirrorMaker connector...
> > > > > > > > ./bin/kafka-mirror-maker-2.sh --consumer
> > > > consumer.properties
> > > > > > > --producer producer.properties
> > > > > > >
> > > > > > > Eventually, I'd like MM2 to completely replace legacy MM,
> > > > including
> > > > > > the
> > > > > > > ./bin/kafka-mirror-maker.sh script. In the meantime, it's a
> > > good
> > > > > idea
> > > > > > > to
> > > > > > > include a standalone driver. Something like
> > > > > > > ./bin/connect-mirror-maker-standalone.sh with the same
> > > high-level
> > > > > > > configuration file. I'll do that, thanks.
> > > > > > >
> > > > > > > > I see no section on providing support for mirror maker
> > > > Handlers,
> > > > > > > today
> > > > > > > people can add handlers to have a little extra custom logic
> > if
> > > > > > needed,
> > > > > > > and
> > > > > > > the handler api is public today so should be supported
> going
> > > > > forwards
> > > > > > > so
> > > > > > > people are not on mass re-writing these.
> > > > > > >
> > > > > > > Great point. Connect offers single-message transformations
> > and
> > > > > > > converters
> > > > > > > for this purpose, but I agree that we should honor the
> > existing
> > > > API
> > > > > > if
> > > > > > > possible. This might be as easy as providing an adapter
> class
> > > > > between
> > > > > > > connect's Transformation and mirror-maker's Handler. Maybe
> > > file a
> > > > > > Jira
> > > > > > > ticket to track this?
> > > > > > >
> > > > > > > Really appreciate your feedback!
> > > > > > >
> > > > > > >     Ryanne
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Dec 6, 2018 at 7:03 PM Michael Pearce <
> > > > > michael.pea...@ig.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Re hops to stop the cycle and to allow a range of multi
> > > cluster
> > > > > > > > topologies, see
&g

Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-14 Thread Sönke Liebau
Hi Jun,

I believe Ryanne's idea is to run multiple workers per MM cluster-node, one
per target cluster. So in essence you'd specify three clusters in the MM
config and MM would then instantiate one worker per cluster. Every MM
connector would then be deployed to the appropriate (internal) worker that
is configured for the cluster in question. Thus there would be no changes
necessary to the Connect framework itself; everything would be handled by a
new layer around existing Connect code (probably a sibling implementation
to the DistributedHerder if I understood him correctly). Ryanne, please
correct/expand if I misunderstood your intentions.

To briefly summarize the discussion that Ryanne and I had around this
earlier, my opinion was that the extra layer could potentially be avoided
by extending Connect instead, which would benefit all connectors.

My proposal was to add a configuration option to the worker config that
allows defining "external clusters" which can then be referenced from the
connector config.

For example:

# Core cluster config stays the same and is used for status, config and
offsets as usual
bootstrap.servers=localkafka1:9092,localkafka2:9092

# Allow defining extra remote clusters
externalcluster.kafka_europe.bootstrap.servers=europekafka1:9092,europekafka2:9092
externalcluster.kafka_europe.security.protocol=SSL
externalcluster.kafka_europe.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
...
externalcluster.kafka_asia.bootstrap.servers=asiakafka1:9092,asiakafka2:9092


When starting a connector you could now reference these pre-configured
clusters in the config:
{
  "name": "file-source",
  "config": {
"connector.class": "FileStreamSource",
"file": "/tmp/test.txt",
"topic": "connect-test",
"name": "file-source",
"cluster": "kafka_asia"
  }
}

When the "cluster" parameter is omitted, the current behavior of Connect remains
unchanged. This way we could address multiple remote clusters from within a
single worker without adding the extra layer for MirrorMaker. I believe
that this could be done without major structural changes to the Connect
codebase, but I freely admit that this opinion is based on ten minutes of
poking through the code, not any real expertise.

Ryanne's main concern with this approach was that there are additional
worker settings that apply to all connectors and that no truly universal
approach would be feasible while running a single worker per Connect node.
Also he feels that from a development perspective it would be preferable to
have independent MM code and contribute applicable features back to
Connect.
While I agree that this would make development of MM easier it will also
create a certain amount of extra code (can probably be kept at a minimum,
but still) that could be avoided by using "vanilla" Connect for MM.

I hope I summarized your views accurately Ryanne, if not please feel free
to correct me!

Best regards,
Sönke


On Fri, Dec 14, 2018 at 1:55 AM Jun Rao  wrote:

> Hi, Ryanne,
>
> Regarding the single connect cluster model, yes, the co-existence of a MM2
> REST API and the nearly identical Connect API is one of my concerns.
> Implementation wise, my understanding is that the producer URL in a
> SourceTask is always obtained from the connect worker's configuration. So,
> not sure how you would customize the producer URL for individual SourceTask
> w/o additional support from the Connect framework.
>
> Thanks,
>
> Jun
>
>
> On Mon, Dec 10, 2018 at 1:17 PM Ryanne Dolan 
> wrote:
>
> > Jun, thanks for your time reviewing the KIP.
> >
> > > In a MirrorSourceConnector, it seems that the offsets of the source
> will
> > be stored in a different cluster from the target cluster?
> >
> > Jan Filipiak raised this issue as well, and suggested that no state be
> > tracked in the source cluster. I've since implemented
> MirrorSourceConnector
> > accordingly. And actually, this issue coincides with another major
> weakness
> > of legacy MirrorMaker: "rebalance storm". In both cases, the problem is
> due
> > to MirrorMaker using high-level consumer groups for replication.
> >
> > MM2 does not use consumer groups at all, but instead manages its own
> > partition assignments and offsets. MirrorSourceConnector monitors
> > topic-partitions and assigns them to MirrorSourceTasks directly -- there
> > are no high-level subscriptions and therefore no rebalances. Likewise,
> > MirrorSourceConnector stores its own offsets in the target cluster, so no
> > state information is lost if the source cluster disappears. Both of these
> > features are facilitated by the Connect framework and were inspired by

Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-05 Thread Sönke Liebau
Hi Ryanne,

when you say "Currently worker configs apply across the entire cluster,
which is limiting even for use-cases involving a single Kafka cluster.",
may I ask you to elaborate on those limitations a little?
The only thing that I could come up with is the limitation to a single
offset commit interval value for all running connectors.
Maybe also the limitation to shared config providers..

But you sound like you had painful experiences with this before, maybe
you'd like to share the burden :)

Best regards,
Sönke

On Wed, Dec 5, 2018 at 5:15 AM Ryanne Dolan  wrote:

> Sönke,
>
> I think so long as we can keep the differences at a very high level (i.e.
> the "control plane"), there is little downside to MM2 and Connect
> coexisting. I do expect them to converge to some extent, with features from
> MM2 being pulled into Connect whenever this is possible without breaking
> things.
>
> I could definitely see your idea re hierarchies or groups of connectors
> being useful outside MM2. Currently "worker configs" apply across the
> entire cluster, which is limiting even for use-cases involving a single
> Kafka cluster. If Connect supported multiple workers in the same cluster,
> it would start to look a lot like a MM2 cluster.
>
> Ryanne
>
> On Tue, Dec 4, 2018 at 3:26 PM Sönke Liebau
>  wrote:
>
> > Hi Ryanne,
> >
> > thanks for your response!
> >
> > It seems like you have already done a lot of investigation into the
> > existing code and the solution design and all of what you write makes
> sense
> > to me. Would it potentially be worth adding this to the KIP, now that you
> > had to write it up because of me anyway?
> >
> > However, I am afraid that I am still not entirely convinced of the
> > fundamental benefit this provides over an extended Connect that has the
> > following functionality:
> > - allow for organizing connectors into a hierarchical structure -
> > "clusters/us-west/..."
> > - allow defining external Kafka clusters to be used by Source and Sink
> > connectors instead of the local cluster
> >
> > Personally I think both of these features are useful additions to
> Connect,
> > I'll address both separately below.
> >
> > Allowing to structure connectors in a hierarchy
> > Organizing running connectors will grow more important as corporate
> > customers adopt Connect and installations grow in size. Additionally this
> > could be useful for ACLs in case they are ever added to Connect, as you
> > could allow specific users access only to specific namespaces (and until
> > ACLs are added it would facilitate using a reverse proxy for the same
> > effect).
> >
> > Allow accessing multiple external clusters
> > The reasoning for this feature is pretty much the same as for a central
> > Mirror Maker cluster, if a company has multiple clusters for whatever
> > reason but wants to have ingest centralized in one system aka one Connect
> > cluster they would need the ability to read from and write to an
> arbitrary
> > number of Kafka clusters.
> > I haven't really looked at the code, just poked around a couple of
> minutes,
> > but it appears like this could be done with fairly low effort. My general
> > idea would be to leave the existing configuration options untouched -
> > Connect will always need a "primary" cluster that is used for storage of
> > internal data (config, offsets, status) there is no need to break
> existing
> > configs. But additionally allow adding named extra clusters by specifying
> > options like
> >   external.sales_cluster.bootstrap_servers=...
> >   external.sales_cluster.ssl.keystore.location=...
> >   external.marketing_cluster.bootstrap_servers=...
> >
> > The code for status, offset and config storage is mostly isolated in the
> > Kafka[Offset|Status|Config]BackingStore classes and could remain pretty
> > much unchanged.
> >
> > Producer and consumer creation for Tasks is done in the Worker as of
> > KAFKA-7551 and is isolated in two functions. We could add two more
> > functions with an extra argument for the external cluster name to be used
> > and return fitting consumers/producers.
> > The source and sink config would then simply gain an optional setting to
> > specify the cluster name.
> >
> > I am very sure that I am missing a few large issues with these ideas, I'm
> > mostly back-of-the-napkin designing here, but it might be worth a second
> > look.
> >
> > Once we decide to diverge into two clusters: MirrorMaker and Connect, I
> > think realistically the chance of tho

Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-12-04 Thread Sönke Liebau
quests like:
>
> GET /clusters
>
> GET /clusters/us-west/connectors
>
> PUT /clusters/us-west/connectors/us-east/config
> { "topics" : "topic1" }
>
> etc.
>
> So on the whole, very little code is involved in implementing "MirrorMaker
> clusters". I won't rule out adding additional features on top of this basic
> API, but nothing should require re-implementing what is already in Connect.
>
> > Wouldn't it be a viable alternative to look into extending Connect itself
>
> Maybe Connect will evolve to the point where Connect clusters and
> MirrorMaker clusters are indistinguishable, but I think this is unlikely,
> since really no use-case outside replication would benefit from the added
> complexity. Moreover, I think support for multiple Kafka clusters would be
> hard to add without significant changes to the existing APIs and configs,
> which all assume a single Kafka cluster. I think Connect-as-a-Service and
> Replication-as-a-Service are sufficiently different use-cases that we
> should expect the APIs and configuration files to be at least slightly
> different, even if both use the same framework underneath. That said, I do
> plan to contribute a few improvements to the Connect framework in support
> of MM2 -- just nothing within the scope of the current KIP.
>
> Thanks again!
> Ryanne
>
>
> On Fri, Nov 30, 2018 at 3:47 AM Sönke Liebau
>  wrote:
>
> > Hi Ryanne,
> >
> > thanks. I missed the remote to remote replication scenario in my train of
> > thought, you are right.
> >
> > That being said I have to admit that I am not yet fully on board with the
> > concept, sorry. But I might just be misunderstanding what your intention
> > is. Let me try and explain what I think it is you are trying to do and
> why
> > I am on the fence about that and take it from there.
> >
> > You want to create an extra mirrormaker driver class which will take
> > multiple clusters as configuration options. Based on these clusters it
> will
> > then reuse the connect workers and create as many as necessary to be able
> > to replicate to/from each of those configured clusters.  It will then
> > expose a rest api (since you stated subset of Connect rest api I assume
> it
> > will be a new / own one?)  that allows users to send requests like
> > "replicate topic a from cluster 1 to cluster 1" and start a connector on
> > the relevant worker that can offer this "route".
> > This can be extended to a cluster by starting mirror maker drivers on
> other
> > nodes with the same config and it would offer all the connect features of
> > balancing, restarting in case of failure etc.
> >
> > If this understanding is correct then it just feels to me like an awful
> lot
> > of Connect functionality would need to be reimplemented or at least
> > wrapped, which potentially could mean additional effort for maintaining
> and
> > extending Connect down the line. Wouldn't it be a viable alternative to
> > look into extending Connect itself to allow defining "remote clusters"
> > which can then be specified in the connector config to be used instead of
> > the local cluster? I imagine that change itself would not be too
> extensive,
> > the main effort would probably be in coming up with a sensible config
> > structure and ensuring backwards compatibility with existing connector
> > configs.
> > This would still allow to use a regular Connect cluster for an arbitrary
> > number of clusters, thus still having a dedicated MirrorMaker cluster by
> > running only MirrorMaker Connectors in there if you want the isolation. I
> > agree that it would not offer the level of abstraction around replication
> > that your concept would enable to implement, but I think it would be far
> > less implementation and maintenance effort.
> >
> > But again, all of that is based on my, potentially flawed, understanding
> of
> > your proposal, please feel free to correct me :)
> >
> > Best regards,
> > Sönke
> >
> > On Fri, Nov 30, 2018 at 1:39 AM Ryanne Dolan 
> > wrote:
> >
> > > Sönke, thanks for the feedback!
> > >
> > > >  the renaming policy [...] can be disabled [...] The KIP itself does
> > not
> > > mention this
> > >
> > > Good catch. I've updated the KIP to call this out.
> > >
> > > > "MirrorMaker clusters" I am not sure I fully understand the issue you
> > > are trying to solve
> > >
> > > MirrorMaker today is not scalable from an operational perspective.
> Celia

Re: [DISCUSS] KIP-398: Support reading trust store from classpath

2018-12-04 Thread Sönke Liebau
Hi Neo,

thanks for the KIP, the proposal sounds useful!
Also I agree on both assumptions that you made:
 - users whose current truststore location starts with classpath: should be
very few and extremely far between (and arguably made questionable choices
when naming their files/directories), I personally think it is safe to
ignore these
 - this could also be useful for loading keystores, not just truststores

One additional idea maybe, looking at the Spring documentation they seem to
support filesystem, classpath and URL resources. Would it make sense to add
something to allow loading the truststore from a url as well when touching
this functionality?
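
Just to illustrate what that could look like from a user's perspective (only
the classpath prefix is part of the KIP, the url variant is pure speculation
on my part):

# current behavior, truststore read from the filesystem
ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks

# as proposed in the KIP, truststore read from the classpath
ssl.truststore.location=classpath:ssl/kafka.client.truststore.jks

# possible extension, truststore fetched from a URL
ssl.truststore.location=https://config.example.com/ssl/kafka.client.truststore.jks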

Best regards,
Sönke


On Fri, Nov 30, 2018 at 6:01 PM Noa Resare  wrote:

> I wrote a KIP for my minimal suggested change to support reading a
> truststore from the classpath as well as from a file.
>
> The KIP is available here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-398%3A+Support+reading+trust+store+from+classpath
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-398:+Support+reading+trust+store+from+classpath
> >
>
> Any feedback or comments would be most welcome.
>
> Cheers
> Noa



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-11-30 Thread Sönke Liebau
Hi Ryanne,

thanks. I missed the remote to remote replication scenario in my train of
thought, you are right.

That being said I have to admit that I am not yet fully on board with the
concept, sorry. But I might just be misunderstanding what your intention
is. Let me try and explain what I think it is you are trying to do and why
I am on the fence about that and take it from there.

You want to create an extra mirrormaker driver class which will take
multiple clusters as configuration options. Based on these clusters it will
then reuse the connect workers and create as many as necessary to be able
to replicate to/from each of those configured clusters.  It will then
expose a rest api (since you stated subset of Connect rest api I assume it
will be a new / own one?)  that allows users to send requests like
"replicate topic a from cluster 1 to cluster 1" and start a connector on
the relevant worker that can offer this "route".
This can be extended to a cluster by starting mirror maker drivers on other
nodes with the same config and it would offer all the connect features of
balancing, restarting in case of failure etc.

If this understanding is correct then it just feels to me like an awful lot
of Connect functionality would need to be reimplemented or at least
wrapped, which potentially could mean additional effort for maintaining and
extending Connect down the line. Wouldn't it be a viable alternative to
look into extending Connect itself to allow defining "remote clusters"
which can then be specified in the connector config to be used instead of
the local cluster? I imagine that change itself would not be too extensive,
the main effort would probably be in coming up with a sensible config
structure and ensuring backwards compatibility with existing connector
configs.
This would still allow to use a regular Connect cluster for an arbitrary
number of clusters, thus still having a dedicated MirrorMaker cluster by
running only MirrorMaker Connectors in there if you want the isolation. I
agree that it would not offer the level of abstraction around replication
that your concept would enable to implement, but I think it would be far
less implementation and maintenance effort.

But again, all of that is based on my, potentially flawed, understanding of
your proposal, please feel free to correct me :)

Best regards,
Sönke

On Fri, Nov 30, 2018 at 1:39 AM Ryanne Dolan  wrote:

> Sönke, thanks for the feedback!
>
> >  the renaming policy [...] can be disabled [...] The KIP itself does not
> mention this
>
> Good catch. I've updated the KIP to call this out.
>
> > "MirrorMaker clusters" I am not sure I fully understand the issue you
> are trying to solve
>
> MirrorMaker today is not scalable from an operational perspective. Celia
> Kung at LinkedIn does a great job of explaining this problem [1], which has
> caused LinkedIn to drop MirrorMaker in favor of Brooklin. With Brooklin, a
> single cluster, single API, and single UI controls replication flows for an
> entire data center. With MirrorMaker 2.0, the vision is much the same.
>
> If your data center consists of a small number of Kafka clusters and an
> existing Connect cluster, it might make more sense to re-use the Connect
> cluster with MirrorSource/SinkConnectors. There's nothing wrong with this
> approach for small deployments, but this model also doesn't scale. This is
> because Connect clusters are built around a single Kafka cluster -- what I
> call the "primary" cluster -- and all Connectors in the cluster must either
> consume from or produce to this single cluster. If you have more than one
> "active" Kafka cluster in each data center, you'll end up needing multiple
> Connect clusters there as well.
>
> The problem with Connect clusters for replication is way less severe
> compared to legacy MirrorMaker. Generally you need one Connect cluster per
> active Kafka cluster. As you point out, MM2's SinkConnector means you can
> get away with a single Connect cluster for topologies that center around a
> single primary cluster. But each Connector within each Connect cluster must
> be configured independently, with no high-level view of your replication
> flows within and between data centers.
>
> With MirrorMaker 2.0, a single MirrorMaker cluster manages replication
> across any number of Kafka clusters. Much like Brooklin, MM2 does the work
> of setting up connectors between clusters as needed. This
> Replication-as-a-Service is a huge win for larger deployments, as well as
> for organizations that haven't adopted Connect.
>
> [1]
> https://www.slideshare.net/ConfluentInc/more-data-more-problems-scaling-kafkamirroring-pipelines-at-linkedin
>
> Keep the questions coming! Thanks.
> Ryanne
>
> On Thu, Nov 29, 2018 at 3:30 AM Sönke Liebau 
> wr

Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-11-29 Thread Sönke Liebau
This has been dormant for a while now, can I interest anybody in chiming in
here?

I think we need to come up with an idea of how to handle changes to ACLs
going forward, i.e. some sort of versioning scheme. Not necessarily what I
proposed in my previous mail, but something.
Currently this fairly simple change is stuck due to this being unsolved.

I am happy to move forward without addressing the larger issue (I think the
issue raised by Colin is valid but could be mitigated in the release
notes), but that would mean that the next KIP to touch ACLs would inherit
the issue, which somehow doesn't seem right.

Looking forward to your input :)

Best regards,
Sönke

On Tue, Jun 19, 2018 at 5:32 PM Sönke Liebau 
wrote:

> Picking this back up, now that KIP-290 has been merged..
>
> As Colin mentioned in an earlier mail this change could create a
> potential security issue if not all brokers are upgraded and a DENY
> Acl based on an IP range is created, as old brokers won't match this
> rule and still allow requests. As I stated earlier I am not sure
> whether for this specific change this couldn't be handled via the
> release notes (see also this comment [1] from Jun Rao on a similar
> topic), but in principle I think some sort of versioning system around
> ACLs would be useful. As seen in KIP-290 there were a few
> complications around where to store ACLs. To avoid adding ever new
> Zookeeper paths for future ACL changes a versioning system is probably
> useful.
>
> @Andy: I've copied you directly in this mail, since you did a bulk of
> the work around KIP-290 and mentioned potentially picking up the
> follow up work, so I think your input would be very valuable here. Not
> trying to shove extra work your way, I'm happy to contribute, but we'd
> be touching a lot of the same areas I think.
>
> If we want to implement a versioning system for ACLs I see the
> following todos (probably incomplete & missing something at the same
> time):
> 1. ensure that the current Authorizer doesn't pick up newer ACLs
> 2. add a version marker to new ACLs
> 3. change SimpleACLAuthorizer to know what version of ACLs it is
> compatible with and only load ACLs of this / smaller version
> 4. Decide how to handle if incompatible (newer version) ACLs are
> present: log warning, fail broker startup, ...
>
>
> Post-KIP-290 ACLs are stored in two places in Zookeeper:
> /kafka-acl-extended   - for ACLs with wildcards in the resource
> /kafka-acl   -  for literal ACLs without wildcards (i.e. * means * not
> any character)
>
> To ensure 1 we probably need to move to a new directory once more,
> call it /kafka-acl-extended-new for arguments sake. Any ACL stored
> here would get a version number stored with it, and only
> SimpleAuthorizers that actually know to look here would find these
> ACLs and also know to check for a version number. I think Andy
> mentioned moving the resource definition in the new ACL format to JSON
> instead of simple string in a follow up PR, maybe these pieces of work
> are best tackled together - and if a new znode can be avoided even
> better.
>
> This would allow us to recognize situations where ACLs are defined
> that not all Authorizers can understand, as those Authorizers would
> notice that there are ACLs with a larger version than the one they
> support (not applicable to legacy ACLs up until now). How we want to
> treat this scenario is up for discussion, I think make it
> configurable, as customers have different requirements around
> security. Some would probably want to fail a broker that encounters
> unknown ACLs so as to not create potential security risks, while others
> might be happy with just a warning in the logs. This should never
> happen if users fully upgrade their clusters before creating new ACLs
> - but to counteract the situation that Colin described it would be
> useful.
>
> Looking forward, a migration option might be added to the kafka-acl
> tool to migrate all legacy ACLs once into the new structure once the
> user is certain that no old brokers will come online again.
>
> If you think this sounds like a convoluted way to go about things ...
> I agree :) But I couldn't come up with a better way yet.
>
> Any thoughts?
>
> Best regards,
> Sönke
>
> [1] https://github.com/apache/kafka/pull/5079#pullrequestreview-124512689
>
> On Thu, May 3, 2018 at 10:57 PM, Sönke Liebau
>  wrote:
> > Technically I absolutely agree with you, this would indeed create
> > issues. If we were just talking about this KIP I think I'd argue that
> > it is not too harsh of a requirement for users to refrain from using
> > new features until they have fully upgraded their entire cluster. I
> > think in that case it could have been solved in the release notes 

Re: [DISCUSS] KIP-382: MirrorMaker 2.0

2018-11-29 Thread Sönke Liebau
, a KafkaSourceConnector would
> coordinate
> >> via
> >> > > the
> >> > > > >>> > source cluster. We can do better than this, but I'm
> deferring
> >> this
> >> > > > >>> > optimization for now.
> >> > > > >>> >
> >> > > > >>> > 2) exactly-once between two clusters is mind-bending. But
> >> keep in
> >> > > > mind
> >> > > > >>> that
> >> > > > >>> > transactions are managed by the producer, not the consumer.
> In
> >> > > fact,
> >> > > > >>> it's
> >> > > > >>> > the producer that requests that offsets be committed for the
> >> > > current
> >> > > > >>> > transaction. Obviously, these offsets are committed in
> >> whatever
> >> > > > >>> cluster the
> >> > > > >>> > producer is sending to.
> >> > > > >>> >
> >> > > > >>> > These two issues are closely related. They are both resolved
> >> by not
> >> > > > >>> > coordinating or committing via the source cluster. And in
> >> fact,
> >> > > this
> >> > > > >>> is the
> >> > > > >>> > general model of SourceConnectors anyway, since most
> >> > > SourceConnectors
> >> > > > >>> > _only_ have a destination cluster.
> >> > > > >>> >
> >> > > > >>> > If there is a lot of interest here, I can expound further on
> >> this
> >> > > > >>> aspect of
> >> > > > >>> > MM2, but again I think this is premature until this first
> KIP
> >> is
> >> > > > >>> approved.
> >> > > > >>> > I intend to address each of these in separate KIPs following
> >> this
> >> > > > one.
> >> > > > >>> >
> >> > > > >>> > Ryanne
> >> > > > >>> >
> >> > > > >>> > On Wed, Oct 17, 2018 at 7:09 AM Jan Filipiak <
> >> > > > jan.filip...@trivago.com
> >> > > > >>> >
> >> > > > >>> > wrote:
> >> > > > >>> >
> >> > > > >>> > > This is not a performance optimisation. Its a fundamental
> >> design
> >> > > > >>> choice.
> >> > > > >>> > >
> >> > > > >>> > >
> >> > > > >>> > > I never really took a look how streams does exactly once.
> >> (its a
> >> > > > trap
> >> > > > >>> > > anyways and you usually can deal with at least once
> >> downstream
> >> > > > pretty
> >> > > > >>> > > easy). But I am very certain its not gonna get somewhere
> if
> >> > > offset
> >> > > > >>> > > commit and record produce cluster are not the same.
> >> > > > >>> > >
> >> > > > >>> > > Pretty sure without this _design choice_ you can skip on
> >> that
> >> > > > exactly
> >> > > > >>> > > once already
> >> > > > >>> > >
> >> > > > >>> > > Best Jan
> >> > > > >>> > >
> >> > > > >>> > > On 16.10.2018 18:16, Ryanne Dolan wrote:
> >> > > > >>> > > >  >  But one big obstacle in this was
> >> > > > >>> > > > always that group coordination happened on the source
> >> cluster.
> >> > > > >>> > > >
> >> > > > >>> > > > Jan, thank you for bringing up this issue with legacy
> >> > > > MirrorMaker.
> >> > > > >>> I
> >> > > > >>> > > > totally agree with you. This is one of several problems
> >> with
> >> > > > >>> MirrorMaker
> >> > > > >>> > > > I intend to solve in MM2, and I already have a design
> and
> >> > > > >>> prototype that
> >> > > > >>> > > > solves this and related issues. But as you pointed out,
> >> this
> >> > > KIP
> >> > > > is
> >> > > > >>> > > > already rather complex, and I want to focus on the core
> >> feature
> >> > > > set
> >> > > > >>> > > > rather than performance optimizations for now. If we can
> >> agree
> >> > > on
> >> > > > >>> what
> >> > > > >>> > > > MM2 looks like, it will be very easy to agree to improve
> >> its
> >> > > > >>> performance
> >> > > > >>> > > > and reliability.
> >> > > > >>> > > >
> >> > > > >>> > > > That said, I look forward to your support on a
> subsequent
> >> KIP
> >> > > > that
> >> > > > >>> > > > addresses consumer coordination and rebalance issues.
> Stay
> >> > > tuned!
> >> > > > >>> > > >
> >> > > > >>> > > > Ryanne
> >> > > > >>> > > >
> >> > > > >>> > > > On Tue, Oct 16, 2018 at 6:58 AM Jan Filipiak <
> >> > > > >>> jan.filip...@trivago.com
> >> > > > >>> > > > <mailto:jan.filip...@trivago.com>> wrote:
> >> > > > >>> > > >
> >> > > > >>> > > > Hi,
> >> > > > >>> > > >
> >> > > > >>> > > > Currently MirrorMaker is usually run collocated with
> >> the
> >> > > > target
> >> > > > >>> > > > cluster.
> >> > > > >>> > > > This is all nice and good. But one big obstacle in
> >> this was
> >> > > > >>> > > > always that group coordination happened on the
> source
> >> > > > cluster.
> >> > > > >>> So
> >> > > > >>> > > when
> >> > > > >>> > > > the network was congested, you sometimes lose
> group
> >> > > > >>> membership and
> >> > > > >>> > > > have to rebalance and all this.
> >> > > > >>> > > >
> >> > > > >>> > > > So one big request from we would be the support of
> >> having
> >> > > > >>> > > coordination
> >> > > > >>> > > > cluster != source cluster.
> >> > > > >>> > > >
> >> > > > >>> > > > I would generally say a LAN is better than a WAN for
> >> doing
> >> > > > >>> group
> >> > > > >>> > > > coordinaton and there is no reason we couldn't have
> a
> >> group
> >> > > > >>> consuming
> >> > > > >>> > > > topics from a different cluster and committing
> >> offsets to
> >> > > > >>> another
> >> > > > >>> > > > one right?
> >> > > > >>> > > >
> >> > > > >>> > > > Other than that. It feels like the KIP has too many
> >> > > features
> >> > > > >>> where
> >> > > > >>> > > many
> >> > > > >>> > > > of them are not really wanted and counter productive
> >> but I
> >> > > > >>> will just
> >> > > > >>> > > > wait and see how the discussion goes.
> >> > > > >>> > > >
> >> > > > >>> > > > Best Jan
> >> > > > >>> > > >
> >> > > > >>> > > >
> >> > > > >>> > > > On 15.10.2018 18:16, Ryanne Dolan wrote:
> >> > > > >>> > > >  > Hey y'all!
> >> > > > >>> > > >  >
> >> > > > >>> > > >  > Please take a look at KIP-382:
> >> > > > >>> > > >  >
> >> > > > >>> > > >  >
> >> > > > >>> > > >
> >> > > > >>> > >
> >> > > > >>>
> >> > > >
> >> > >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0
> >> > > > >>> > > >  >
> >> > > > >>> > > >  > Thanks for your feedback and support.
> >> > > > >>> > > >  >
> >> > > > >>> > > >  > Ryanne
> >> > > > >>> > > >  >
> >> > > > >>> > > >
> >> > > > >>> > >
> >> > > > >>>
> >> > > > >>
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Best,
> >> > > Alex Mironov
> >> > >
> >> >
> >>
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [ANNOUNCE] New Committer: Manikumar Reddy

2018-10-11 Thread Sönke Liebau
Great news, congratulations Manikumar!!

On Thu, Oct 11, 2018 at 9:08 PM Vahid Hashemian 
wrote:

> Congrats Manikumar!
>
> On Thu, Oct 11, 2018 at 11:49 AM Ryanne Dolan 
> wrote:
>
> > Bravo!
> >
> > On Thu, Oct 11, 2018 at 1:48 PM Ismael Juma  wrote:
> >
> > > Congratulations Manikumar! Thanks for your continued contributions.
> > >
> > > Ismael
> > >
> > > On Thu, Oct 11, 2018 at 10:39 AM Jason Gustafson 
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > The PMC for Apache Kafka has invited Manikumar Reddy as a committer
> and
> > > we
> > > > are
> > > > pleased to announce that he has accepted!
> > > >
> > > > Manikumar has contributed 134 commits including significant work to
> add
> > > > support for delegation tokens in Kafka:
> > > >
> > > > KIP-48:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka
> > > > KIP-249
> > > > <
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+KafkaKIP-249
> > > >
> > > > :
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient
> > > >
> > > > He has broad experience working with many of the core components in
> > Kafka
> > > > and he has reviewed over 80 PRs. He has also made huge progress
> > > addressing
> > > > some of our technical debt.
> > > >
> > > > We appreciate the contributions and we are looking forward to more.
> > > > Congrats Manikumar!
> > > >
> > > > Jason, on behalf of the Apache Kafka PMC
> > > >
> > >
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [Review] - Pull request #5681 review

2018-10-08 Thread Sönke Liebau
Hi Srinivas,

thanks for your interest in contributing and the work you put into the PR!
I've looked it over and left a small comment.

In general you will probably need to be a bit patient with reviews. Kafka
is a project that a lot of people are interested in, with a fairly small
group of committers, so it is hard for them to keep up with pull requests.

That being said, you did exactly the right thing in waiting a little while
and then bumping on the mailing list, just trying to manage expectations
here ;)

Best regards,
Sönke

On Mon, Oct 8, 2018 at 8:03 AM Srinivas Reddy 
wrote:

> Hi,
>
> After working on Kafka for sometime, I got motivated to start contribute to
> Kafka. As this is my first commit here thought of refactored some unit
> tests with minimal changes after going through contribution guidelines.
>
> I worked over weekend for code changes and local testing before raising the
> PR. But suprisingly no review comments on it.
>
> Here is the link my initial attempt:
> https://github.com/apache/kafka/pull/5681
>
> Please let me know if there is anything needs to be done from my side to
> get it reviewed.
>
> Thank you in advance.
>
> -
> Srinivas
>
> - Typed on tiny keys. pls ignore typos.{mobile app}
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-08-10 Thread Sönke Liebau
Hi Viktor,

thanks for your input! We could accommodate magic headers by removing any
known fixed bytes pre-encryption, sticking them in a header field and
prepending them after decryption. However, I am not sure whether this is
actually necessary, as most modern (AES for sure) algorithms are considered
to be resistant to known-plaintext types of attack. Even if the entire
plaintext is known to the attacker he still needs to brute-force the key -
which may take a while.
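
If we ever did want to strip known plaintext anyway, the mechanics would be
simple enough - a rough sketch, with the prefix value and class name purely
illustrative and the "was stripped" flag assumed to travel in a record header:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MagicPrefixCodec {

    // Illustrative value only - in practice this would be the well-known
    // prefix of the compression or serialization format in use.
    private static final byte[] KNOWN_PREFIX = "MAGIC".getBytes(StandardCharsets.US_ASCII);

    // Removes the well-known prefix before encryption; the caller records
    // in a header whether stripping actually happened.
    public static byte[] strip(byte[] plaintext) {
        if (plaintext.length >= KNOWN_PREFIX.length
                && Arrays.equals(Arrays.copyOf(plaintext, KNOWN_PREFIX.length), KNOWN_PREFIX)) {
            return Arrays.copyOfRange(plaintext, KNOWN_PREFIX.length, plaintext.length);
        }
        return plaintext;
    }

    // Prepends the prefix again after decryption when the header says it was stripped.
    public static byte[] restore(byte[] stripped) {
        byte[] restored = new byte[KNOWN_PREFIX.length + stripped.length];
        System.arraycopy(KNOWN_PREFIX, 0, restored, 0, KNOWN_PREFIX.length);
        System.arraycopy(stripped, 0, restored, KNOWN_PREFIX.length, stripped.length);
        return restored;
    }
}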

Something different to consider in this context are compression sidechannel
attacks like CRIME or BREACH, which may be relevant depending on what type
of data is being sent through Kafka. Both these attacks depend on the
encrypted record containing a combination of secret and user controlled
data.
For example if Kafka was used to forward data that the user entered on a
website, along with a secret API key that the website adds, to a back-end
server, and the user can obtain the Kafka messages, these attacks would
become relevant. Not much we can do about that except disallow encryption
when compression is enabled (TLS chose this approach in version 1.3)

I agree with you, that we definitely need to clearly document any risks and
how much security can reasonably be expected in any given scenario. We
might even consider logging a warning message when sending data that is
compressed and encrypted.

On a different note, I've started amending the KIP to make key management
and distribution pluggable, should hopefully be able to publish sometime
Monday.
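
As a very rough idea of the direction I'm heading in (interface and method
names are just my current thinking and may well change before the KIP update):

import java.util.Map;

// Hypothetical plug-in point for key management - everything here is an
// assumption for the sake of discussion, not yet part of the KIP.
public interface EncryptionKeyProvider {

    // called once with the client configuration, like other pluggable classes
    void configure(Map<String, ?> configs);

    // key material used to encrypt records for the given topic
    byte[] currentKeyFor(String topic);

    // identifier of the current key, stored alongside the encrypted record
    String currentKeyIdFor(String topic);

    // resolves key material by id on the consuming side
    byte[] keyById(String keyId);
}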

Best regards,
Sönke


On Thu, Jun 21, 2018 at 12:26 PM, Viktor Somogyi 
wrote:

> Hi Sönke,
>
> Compressing before encrypting has its dangers as well. Suppose you have a
> known compression format which adds a magic header and you're using a block
> cipher with a small enough block, then it becomes much easier to figure out
> the encryption key. For instance you can look at Snappy's stream
> identifier: https://github.com/google/snappy/blob/master/framing_
> format.txt
> . Based on this you should only use block ciphers where block sizes are
> much larger than 6 bytes. AES for instance should be good with its 128 bits
> = 16 bytes but even this isn't entirely secure as the first 6 bytes already
> leaked some information - and it depends on the cipher how much that is.
> Also if we suppose that an adversary accesses a broker and takes all the
> data, they'll have much easier job to decrypt it as they'll have much more
> examples.
> So overall we should make sure to define and document the compatible
> encryptions with the supported compression methods and the level of
> security they provide to make sure the users are fully aware of the
> security implications.
>
> Cheers,
> Viktor
>
> On Tue, Jun 19, 2018 at 11:55 AM Sönke Liebau
>  wrote:
>
> > Hi Stephane,
> >
> > thanks for pointing out the broken pictures, I fixed those.
> >
> > Regarding encrypting before or after batching the messages, you are
> > correct, I had not thought of compression and how this changes things.
> > Encrypted data does not really compress well. My reasoning at the time
> > of writing was that if we encrypt the entire batch we'd have to wait
> > for the batch to be full before starting to encrypt. Whereas with per
> > message encryption we can encrypt them as they come in and more or
> > less have them ready for sending when the batch is complete.
> > However I think the difference will probably not be that large (will
> > do some testing) and offset by just encrypting once instead of many
> > times, which has a certain overhead every time. Also, from a security
> > perspective encrypting longer chunks of data is preferable - another
> > benefit.
> >
> > This does however take away the ability of the broker to see the
> > individual records inside the encrypted batch, so this would need to
> > be stored and retrieved as a single record - just like is done for
> > compressed batches. I am not 100% sure that this won't create issues,
> > especially when considering transactions, I will need to look at the
> > compression code some more. In essence though, since it works for
> > compression I see no reason why it can't be made to work here.
> >
> > On a different note, going down this route might make us reconsider
> > storing the key with the data, as this might significantly reduce
> > storage overhead - still much higher than just storing them once
> > though.
> >
> > Best regards,
> > Sönke
> >
> > On Tue, Jun 19, 2018 at 5:59 AM, Stephane Maarek
> >  wrote:
> > > Hi Sonke
> > >
> > > Very much needed feature and discussion. FYI the image links seem
> broken.
> > >
> > > My 2 cents (if 

Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-06-19 Thread Sönke Liebau
Picking this back up, now that KIP-290 has been merged..

As Colin mentioned in an earlier mail this change could create a
potential security issue if not all brokers are upgraded and a DENY
Acl based on an IP range is created, as old brokers won't match this
rule and still allow requests. As I stated earlier I am not sure
whether for this specific change this couldn't be handled via the
release notes (see also this comment [1] from Jun Rao on a similar
topic), but in principle I think some sort of versioning system around
ACLs would be useful. As seen in KIP-290 there were a few
complications around where to store ACLs. To avoid adding ever new
Zookeeper paths for future ACL changes a versioning system is probably
useful.

@Andy: I've copied you directly in this mail, since you did a bulk of
the work around KIP-290 and mentioned potentially picking up the
follow up work, so I think your input would be very valuable here. Not
trying to shove extra work your way, I'm happy to contribute, but we'd
be touching a lot of the same areas I think.

If we want to implement a versioning system for ACLs I see the
following todos (probably incomplete & missing something at the same
time):
1. ensure that the current Authorizer doesn't pick up newer ACLs
2. add a version marker to new ACLs
3. change SimpleACLAuthorizer to know what version of ACLs it is
compatible with and only load ACLs of this / smaller version
4. Decide how to handle if incompatible (newer version) ACLs are
present: log warning, fail broker startup, ...


Post-KIP-290 ACLs are stored in two places in Zookeeper:
/kafka-acl-extended   - for ACLs with wildcards in the resource
/kafka-acl   -  for literal ACLs without wildcards (i.e. * means * not
any character)

To ensure 1 we probably need to move to a new directory once more,
call it /kafka-acl-extended-new for arguments sake. Any ACL stored
here would get a version number stored with it, and only
SimpleAuthorizers that actually know to look here would find these
ACLs and also know to check for a version number. I think Andy
mentioned moving the resource definition in the new ACL format to JSON
instead of simple string in a follow up PR, maybe these pieces of work
are best tackled together - and if a new znode can be avoided even
better.
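
Just to make the idea more tangible, an ACL stored under the new path could
look roughly like this - field names are modeled on today's JSON layout and
the exact schema would obviously be part of the detailed design:

{
  "version": 2,
  "acls": [
    {
      "principal": "User:alice",
      "permissionType": "Deny",
      "operation": "Read",
      "host": "192.168.1.0/24"
    }
  ]
}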

This would allow us to recognize situations where ACLs are defined
that not all Authorizers can understand, as those Authorizers would
notice that there are ACLs with a larger version than the one they
support (not applicable to legacy ACLs up until now). How we want to
treat this scenario is up for discussion, I think make it
configurable, as customers have different requirements around
security. Some would probably want to fail a broker that encounters
unknown ACLs so as to not create potential security risks, while others
might be happy with just a warning in the logs. This should never
happen if users fully upgrade their clusters before creating new ACLs
- but to counteract the situation that Colin described it would be
useful.

Looking forward, a migration option might be added to the kafka-acl
tool to migrate all legacy ACLs once into the new structure once the
user is certain that no old brokers will come online again.

If you think this sounds like a convoluted way to go about things ...
I agree :) But I couldn't come up with a better way yet.

Any thoughts?

Best regards,
Sönke

[1] https://github.com/apache/kafka/pull/5079#pullrequestreview-124512689

On Thu, May 3, 2018 at 10:57 PM, Sönke Liebau
 wrote:
> Technically I absolutely agree with you, this would indeed create
> issues. If we were just talking about this KIP I think I'd argue that
> it is not too harsh of a requirement for users to refrain from using
> new features until they have fully upgraded their entire cluster. I
> think in that case it could have been solved in the release notes -
> similarly to the way a binary protocol change is handled.
> However looking at the discussion on KIP-290 and thinking ahead to
> potential other changes on ACLs it would really just mean putting off
> a proper solution which is a versioning system for ACLs makes sense.
>
> At least from the point of view of this KIP versioning should be a
> separate KIP as otherwise we don't solve the issue you mentioned above
> - not sure about 290..
>
> I thought about this for a little while, would something like the
> following make sense?
>
> ACLs are either stored in a separate Zookeeper node or get a version
> stored with them (separate node is probably easier). So current ACLs
> would default to v0 and post-KIP252 would be an explicit v1 for
> example.
> Authorizers declare which versions they are compatible with (though
> I'd say backwards compatibility is what we should shoot for) and
> load ACLs of those versions.
> Introduce a new parameter authorizer.acl.maxversion which controls
> which ACLs are loaded by the authorizer - nothing wit

Re: [DISCUSS] KIP-317: Transparent Data Encryption

2018-06-19 Thread Sönke Liebau
Hi Stephane,

thanks for pointing out the broken pictures, I fixed those.

Regarding encrypting before or after batching the messages, you are
correct, I had not thought of compression and how this changes things.
Encrypted data does not really compress well. My reasoning at the time
of writing was that if we encrypt the entire batch we'd have to wait
for the batch to be full before starting to encrypt. Whereas with per
message encryption we can encrypt them as they come in and more or
less have them ready for sending when the batch is complete.
However I think the difference will probably not be that large (will
do some testing) and offset by just encrypting once instead of many
times, which has a certain overhead every time. Also, from a security
perspective encrypting longer chunks of data is preferable - another
benefit.
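
For the per-batch variant the actual crypto part would essentially boil down
to a single cipher call over the already batched (and compressed) bytes -
standard JDK crypto along these lines, with key handling deliberately left
out of the sketch:

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class BatchEncryptor {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Encrypts a complete, already serialized batch in one operation and
    // returns IV + ciphertext. How the key is obtained is out of scope here.
    public static byte[] encryptBatch(byte[] batchBytes, SecretKey key) throws Exception {
        byte[] iv = new byte[12];                        // 96 bit IV as recommended for GCM
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(batchBytes);  // one call for the whole batch
        byte[] result = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, result, 0, iv.length);
        System.arraycopy(ciphertext, 0, result, iv.length, ciphertext.length);
        return result;
    }
}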

This does however take away the ability of the broker to see the
individual records inside the encrypted batch, so this would need to
be stored and retrieved as a single record - just like is done for
compressed batches. I am not 100% sure that this won't create issues,
especially when considering transactions, I will need to look at the
compression code some more. In essence though, since it works for
compression I see no reason why it can't be made to work here.

On a different note, going down this route might make us reconsider
storing the key with the data, as this might significantly reduce
storage overhead - still much higher than just storing them once
though.

Best regards,
Sönke

On Tue, Jun 19, 2018 at 5:59 AM, Stephane Maarek
 wrote:
> Hi Sonke
>
> Very much needed feature and discussion. FYI the image links seem broken.
>
> My 2 cents (if I understood correctly): you say "This process will be
> implemented after Serializer and Interceptors are done with the message
> right before it is added to the batch to be sent, in order to ensure that
> existing serializers and interceptors keep working with encryption just
> like without it."
>
> I think encryption should happen AFTER a batch is created, right before it
> is sent. Reason is that if we want to still keep advantage of compression,
> encryption needs to happen after it (and I believe compression happens on a
> batch level).
> So to me for a producer: serializer / interceptors => batching =>
> compression => encryption => send.
> and the inverse for a consumer.
>
> Regards
> Stephane
>
> On 19 June 2018 at 06:46, Sönke Liebau 
> wrote:
>
>> Hi everybody,
>>
>> I've created a draft version of KIP-317 which describes the addition
>> of transparent data encryption functionality to Kafka.
>>
>> Please consider this as a basis for discussion - I am aware that this
>> is not at a level of detail sufficient for implementation, but I
>> wanted to get some feedback from the community on the general idea
>> before spending more time on this.
>>
>> Link to the KIP is:
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 317%3A+Add+transparent+data+encryption+functionality
>>
>> Best regards,
>> Sönke
>>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


[DISCUSS] KIP-317: Transparent Data Encryption

2018-06-18 Thread Sönke Liebau
Hi everybody,

I've created a draft version of KIP-317 which describes the addition
of transparent data encryption functionality to Kafka.

Please consider this as a basis for discussion - I am aware that this
is not at a level of detail sufficient for implementation, but I
wanted to get some feedback from the community on the general idea
before spending more time on this.

Link to the KIP is:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+transparent+data+encryption+functionality

Best regards,
Sönke


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-05-03 Thread Sönke Liebau
Technically I absolutely agree with you, this would indeed create
issues. If we were just talking about this KIP I think I'd argue that
it is not too harsh of a requirement for users to refrain from using
new features until they have fully upgraded their entire cluster. I
think in that case it could have been solved in the release notes -
similarly to the way a binary protocol change is handled.
However, looking at the discussion on KIP-290 and thinking ahead to
potential other changes on ACLs, it would really just mean putting off
a proper solution - so a versioning system for ACLs makes sense.

At least from the point of view of this KIP versioning should be a
separate KIP as otherwise we don't solve the issue you mentioned above
- not sure about 290..

I thought about this for a little while, would something like the
following make sense?

ACLs are either stored in a separate Zookeeper node or get a version
stored with them (separate node is probably easier). So current ACLs
would default to v0 and post-KIP252 would be an explicit v1 for
example.
Authorizers declare which versions they are compatible with (though
I'd say backwards compatibility is what we should shoot for) and
load ACLs of those versions.
Introduce a new parameter authorizer.acl.maxversion which controls
which ACLs are loaded by the authorizer - nothing with a version
higher than specified here gets loaded, even if the Authorizer would
be able to.

So the process for a cluster update would be similar to a binary
protocol change: set authorizer.acl.maxversion to new_version - 1.
Upgrade brokers one by one. Once you are done, change/remove parameter
and restart cluster.
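
In broker config terms the upgrade could then look something like this (the
property name is the one proposed above and does not exist today, values are
just for illustration):

# during the rolling upgrade: only load/accept ACLs up to the old version
authorizer.class.name=kafka.security.auth.SimpleAclAuthorizer
authorizer.acl.maxversion=0

# once all brokers are upgraded: raise the limit (or remove the property
# to default to the highest version the authorizer supports)
authorizer.acl.maxversion=1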

I'm sure I missed something, but sound good in principle?

Best regards,
Sönke


On Thu, May 3, 2018 at 8:15 PM, Colin McCabe <co...@cmccabe.xyz> wrote:
> There are still some problems with compatibility here, right?
>
> One example is if we construct a DENY ACL with an IP range and then install 
> it.  If all of our brokers have been upgraded, it will work.  But if there 
> are some that still haven't been upgraded, they will not honor the DENY ACL, 
> possibly causing a security issue.
>
> In general, it seems like we need some kind of versioning system in ACLs to 
> handle these cases.
>
> best,
> Colin
>
> On Thu, May 3, 2018, at 08:11, Sönke Liebau wrote:
>> Hi all,
>>
>> I'd like to readopt this KIP, I got a bit sidetracked by other stuff
>> after posting the initial version and discussion, sorry for that.
>>
>> I've added IPv6 to the KIP, but decided to forego the other scope
>> extensions that I mentioned in my previous mail, as there are other
>> efforts underway in KIP-290 that cover most of the suggestions
>> already.
>>
>> Does anybody have any other objections to starting a vote on this KIP?
>>
>> Regards,
>> Sönke
>>
>> On Fri, Feb 2, 2018 at 5:11 PM, Sönke Liebau <soenke.lie...@opencore.com> 
>> wrote:
>> > Hi Manikumar,
>> >
>> > you are right, 5713 is a bit ambiguous about which fields are considered in
>> > scope, but I agree that wildcards for Ips are not necessary when we have
>> > ranges.
>> >
>> > I am wondering though, if we might want to extend the scope of this KIP a
>> > bit while we are changing acl and authorizer classes anyway.
>> >
>> > After considering this a bit on a flight with no wifi yesterday I came up
>> > with the following:
>> >
>> > * wildcards or regular expressions for principals, groups and topics
>> > * extend the KafkaPrincipal object to allow adding custom key-value pairs 
>> > in
>> > principalbuilder implementations
>> > * extend SimpleAclAuthorizer and the ACL tools to authorize on these
>> > key/value pairs
>> >
>> > The second and third bullet points would allow easy creation of for example
>> > a principalbuilder that adds groups the user belongs to in the active
>> > directory to its principal, without requiring the user to also extend the
>> > authorizer and create custom ACL storage. This would significantly lower 
>> > the
>> > technical debt incurred by custom authorizer mechanisms I think.
>> >
>> > There are a few issues to hash out of course, but I'd think in general this
>> > should work work nicely and be a step towards meeting corporate
>> > authorization requirements.
>> >
>> > Best regards,
>> > Sönke
>> >
>> > On 01.02.2018 18:46, "Manikumar" <manikumar.re...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > There are a few deployments using IPv6.  It is good to support IPv6 also.
>> >
>>

Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-05-03 Thread Sönke Liebau
Hi all,

I'd like to readopt this KIP, I got a bit sidetracked by other stuff
after posting the initial version and discussion, sorry for that.

I've added IPv6 to the KIP, but decided to forego the other scope
extensions that I mentioned in my previous mail, as there are other
efforts underway in KIP-290 that cover most of the suggestions
already.

Does anybody have any other objections to starting a vote on this KIP?

Regards,
Sönke

On Fri, Feb 2, 2018 at 5:11 PM, Sönke Liebau <soenke.lie...@opencore.com> wrote:
> Hi Manikumar,
>
> you are right, 5713 is a bit ambiguous about which fields are considered in
> scope, but I agree that wildcards for Ips are not necessary when we have
> ranges.
>
> I am wondering though, if we might want to extend the scope of this KIP a
> bit while we are changing acl and authorizer classes anyway.
>
> After considering this a bit on a flight with no wifi yesterday I came up
> with the following:
>
> * wildcards or regular expressions for principals, groups and topics
> * extend the KafkaPrincipal object to allow adding custom key-value pairs in
> principalbuilder implementations
> * extend SimpleAclAuthorizer and the ACL tools to authorize on these
> key/value pairs
>
> The second and third bullet points would allow easy creation of for example
> a principalbuilder that adds groups the user belongs to in the active
> directory to its principal, without requiring the user to also extend the
> authorizer and create custom ACL storage. This would significantly lower the
> technical debt incurred by custom authorizer mechanisms I think.
>
> There are a few issues to hash out of course, but I'd think in general this
> should work nicely and be a step towards meeting corporate
> authorization requirements.
>
> Best regards,
> Sönke
>
> On 01.02.2018 18:46, "Manikumar" <manikumar.re...@gmail.com> wrote:
>
> Hi,
>
> There are a few deployments using IPv6.  It is good to support IPv6 also.
>
> I think KAFKA-5713 is about adding regular expression support to resource
> names (topic, consumer, etc.).
> Yes, wildcards (*) in hostnames don't make sense. Range and subnet
> support will give us the flexibility.
>
> On Thu, Feb 1, 2018 at 5:56 PM, Sönke Liebau <
> soenke.lie...@opencore.com.invalid> wrote:
>
>> Hi Manikumar,
>>
>> the current proposal indeed leaves out IPv6 addresses, as I was unsure
>> whether Kafka fully supports that yet to be honest. But it would be
>> fairly easy to add these to the proposal - I'll update it over the
>> weekend.
>>
>> Regarding KAFKA-5713, I simply listed it as related, since it is
>> similar in spirit, if not exact wording.  Parts of that issue
>> (wildcards in hosts) would be covered by this kip - just in a slightly
>> different way. Do we really need wildcard support in IP addresses if
>> we can specify ranges and subnets? I considered it, but only came up
>> with scenarios that seemed fairly academic to me, like allowing the
>> same host from multiple subnets (10.0.*.1) for example.
>>
>> Allowing wildcards has the potential to make the code more complex,
>> depending on how we decide to implement this feature, hence I decided
>> to leave wildcards out for now.
>>
>> What do you think?
>>
>> Best regards,
>> Sönke
>>
>> On Thu, Feb 1, 2018 at 10:14 AM, Manikumar <manikumar.re...@gmail.com>
>> wrote:
>> > Hi,
>> >
>> > 1. Do we support IPv6 CIDR/ranges?
>> >
>> > 2. KAFKA-5713 is mentioned in Related JIRAs section. But there is no
>> > mention of wildcard support in the KIP.
>> >
>> >
>> > Thanks,
>> >
>> > On Thu, Feb 1, 2018 at 4:05 AM, Sönke Liebau <
>> > soenke.lie...@opencore.com.invalid> wrote:
>> >
>> >> Hey everybody,
>> >>
>> >> following a brief inital discussion a couple of days ago on this list
>> >> I'd like to get a discussion going on KIP-252 which would allow
>> >> specifying ip ranges and subnets for the -allow-host and --deny-host
>> >> parameters of the acl tool.
>> >>
>> >> The KIP can be found at
>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >> 252+-+Extend+ACLs+to+allow+filtering+based+on+ip+ranges+and+subnets
>> >>
>> >> Best regards,
>> >> Sönke
>> >>
>>
>>
>>
>> --
>> Sönke Liebau
>> Partner
>> Tel. +49 179 7940878
>> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>>
>
>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Are there plans to migrate some/all of the command line tools to use the new AdminClient?

2018-02-22 Thread Sönke Liebau
I've dug around jira and the list of KIPs for a bit now, but could not
really find anything specific on plans to move the command line tools over
to the new AdminClient. Did I miss something or is that not currently
planned?

Most of the current command line tools require access to Zookeeper, which
becomes a bit of an issue once you enable zookeeper acls, as you need to
kinit with a broker keytab to be allowed write access which is somewhat of
a security concern. Also, if you want to firewall Zookeeper of from the
rest of the world any management command would need to be run from a
cluster machine.
None of this is an actual issue, it just required some additional effort
for cluster administration, however in a larger corporate environment I
can't imagine this would go down well with security audit guys and related
persons.

Using the AdminClient on the other hand allows to give specific users the
right to create topics/acls etc.which is checked by the brokers and
requires no access to Zookeeper by anybody except the brokers.

Maybe we could add a --use-adminclient parameter to the command line tools
sort of similar to the --new-consumer parameter to keep the old
functionality while enabling us to slowly move things over to the
AdminClient implementation?
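
Purely as an illustration of what I mean (neither the flag nor the broker
address option exist today, so both are assumptions):

# today: topic creation talks to Zookeeper directly
bin/kafka-topics.sh --zookeeper zk1:2181 --create --topic test --partitions 3 --replication-factor 2

# hypothetical: the same command routed through the AdminClient / the brokers
bin/kafka-topics.sh --use-adminclient --bootstrap-server broker1:9092 --create --topic test --partitions 3 --replication-factor 2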

Best regards,
Sönke


Re: Documentation build system

2018-02-07 Thread Sönke Liebau
Hi,

I'm in favor of moving away from HTML as well. While I have not had
direct exposure to the pain points that Ewen mentions, HTML is in general
not pleasant to write docs in, I think, and bears a lot of risk of
parsing failures etc.

I can't vote for using rst, not having used it, but I am happy to vote
against markdown, as my own experience is that you hit its limitations
fairly quickly and start looking for ways around these limitations,
which would likely make us end up in a similarly hacky place as where
we are today, just on top of a different basis.

Best regards,
Sönke

On Wed, Feb 7, 2018 at 2:55 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> Ewen,
>
> Thanks for re-picking this up again. I'm big +1 as I was two years ago :P
> One thing that may still worth pointing out is that migrating from html
> means that for any edits it would require one more compilation step to
> review / compare the diffs if it is not only wording but also formatting /
> displaying purposed. Personally I think it is worthy.
>
> As for rst v.s. markdown v.s. anything else, I'm also inclining to rst but
> admittedly because I am familiar with rst than markdown as well. You have
> listed quite a list of pros of rst on the ticket and I do not have more to
> add.
>
>
> Guozhang
>
>
> On Tue, Feb 6, 2018 at 4:09 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
>> Hi all,
>>
>> I just wrote a note in https://issues.apache.org/jira/browse/KAFKA-2967
>> with a proposal for changing how docs are written. I want to move on this
>> soon if possible and normally would just leave the discussion to the JIRA,
>> but as I think this is something everyone has an opinion on and affects
>> everyone contributing to the project, I figured I'd send this quick note to
>> increase the likelihood people see it and have a chance to weigh in.
>>
>> Thanks,
>> -Ewen
>>
>
>
>
> --
> -- Guozhang



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-02-02 Thread Sönke Liebau
Hi Manikumar,

you are right, 5713 is a bit ambiguous about which fields are considered in
scope, but I agree that wildcards for IPs are not necessary when we have
ranges.

I am wondering though, if we might want to extend the scope of this KIP a
bit while we are changing acl and authorizer classes anyway.

After considering this a bit on a flight with no wifi yesterday I came up
with the following:

* wildcards or regular expressions for principals, groups and topics
* extend the KafkaPrincipal object to allow adding custom key-value pairs
in principal builder implementations
* extend SimpleAclAuthorizer and the ACL tools to authorize on these
key/value pairs

The second and third bullet points would, for example, allow easy creation
of a principal builder that adds a user's Active Directory groups to its
principal, without requiring the user to also extend the
authorizer and create custom ACL storage. This would significantly lower
the technical debt incurred by custom authorizer mechanisms, I think.
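
To illustrate the second and third points, something along these lines is
what I have in mind - just a sketch, the attribute-carrying principal
class and the group lookup are made up, only the KafkaPrincipalBuilder
interface exists as-is today:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.security.auth.AuthenticationContext;
import org.apache.kafka.common.security.auth.KafkaPrincipal;
import org.apache.kafka.common.security.auth.KafkaPrincipalBuilder;

// Hypothetical principal that carries additional key-value attributes.
class AttributedPrincipal extends KafkaPrincipal {
    private final Map<String, String> attributes;

    AttributedPrincipal(String name, Map<String, String> attributes) {
        super(KafkaPrincipal.USER_TYPE, name);
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
    }

    Map<String, String> attributes() {
        return attributes;
    }
}

// Hypothetical builder that enriches the principal with directory groups.
class GroupEnrichingPrincipalBuilder implements KafkaPrincipalBuilder {
    @Override
    public KafkaPrincipal build(AuthenticationContext context) {
        // A real implementation would derive the user from the SSL/SASL details
        // of the AuthenticationContext and look up its groups in Active Directory;
        // both are stubbed here.
        String user = "spark";
        Map<String, String> attributes = new HashMap<>();
        attributes.put("groups", "analytics,etl");
        return new AttributedPrincipal(user, attributes);
    }
}

The authorizer and the ACL tooling would then be able to match rules
against these attributes in addition to the principal name.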

There are a few issues to hash out of course, but I'd think in general this
should work nicely and be a step towards meeting corporate
authorization requirements.

Best regards,
Sönke
On 01.02.2018 at 18:46, "Manikumar" <manikumar.re...@gmail.com> wrote:

Hi,

There are a few deployments using IPv6. It would be good to support IPv6 as well.

I think KAFKA-5713 is about adding regular expression support to resource
names (topic, consumer, etc.).
Yes, wildcards (*) in hostnames don't make sense. Range and subnet
support will give us the flexibility.

On Thu, Feb 1, 2018 at 5:56 PM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> Hi Manikumar,
>
> the current proposal indeed leaves out IPv6 addresses, as I was unsure
> whether Kafka fully supports that yet to be honest. But it would be
> fairly easy to add these to the proposal - I'll update it over the
> weekend.
>
> Regarding KAFKA-5713, I simply listed it as related, since it is
> similar in spirit, if not exact wording.  Parts of that issue
> (wildcards in hosts) would be covered by this kip - just in a slightly
> different way. Do we really need wildcard support in IP addresses if
> we can specify ranges and subnets? I considered it, but only came up
> with scenarios that seemed fairly academic to me, like allowing the
> same host from multiple subnets (10.0.*.1) for example.
>
> Allowing wildcards has the potential to make the code more complex,
> depending on how we decide to implement this feature, hence I decided
> to leave wildcards out for now.
>
> What do you think?
>
> Best regards,
> Sönke
>
> On Thu, Feb 1, 2018 at 10:14 AM, Manikumar <manikumar.re...@gmail.com>
> wrote:
> > Hi,
> >
> > 1. Do we support IPv6 CIDR/ranges?
> >
> > 2. KAFKA-5713 is mentioned in Related JIRAs section. But there is no
> > mention of wildcard support in the KIP.
> >
> >
> > Thanks,
> >
> > On Thu, Feb 1, 2018 at 4:05 AM, Sönke Liebau <
> > soenke.lie...@opencore.com.invalid> wrote:
> >
> >> Hey everybody,
> >>
> >> following a brief initial discussion a couple of days ago on this list
> >> I'd like to get a discussion going on KIP-252 which would allow
> >> specifying IP ranges and subnets for the --allow-host and --deny-host
> >> parameters of the acl tool.
> >>
> >> The KIP can be found at
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 252+-+Extend+ACLs+to+allow+filtering+based+on+ip+ranges+and+subnets
> >>
> >> Best regards,
> >> Sönke
> >>
>
>
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


Re: [DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-02-01 Thread Sönke Liebau
Hi Manikumar,

the current proposal indeed leaves out IPv6 addresses, as I was unsure
whether Kafka fully supports that yet to be honest. But it would be
fairly easy to add these to the proposal - I'll update it over the
weekend.

Regarding KAFKA-5713, I simply listed it as related, since it is
similar in spirit, if not exact wording.  Parts of that issue
(wildcards in hosts) would be covered by this kip - just in a slightly
different way. Do we really need wildcard support in IP addresses if
we can specify ranges and subnets? I considered it, but only came up
with scenarios that seemed fairly academic to me, like allowing the
same host from multiple subnets (10.0.*.1) for example.

Allowing wildcards has the potential to make the code more complex,
depending on how we decide to implement this feature, hence I decided
to leave wildcards out for now.

What do you think?

Best regards,
Sönke

On Thu, Feb 1, 2018 at 10:14 AM, Manikumar <manikumar.re...@gmail.com> wrote:
> Hi,
>
> 1. Do we support IPv6 CIDR/ranges?
>
> 2. KAFKA-5713 is mentioned in Related JIRAs section. But there is no
> mention of wildcard support in the KIP.
>
>
> Thanks,
>
> On Thu, Feb 1, 2018 at 4:05 AM, Sönke Liebau <
> soenke.lie...@opencore.com.invalid> wrote:
>
>> Hey everybody,
>>
>> following a brief initial discussion a couple of days ago on this list
>> I'd like to get a discussion going on KIP-252 which would allow
>> specifying IP ranges and subnets for the --allow-host and --deny-host
>> parameters of the acl tool.
>>
>> The KIP can be found at
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 252+-+Extend+ACLs+to+allow+filtering+based+on+ip+ranges+and+subnets
>>
>> Best regards,
>> Sönke
>>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


[DISCUSS] KIP-252: Extend ACLs to allow filtering based on ip ranges and subnets

2018-01-31 Thread Sönke Liebau
Hey everybody,

following a brief initial discussion a couple of days ago on this list
I'd like to get a discussion going on KIP-252, which would allow
specifying IP ranges and subnets for the --allow-host and --deny-host
parameters of the ACL tool.

The KIP can be found at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-252+-+Extend+ACLs+to+allow+filtering+based+on+ip+ranges+and+subnets

Best regards,
Sönke


Re: [VOTE] KIP-212: Enforce set of legal characters for connector names

2018-01-26 Thread Sönke Liebau
Thanks Randall for taking care of the administrative stuff and all
voters for their vote!

On Wed, Jan 24, 2018 at 8:35 AM, Randall Hauch <rha...@gmail.com> wrote:
> Thanks, Sönke. This KIP passes with three binding +1s from Ewen, Gwen, and
> Jason. I've updated the KIP pages accordingly.
>
> On Tue, Jan 23, 2018 at 10:14 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
>> +1
>>
>> On Tue, Jan 23, 2018 at 2:01 PM, Gwen Shapira <g...@confluent.io> wrote:
>>
>> > +1 (binding)
>> >
>> > On Tue, Jan 23, 2018, 1:17 PM Randall Hauch <rha...@gmail.com> wrote:
>> >
>> > > +1 (non-binding)
>> > >
>> > > On Mon, Jan 22, 2018 at 6:35 PM, Sönke Liebau <
>> > > soenke.lie...@opencore.com.invalid> wrote:
>> > >
>> > > > All,
>> > > >
>> > > > this KIP has been discussed for quite some time now and I believe we
>> > > > addressed all major concerns in the current revision, so I'd like to
>> > > > start a vote.
>> > > >
>> > > > KIP can be found here:
>> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > 212%3A+Enforce+set+of+legal+characters+for+connector+names
>> > > >
>> > > > Let me know what you think.
>> > > >
>> > > > Kind regards,
>> > > > Sönke
>> > > >
>> > >
>> >
>>



-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-26 Thread Sönke Liebau
I spent some time with the code today to try and hit the Jan 30th deadline
for 1.1.
I'm not entirely done yet, there is one weird test failure that I need to
investigate, but I expect to be able to commit a new version sometime
tomorrow.
However, I just wanted to describe the current behavior of the code up
front to see if everybody agrees with it. There are a few peculiarities /
decisions we may still need to make on whether to extend the logic a little.

Currently connector names are rejected when the method
Character.isISOControl returns true for any position of the string. This
catches ASCII 0 through 31 as well as 127 through 159, the escape sequences \r
\b \t \n \f, and \u representations of these characters. Percent
encoding is decoded before performing these checks, so that can't be used
to sneak anything past the check.
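
In code, the check amounts to roughly the following sketch (class and
exception names are illustrative, not the actual names used in the PR):

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class ConnectorNameValidation {
    // Reject names that contain ISO control characters, after percent-decoding.
    static void validate(String connectorName) throws UnsupportedEncodingException {
        String decoded = URLDecoder.decode(connectorName, "UTF-8");
        if (decoded.chars().anyMatch(Character::isISOControl)) {
            throw new IllegalArgumentException(
                    "Connector name must not contain control characters: " + connectorName);
        }
    }
}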

Other escape sequences that are unknown to Java cause an exception (unknown
escape character) to be thrown somewhere in the REST classes - I believe
there is not much we can do about that.

So far all is well, on to the stuff I am unsure about.
There are three Java escape sequences remaining: \' \" and \\.
Currently these are not unescaped by the code and would show up in the
connector name exactly like that - which means there is no way to get a
single \ in a connector name. Percent-encoded backslashes are also converted
to \\.

Do we want to substitute this (it would be a finite list of three
substitutions) at the risk of this maybe causing issues somewhere else in
the code because we created an illegal escape sequence again, or are we
happy with that behavior for now?
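
If we do decide to substitute, doing it in a single pass over the string
avoids accidentally producing new escape sequences while replacing -
roughly like this (illustrative only, not the PR code):

final class EscapeSequenceHandling {
    // Unescape only \\, \' and \" in a single pass; everything else stays as-is.
    static String unescapeRemaining(String name) {
        StringBuilder out = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == '\\' && i + 1 < name.length()) {
                char next = name.charAt(i + 1);
                if (next == '\\' || next == '\'' || next == '"') {
                    out.append(next);
                    i++; // the escaped character has been consumed
                    continue;
                }
            }
            out.append(c);
        }
        return out.toString();
    }
}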

Kind regards,
Sönke

[1]
https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isISOControl(int)

On 21.01.2018 at 23:35, "Sönke Liebau" <soenke.lie...@opencore.com> wrote:

> I've updated the KIP to prohibit using control characters is connector
> names - will create a vote thread tomorrow unless I hear back on
> necessary changes from anybody.
>
> Current proposal is to ban all control characters including newline
> etc. as well as their escape sequences. I have not specifically listed
> the escape sequences as I will have to dig into that topic a bit more
> to come up with a useful solution I think, but the general principle
> is stated.
>
> Let me know what you think.
>
> Best regards,
> Sönke
>
> On Sun, Jan 21, 2018 at 8:37 PM, Sönke Liebau
> <soenke.lie...@opencore.com> wrote:
> > Hi everybody,
> >
> > I was out of touch for personal reasons the entire week, apologies.
> > I'll update the KIP tonight and kick of a vote tomorrow morning if no
> > one objects until then. That gives a little less than two full days
> > for voting until the deadline kicks in - might work out if everybody
> > is happy with it.
> >
> > Best regards,
> > Sönke
> >
> > On Sat, Jan 20, 2018 at 12:38 AM, Randall Hauch <rha...@gmail.com>
> wrote:
> >> Sonke,
> >>
> >> Have you had a chance to update the KIP and kick off a VOTE thread? We
> need
> >> to do this ASAP if we want this to make the KIP deadline for 1.1, which
> is
> >> Jan 23!
> >>
> >> On Tue, Jan 16, 2018 at 10:33 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> >> wrote:
> >>
> >>> Sonke,
> >>>
> >>> I'm fine filtering some control characters. The trimming also seems
> like it
> >>> might be *somewhat* moot because the way connector names work in
> standalone
> >>> mode is limited by ConfigDef, which already does trimming of settings.
> Not
> >>> a great reason to be restrictive, but we'd partly just be codifying
> what's
> >>> there.
> >>>
> >>> I just generally have a distaste for being restrictive without a clear
> >>> reason. In this case I don't think it has any significant impact.
> >>>
> >>> KIP freeze is nearing and this seems like a simple improvement and a
> PR is
> >>> already available (modulo any changes re: control characters). I'll
> start
> >>> reviewing the PR, do you want to make any last updates about control
> >>> characters in the KIP and kick off a VOTE thread?
> >>>
> >>> -Ewen
> >>>
> >>> On Fri, Jan 12, 2018 at 1:43 PM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >>>
> >>> > On Fri, Jan 12, 2018, at 08:03, Sönke Liebau wrote:
> >>> > > Hi everybody,
> >>> > >
> >>> > > from reading the discussion I understand that we have two things
> still
> >>> > > open for discussen.
> >>> > >
> >>> > >  

Re: [DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

2018-01-24 Thread Sönke Liebau
Hi Colin,

I agree with you on the fact that IP-based security is not absolute. I was
considering it as an additional layer of security to be used in conjunction
with SSL certificates, so the rule would contain both the principal and
some hosts. This way, if someone manages to obtain the certificate, they'd need
to jump through extra hoops to use it from outside the cluster when it's not
feasible to lock down Kafka with a firewall.

Mostly though I'd argue the principle that if we consider the feature worth
having it should be "done right" - otherwise we might as well remove it to
avoid giving users a false sense of security.

Regarding your suggestion of access control without security, we could
start honouring the HADOOP_USER_NAME environment variable; many people
should already be used to that :)
Not sure if there is a lot of demand for that feature though, I'd consider
it more dangerous than useful, but that is really just a personal opinion.

Best regards,
Sönke

On 24.01.2018 at 23:31, "Colin McCabe" <cmcc...@apache.org> wrote:

Hi Sonke,

IP address based security doesn't really work, though.  Users can spoof IP
addresses.  They can poison the ARP cache on a local network, or
impersonate a DNS server.

For users who want some access controls, but don't care about security,
maybe we should make it easier to use and create users without enabling
kerberos or similar?

best,
Colin


On Wed, Jan 24, 2018, at 12:59, Sönke Liebau wrote:
> Hi everyone,
>
> the current ACL functionality in Kafka is a bit limited concerning
> host based rules when specifying multiple hosts. A common scenario for
> this would be if you have a YARN cluster running Spark jobs that
> access Kafka and want to create ACLs based on the IP addresses of the
> cluster nodes.
> Currently kafka-acls only allows specifying individual IPs, so this
> would look like
>
> ./kafka-acls --add --producer \
> --topic test --authorizer-properties zookeeper.connect=localhost:2181 \
> --allow-principal User:spark \
> --allow-host 10.0.0.10 \
> --allow-host 10.0.0.11 \
> --allow-host ...
>
> which can get unwieldy if you have a 200 node cluster. Internally this
> command would not create a single ACL with multiple host entries, but
> rather one ACL per host that is specified on the command line, which
> makes the ACL listing a bit confusing.
>
> There are currently a few jiras in various states around this topic:
> KAFKA-3531 [1], KAFKA-4759 [2], KAFKA-4985 [3] & KAFKA-5713 [4]
>
> KAFKA-4759 has a patch available, but would currently only add
> interpretation of CIDR notation, no specific ranges, which I think
> could easily be added.
>
> Colin McCabe commented in KAFKA-4985 that so far this was not
> implemented as no standard for expressing ip ranges with a fast
> implementation had been found so far, the available patch uses the
> ipmath [5] package for parsing expressions and range checking - which
> seems fairly small and focused.
>
> This would allow for expressions of the following type:
> 10.0.0.1
> 10.0.0.1-10.0.0.10
> 10.0.0.0/24
>
> I'd suggest extending this a little to allow a semicolon separated
> list of values:
> 10.0.0.1;10.0.0.1-10.0.0.10;10.0.0.0/24
>
> Performance considerations
> Internally the ipmath package represents ip addresses as longs, so if
> we stick with the example of a 200 node cluster from above, with the
> current implementation that would be 200 string comparisons for every
> request, whereas with a range it could potentially come down to two
> long comparisons. This is of course a back-of-the-envelope calculation
> at best, but there at least seems to be a case for investigating this
> a bit further I think.
>
>
> These changes would probably necessitate a KIP - though with some
> consideration they could be made in a way that no existing public
> facing functionality is changed, but for transparency and proper
> documentation I'd say a KIP would be preferable.
>
> I'd be happy to draft one if people think this is worthwhile.
>
> Let me know what you think.
>
> best regards,
> Sönke
>
> [1] https://issues.apache.org/jira/browse/KAFKA-3531
> [2] https://issues.apache.org/jira/browse/KAFKA-4759
> [3] https://issues.apache.org/jira/browse/KAFKA-4985
> [4] https://issues.apache.org/jira/browse/KAFKA-5713
> [5] https://github.com/jgonian/commons-ip-math


[DISCUSS] Improving ACLs by allowing ip ranges and subnet expressions?

2018-01-24 Thread Sönke Liebau
Hi everyone,

the current ACL functionality in Kafka is a bit limited concerning
host based rules when specifying multiple hosts. A common scenario for
this would be if you have a YARN cluster running Spark jobs that
access Kafka and want to create ACLs based on the IP addresses of the
cluster nodes.
Currently kafka-acls only allows specifying individual IPs, so this
would look like

./kafka-acls --add --producer \
--topic test --authorizer-properties zookeeper.connect=localhost:2181 \
--allow-principal User:spark \
--allow-host 10.0.0.10 \
--allow-host 10.0.0.11 \
--allow-host ...

which can get unwieldy if you have a 200 node cluster. Internally this
command would not create a single ACL with multiple host entries, but
rather one ACL per host that is specified on the command line, which
makes the ACL listing a bit confusing.

There are currently a few jiras in various states around this topic:
KAFKA-3531 [1], KAFKA-4759 [2], KAFKA-4985 [3] & KAFKA-5713 [4]

KAFKA-4759 has a patch available, but would currently only add
interpretation of CIDR notation, no specific ranges, which I think
could easily be added.

Colin McCabe commented in KAFKA-4985 that so far this was not
implemented as no standard for expressing ip ranges with a fast
implementation had been found so far, the available patch uses the
ipmath [5] package for parsing expressions and range checking - which
seems fairly small and focused.

This would allow for expressions of the following type:
10.0.0.1
10.0.0.1-10.0.0.10
10.0.0.0/24

I'd suggest extending this a little to allow a semicolon separated
list of values:
10.0.0.1;10.0.0.1-10.0.0.10;10.0.0.0/24

Performance considerations
Internally the ipmath package represents ip addresses as longs, so if
we stick with the example of a 200 node cluster from above, with the
current implementation that would be 200 string comparisons for every
request, whereas with a range it could potentially come down to two
long comparisons. This is of course a back-of-the-envelope calculation
at best, but there at least seems to be a case for investigating this
a bit further I think.
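
To make the long-comparison idea concrete, a containment check on longs
could look roughly like the sketch below (written against plain Java for
illustration; the ipmath package mentioned above provides this kind of
parsing and range checking out of the box):

import java.net.InetAddress;
import java.net.UnknownHostException;

final class Ipv4RangeCheck {
    // Convert a dotted-quad IPv4 address into an unsigned 32-bit value stored in a long.
    static long toLong(String ip) throws UnknownHostException {
        byte[] octets = InetAddress.getByName(ip).getAddress();
        long value = 0;
        for (byte octet : octets) {
            value = (value << 8) | (octet & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) throws UnknownHostException {
        long rangeStart = toLong("10.0.0.1");
        long rangeEnd = toLong("10.0.0.10");
        long client = toLong("10.0.0.7");

        // Two long comparisons instead of one string comparison per allowed host.
        boolean allowed = client >= rangeStart && client <= rangeEnd;
        System.out.println(allowed); // prints "true"
    }
}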


These changes would probably necessitate a KIP - though with some
consideration they could be made in a way that no existing public
facing functionality is changed, but for transparency and proper
documentation I'd say a KIP would be preferable.

I'd be happy to draft one if people think this is worthwhile.

Let me know what you think.

best regards,
Sönke

[1] https://issues.apache.org/jira/browse/KAFKA-3531
[2] https://issues.apache.org/jira/browse/KAFKA-4759
[3] https://issues.apache.org/jira/browse/KAFKA-4985
[4] https://issues.apache.org/jira/browse/KAFKA-5713
[5] https://github.com/jgonian/commons-ip-math


  1   2   >