Re: [DISCUSS] KIP-821: Connect Transforms support for nested structures

2022-08-12 Thread Robert Yokota
Hi Jorge,

Yes, to escape a backtick I would recommend a backslash.

For example, MySQL allows identifiers to be surrounded with single quotes,
double quotes, or backticks, and they use backslash to escape.

Jsonata went with primarily backticks as they are less common in strings
that appear in JSON, and it seems to have worked nicely for them.

Regards,
Robert

On Fri, Aug 12, 2022 at 10:01 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Robert! And sorry for the late reply.
>
> That's a great catch and will require the current proposal to be updated.
>
> Using back-ticks is a good proposal. Though, the challenge we got is that
> JSON allows really any character into attribute names, including
> back-ticks.
> I wonder what's the best approach to escape back-ticks then. Given that
> those use-cases should be minimal we could fallback to backslash to escape
> back-ticks?
>
> What do you think?
>
> Many thanks,
> Jorge.
>
>
> On Wed, 20 Jul 2022 at 23:03, Robert Yokota  wrote:
>
> > Hi,
> >
> > I'm late to this thread, but would like to comment on the double
> dot/double
> > asterisk syntax.
> >
> > Unfortunately double dot is often used in JSON Path as a descendant
> > selector, see https://www.ietf.org/id/draft-ietf-jsonpath-base-05.html
> >
> > I think a better notation would be to use backticks to quote any names
> with
> > a dot, such as  root.`child.with.dot`.  See Jsonata syntax for example:
> > http://docs.jsonata.org/simple
> >
> > Robert
> >
> > On Wed, Jun 29, 2022 at 3:34 AM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Thanks Chris! I have updated the KIP to include this fix.
> > >
> > > I will keep the array as a potential improvement at the moment, and out
> > of
> > > the scope of this KIP.
> > >
> > > Thanks,
> > > Jorge.
> > >
> > > On Tue, 28 Jun 2022 at 23:19, Chris Egerton 
> > > wrote:
> > >
> > > > Hi Jorge,
> > > >
> > > > Apologies for the long delay, had my own KIP-related work to focus
> on.
> > > >
> > > > I think it's fine to include array accesses but it's not a blocker.
> I'm
> > > +1
> > > > either way. On that front though, I think the MaskField section might
> > > need
> > > > to be updated as it still mentions arrays and deep-scan?
> > > >
> > > > Cheers,
> > > >
> > > > Chris
> > > >
> > > > On Tue, Jun 28, 2022 at 4:38 PM Jorge Esteban Quilcate Otoya <
> > > > quilcate.jo...@gmail.com> wrote:
> > > >
> > > > > Hi there,
> > > > >
> > > > > I have update the KIP to the previous state voted, including the
> > > > > configuration change from `field.style` to `field.syntax.version`.
> > > > >
> > > > > I'll bump the vote thread as well to see if there's agreement on
> > adding
> > > > > this feature to Connect.
> > > > >
> > > > > Cheers,
> > > > > Jorge.
> > > > >
> > > > > On Wed, 15 Jun 2022 at 23:02, Jorge Esteban Quilcate Otoya <
> > > > > quilcate.jo...@gmail.com> wrote:
> > > > >
> > > > > > Thanks, Chris. Your feedback is much appreciated!
> > > > > >
> > > > > > I see how the current proposal might be underestimating some edge
> > > > cases.
> > > > > > I'm happy to move the design for deep-scan and multi-values to
> > future
> > > > > > developments related with this KIP and reduce its scope, though
> > open
> > > > for
> > > > > > more feedback.
> > > > > >
> > > > > > Also, just to be sure, are you proposing also to not include
> array
> > > > access
> > > > > > at this stage?
> > > > > >
> > > > > > Thanks,
> > > > > > Jorge.
> > > > > >
> > > > > > On Tue, 14 Jun 2022 at 03:20, Chris Egerton <
> > fearthecel...@gmail.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hi Jorge,
> > > > > >>
> > > > > >> I've done some more thinking and I hate to say it, but I think
> the
> > > > > syntax
> > > > > >> does need to be

Re: [DISCUSS] KIP-821: Connect Transforms support for nested structures

2022-07-20 Thread Robert Yokota
Hi,

I'm late to this thread, but would like to comment on the double dot/double
asterisk syntax.

Unfortunately double dot is often used in JSON Path as a descendant
selector, see https://www.ietf.org/id/draft-ietf-jsonpath-base-05.html

I think a better notation would be to use backticks to quote any names with
a dot, such as  root.`child.with.dot`.  See Jsonata syntax for example:
http://docs.jsonata.org/simple

Robert

On Wed, Jun 29, 2022 at 3:34 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Chris! I have updated the KIP to include this fix.
>
> I will keep the array as a potential improvement at the moment, and out of
> the scope of this KIP.
>
> Thanks,
> Jorge.
>
> On Tue, 28 Jun 2022 at 23:19, Chris Egerton 
> wrote:
>
> > Hi Jorge,
> >
> > Apologies for the long delay, had my own KIP-related work to focus on.
> >
> > I think it's fine to include array accesses but it's not a blocker. I'm
> +1
> > either way. On that front though, I think the MaskField section might
> need
> > to be updated as it still mentions arrays and deep-scan?
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Jun 28, 2022 at 4:38 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi there,
> > >
> > > I have update the KIP to the previous state voted, including the
> > > configuration change from `field.style` to `field.syntax.version`.
> > >
> > > I'll bump the vote thread as well to see if there's agreement on adding
> > > this feature to Connect.
> > >
> > > Cheers,
> > > Jorge.
> > >
> > > On Wed, 15 Jun 2022 at 23:02, Jorge Esteban Quilcate Otoya <
> > > quilcate.jo...@gmail.com> wrote:
> > >
> > > > Thanks, Chris. Your feedback is much appreciated!
> > > >
> > > > I see how the current proposal might be underestimating some edge
> > cases.
> > > > I'm happy to move the design for deep-scan and multi-values to future
> > > > developments related with this KIP and reduce its scope, though open
> > for
> > > > more feedback.
> > > >
> > > > Also, just to be sure, are you proposing also to not include array
> > access
> > > > at this stage?
> > > >
> > > > Thanks,
> > > > Jorge.
> > > >
> > > > On Tue, 14 Jun 2022 at 03:20, Chris Egerton  >
> > > > wrote:
> > > >
> > > >> Hi Jorge,
> > > >>
> > > >> I've done some more thinking and I hate to say it, but I think the
> > > syntax
> > > >> does need to be expanded. Right now it's clear what "a.b" refers to
> > and
> > > >> what "a..b" refers to, but what about "a...b"? Is that referring to
> > > >> subfield ".b" of field "a", or subfield "b" of field "a."? This gets
> > > even
> > > >> more complicated when thinking about fields whose names are
> > exclusively
> > > >> made up of dots.
> > > >>
> > > >> I'm also a little hesitant to mix the cases of multi-value paths and
> > > deep
> > > >> scans. What if you only want to access one subfield deep for an SMT,
> > > >> instead of recursing through all the children of a given field? It's
> > > akin
> > > >> to the distinction between * and ** with file globbing patterns, and
> > > there
> > > >> could be a substantial performance difference if you have
> > heavily-nested
> > > >> fields.
> > > >>
> > > >> Ultimately, I think that if the proposed "field.syntax.version"
> > property
> > > >> sits well with people, it might be better to reduce the scope of the
> > KIP
> > > >> back to the original proposal and just focus on adding support for
> > > >> explicitly-specified nested values, with no multi-value paths
> > > whatsoever,
> > > >> knowing that we have an easy way to introduce new syntax and
> features
> > in
> > > >> the future. (We could probably leave the "a...b" case for that next
> > > >> version
> > > >> too.)
> > > >>
> > > >> I was a huge fan of this KIP before we started trying to address
> more
> > > >> complex use cases, and although I don't want to write those off, I
> > think
> > > >> we
> > > >> may have bitten off more than we can chew in time for the 3.3.0
> > release
> > > >> and
> > > >> would hate to see this KIP get delayed as a result.
> > > >>
> > > >> I'd be really curious to hear from Joshua and Tom on this front,
> > though.
> > > >> Is
> > > >> it acceptable to move more incrementally here and settle on the
> syntax
> > > >> version property as our means of introducing new features, or is it
> > > >> preferable to implement things monolithically and try to get
> > everything
> > > >> (or
> > > >> at least, as much as possible) right the first time?
> > > >>
> > > >> Thanks again for your continued effort on this KIP!
> > > >>
> > > >> Cheers,
> > > >>
> > > >> Chris
> > > >>
> > > >> On Wed, Jun 8, 2022 at 5:41 PM Jorge Esteban Quilcate Otoya <
> > > >> quilcate.jo...@gmail.com> wrote:
> > > >>
> > > >> > Thanks, Chris!
> > > >> >
> > > >> > Please, find my comments below:
> > > >> >
> > > >> > On Tue, 7 Jun 2022 at 04:39, Chris Egerton <
> fearthecel...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Jorge,
> > > >> > >
> > > >> 

[jira] [Created] (KAFKA-8404) Authorization header is not passed in Connect when forwarding REST requests

2019-05-21 Thread Robert Yokota (JIRA)
Robert Yokota created KAFKA-8404:


 Summary: Authorization header is not passed in Connect when 
forwarding REST requests
 Key: KAFKA-8404
 URL: https://issues.apache.org/jira/browse/KAFKA-8404
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Robert Yokota
 Fix For: 2.3.0


When Connect forwards a REST request from one worker to another, the 
Authorization header is not forwarded.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-415: Incremental Cooperative Rebalancing in Kafka Connect

2019-03-08 Thread Robert Yokota
Thanks for the great KIP Konstantine!

+1 (non-binding)

Robert

On Thu, Mar 7, 2019 at 2:56 PM Guozhang Wang  wrote:

> Thanks Konstantine, I've read the updated section on
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> and it lgtm.
>
> I'm +1 on the KIP.
>
>
> Guozhang
>
>
> On Thu, Mar 7, 2019 at 2:35 PM Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks Guozhang. This is a valid observation regarding the current status
> > of the PR.
> >
> > I updated the KIP to explicitly call out how the downgrade process should
> > work in the section Compatibility, Deprecation, and Migration.
> >
> > Additionally, I reduced the configuration modes for the connect.protocol
> to
> > only two: eager and compatible.
> > That's because there's no way at the moment to select a protocol based on
> > simple majority and not unanimity across at least one option for the
> > sub-protocol.
> > Therefore there's no way to lock a group of workers in a cooperative-only
> > mode at the moment, if we account for accidental joins of workers running
> > at an older version.
> >
> > The changes have been reflected in the KIP doc and will be reflected in
> the
> > PR in a subsequent commit.
> >
> > Thanks,
> > Konstantine
> >
> >
> > On Thu, Mar 7, 2019 at 1:17 PM Guozhang Wang  wrote:
> >
> > > Hi Konstantine,
> > >
> > > Thanks for the updated KIP and the PR as well (which is huge :) I
> briefly
> > > looked through it as well as the KIP, and I have one minor comment to
> add
> > > (otherwise I'm binding +1 on it as well) about the backward
> > compatibility.
> > > I'll use one example to illustrate the issue:
> > >
> > > 1) Suppose you have workerA and B on newer version and configured the
> > > connect.protocol as "compatible", they will send both V0/V1 to the
> leader
> > > (say it's workerA) who will choose V1 as the current protocol, this
> will
> > be
> > > sent back to A and B who would remember the current protocol version is
> > > already V1. So after this rebalance everyone remembers that V1 can be
> > used,
> > > which means that upon prepareJoin they will not revoke all the assigned
> > > tasks.
> > >
> > > 2) Now let's say a new worker joins but with old version V0
> (practically
> > > this is rare, but for illustration purposes some common scenarios may
> > falls
> > > into this, e.g. an existing worker being downgraded, which is
> essentially
> > > as being kicked out of the group, and then rejoined as a new member on
> > the
> > > older version), the leader realized that at least one of the member
> does
> > > not know V1 and hence would fall back to use version V0 to perform
> > > assignment. V0 algorithm would do eager rebalance which may move some
> > tasks
> > > to the new comer immediately from the existing members, as it assumes
> > that
> > > everyone would revoke everything before join (a.k.a the sync-barrier)
> but
> > > this is actually not true, since everyone other than the old versioned
> > new
> > > comer would still follow the behavior of V1 --- not revoking anything
> ---
> > > before sending the join group request.
> > >
> > > This could be solvable though, e.g. when leader realized that he needs
> to
> > > use V0, while the previous "currentProtocol" value is V1, instead of
> just
> > > blindly follow the algorithm of V0 it could just reassign the existing
> > > partitions without migrating anything, while at the same time tell
> > everyone
> > > that the currentProtocol version is downgraded to V0; and then they can
> > > trigger another rebalance based on V0 where everything will revoke the
> > > tasks before sending join group requests.
> > >
> > >
> > > Guozhang
> > >
> > > On Wed, Mar 6, 2019 at 2:28 PM Konstantine Karantasis <
> > > konstant...@confluent.io> wrote:
> > >
> > > > I'd like to open the vote on KIP-415: Incremental Cooperative
> > Rebalancing
> > > > in Kafka Connect
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect
> > > >
> > > > a proposal that will allow Kafka Connect to scale significantly the
> > > number
> > > > of connectors and tasks it can run in a cluster of Connect workers.
> > > >
> > > > Thanks,
> > > > Konstantine
> > > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>


[jira] [Resolved] (KAFKA-7512) java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Number

2018-10-16 Thread Robert Yokota (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Yokota resolved KAFKA-7512.
--
Resolution: Duplicate

> java.lang.ClassCastException: java.util.Date cannot be cast to 
> java.lang.Number
> ---
>
> Key: KAFKA-7512
> URL: https://issues.apache.org/jira/browse/KAFKA-7512
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.0.0
>Reporter: Rohit Kumar Gupta
>Priority: Blocker
> Attachments: connect.out
>
>
> Steps:
> ~~
> bash-4.2# kafka-avro-console-producer --broker-list localhost:9092 --topic 
> connect_10oct_03 -property schema.registry.url=http://localhost:8081 
> --property value.schema='{"type":"record","name":"myrecord","fields":[
> {"name":"f1","type":"string"}
> ,{"name":"f2","type":["null",
> {"type":"long","logicalType":"timestamp-millis","connect.version":1,"connect.name":"org.apache.kafka.connect.data.Timestamp"}
> ],"default":null}]}'
> {"f1": "value1","f2": \{"null":null}}
> {"f1": "value1","f2": \{"long":1022}}
>  
> bash-4.2# kafka-avro-console-producer --broker-list localhost:9092 --topic 
> connect_10oct_03 -property schema.registry.url=http://localhost:8081 
> --property value.schema='{"type":"record","name":"myrecord","fields":[
> {"name":"f1","type":"string"}
> ,{"name":"f2","type":["null",
> {"type":"long","logicalType":"timestamp-millis","connect.version":1,"connect.name":"org.apache.kafka.connect.data.Timestamp"}
> ],"default":null},\{"name":"f3","type":"string","default":"green"}]}'
> {"f1": "value1","f2": \\{"null":null}
> ,"f3":"toto"}
> {"f1": "value1","f2": \\{"null":null}
> ,"f3":"toto"}
> {"f1": "value1","f2": \\{"null":null}
> ,"f3":"toto"}
> {"f1": "value1","f2": \\{"long":12343536}
> ,"f3":"tutu"}
>  
> bash-4.2# kafka-avro-console-producer --broker-list localhost:9092 --topic 
> connect_10oct_03 -property schema.registry.url=http://localhost:8081 
> --property value.schema='{"type":"record","name":"myrecord","fields":[
> {"name":"f1","type":"string"}
> ,{"name":"f2","type":["null",
> {"type":"long","logicalType":"timestamp-millis","connect.version":1,"connect.name":"org.apache.kafka.connect.data.Timestamp"}
> ],"default":null}]}'
> {"f1": "value1","f2": \{"null":null}}
> {"f1": "value1","f2": \{"long":1022}}
>  
> bash-4.2# curl -X POST -H "Accept: application/json" -H "Content-Type: 
> application/json" http://localhost:8083/connectors -d 
> '\{"name":"hdfs-sink-connector-10oct-03", "config": 
> {"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector", 
> "tasks.max":"1", "topics":"connect_10oct_03", "hdfs.url": 
> "hdfs://localhost:8020/tmp/", "flush.size":"1", "hive.integration": "true", 
> "hive.metastore.uris": "thrift://localhost:9083", "hive.database": "rohit", 
> "schema.compatibility": "BACKWARD"}}'
> {"name":"hdfs-sink-connector-10oct-03","config":\\{"connector.class":"io.confluent.connect.hdfs.HdfsSinkConnector","tasks.max":"1","topics":"connect_10oct_03","hdfs.url":"hdfs://localhost:8020/tmp/","flush.size":"1","hive.integration":"true","hive.metastore.uris":"thrift://localhost:9083"

[jira] [Created] (KAFKA-7476) SchemaProjector is not properly handling Date-based logical types

2018-10-02 Thread Robert Yokota (JIRA)
Robert Yokota created KAFKA-7476:


 Summary: SchemaProjector is not properly handling Date-based 
logical types
 Key: KAFKA-7476
 URL: https://issues.apache.org/jira/browse/KAFKA-7476
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Robert Yokota
Assignee: Robert Yokota


SchemaProjector is not properly handling Date-based logical types.  An 
exception of the following form is thrown:  `Caused by: 
java.lang.ClassCastException: java.util.Date cannot be cast to java.lang.Number`



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7370) Enhance FileConfigProvider to read a directory

2018-09-17 Thread Robert Yokota (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Yokota resolved KAFKA-7370.
--
Resolution: Won't Do

> Enhance FileConfigProvider to read a directory
> --
>
> Key: KAFKA-7370
> URL: https://issues.apache.org/jira/browse/KAFKA-7370
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Affects Versions: 2.0.0
>    Reporter: Robert Yokota
>    Assignee: Robert Yokota
>Priority: Minor
>
> Currently FileConfigProvider can read a Properties file as a set of key-value 
> pairs.  This enhancement is to augment FileConfigProvider so that it can also 
> read a directory, where the file names are the keys and the corresponding 
> file contents are the values.
> This will allow for easier integration with secret management systems where 
> each secret is often an individual file, such as in Docker and Kubernetes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-09-05 Thread Robert Yokota
Hi Colin,

Thanks for the suggestion, I like it.  Do others like the idea of a
DirectoryConfigProvider, and if so is it ok if I amend the KIP?

Thanks,
Robert

On Tue, Sep 4, 2018 at 5:33 PM, Colin McCabe  wrote:

> Hi Robert,
>
> This seems like a reasonable behavior to me.  However, adding this
> behavior to FileConfigProvider seems like it might give someone who
> accidentally configures a directory rather than a file a nasty surprise.
> How about adding a DirectoryConfigProvider which adds this behavior?
>
> best,
> Colin
>
>
> On Tue, Sep 4, 2018, at 14:00, Robert Yokota wrote:
> > Hi everyone,
> >
> > Currently the FileConfigProvider, when passed a path that represents a
> > Properties file, will read the file as a set of key-value pairs.
> >
> > I've filed https://issues.apache.org/jira/browse/KAFKA-7370, which
> proposes
> > to augment FileConfigProvider so that when a path representing a
> directory
> > is passed, it will treat the file names as keys and the corresponding
> file
> > contents as values.   This will allow for easier integration with secret
> > management systems where each secret is often an individual file (such as
> > when using Docker or Kubernetes).The previous behavior is still
> > retained, so this change is backward compatible.
> >
> > Two questions:
> >
> > 1) Does this seem like a reasonable idea?
> >
> > 2) If it is a reasonable idea, is it ok to amend the KIP?
> >
> > Thanks,
> > Robert
> >
> > On Mon, Jun 11, 2018 at 8:16 PM, Konstantine Karantasis <
> > konstant...@confluent.io> wrote:
> >
> > > Sounds great, happy to hear we all agree!
> > > Thanks everyone!
> > >
> > >
> > > - Konstantine
> > >
> > >
> > > On Mon, Jun 11, 2018 at 4:22 PM, Colin McCabe 
> wrote:
> > >
> > > > Sounds good.  Thanks, Konstantin.
> > > >
> > > > Colin
> > > >
> > > >
> > > > On Mon, Jun 11, 2018, at 13:41, Rajini Sivaram wrote:
> > > > > Hi Konstantine,
> > > > >
> > > > > Sounds reasonable to me too.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Mon, Jun 11, 2018 at 7:55 PM, Robert Yokota  >
> > > > wrote:
> > > > >
> > > > > > Hi Konstantine,
> > > > > >
> > > > > > Sounds reasonable!
> > > > > >
> > > > > > Thanks,
> > > > > > Robert
> > > > > >
> > > > > > On Mon, Jun 11, 2018 at 11:49 AM, Konstantine Karantasis <
> > > > > > konstant...@confluent.io> wrote:
> > > > > >
> > > > > > > Hi everyone, after fixing an issue with a regular expression in
> > > > Connect's
> > > > > > > class loading isolation of the new component type
> ConfigProvider
> > > > here:
> > > > > > >
> > > > > > > https://github.com/apache/kafka/pull/5177
> > > > > > >
> > > > > > > I noticed that the new interface ConfigProvider, along with its
> > > first
> > > > > > > implementation FileConfigProvider, have been placed in the
> package:
> > > > > > >
> > > > > > > org.apache.kafka.common.config
> > > > > > >
> > > > > > > This specific package is mentioned in KIP-297 is a few places,
> but
> > > > not in
> > > > > > > any code snippets. I'd like to suggest moving the interface
> and any
> > > > > > current
> > > > > > > of future implementation classes in a new package named:
> > > > > > >
> > > > > > > org.apache.kafka.common.config.provider
> > > > > > >
> > > > > > > and update the KIP document accordingly.
> > > > > > >
> > > > > > > This seems to make sense in general. But, specifically, in
> Connect
> > > > it is
> > > > > > > desired since we treat ConfigProvider implementations as
> Connect
> > > > > > components
> > > > > > > that are loaded in isolation. Having a package for config
> providers
> > > > will
> > > > > > > allow us to avoid making any assumptions with respect to the
> name
>

Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-09-04 Thread Robert Yokota
Hi everyone,

Currently the FileConfigProvider, when passed a path that represents a
Properties file, will read the file as a set of key-value pairs.

I've filed https://issues.apache.org/jira/browse/KAFKA-7370, which proposes
to augment FileConfigProvider so that when a path representing a directory
is passed, it will treat the file names as keys and the corresponding file
contents as values.   This will allow for easier integration with secret
management systems where each secret is often an individual file (such as
when using Docker or Kubernetes).The previous behavior is still
retained, so this change is backward compatible.

Two questions:

1) Does this seem like a reasonable idea?

2) If it is a reasonable idea, is it ok to amend the KIP?

Thanks,
Robert

On Mon, Jun 11, 2018 at 8:16 PM, Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Sounds great, happy to hear we all agree!
> Thanks everyone!
>
>
> - Konstantine
>
>
> On Mon, Jun 11, 2018 at 4:22 PM, Colin McCabe  wrote:
>
> > Sounds good.  Thanks, Konstantin.
> >
> > Colin
> >
> >
> > On Mon, Jun 11, 2018, at 13:41, Rajini Sivaram wrote:
> > > Hi Konstantine,
> > >
> > > Sounds reasonable to me too.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > > On Mon, Jun 11, 2018 at 7:55 PM, Robert Yokota 
> > wrote:
> > >
> > > > Hi Konstantine,
> > > >
> > > > Sounds reasonable!
> > > >
> > > > Thanks,
> > > > Robert
> > > >
> > > > On Mon, Jun 11, 2018 at 11:49 AM, Konstantine Karantasis <
> > > > konstant...@confluent.io> wrote:
> > > >
> > > > > Hi everyone, after fixing an issue with a regular expression in
> > Connect's
> > > > > class loading isolation of the new component type ConfigProvider
> > here:
> > > > >
> > > > > https://github.com/apache/kafka/pull/5177
> > > > >
> > > > > I noticed that the new interface ConfigProvider, along with its
> first
> > > > > implementation FileConfigProvider, have been placed in the package:
> > > > >
> > > > > org.apache.kafka.common.config
> > > > >
> > > > > This specific package is mentioned in KIP-297 is a few places, but
> > not in
> > > > > any code snippets. I'd like to suggest moving the interface and any
> > > > current
> > > > > of future implementation classes in a new package named:
> > > > >
> > > > > org.apache.kafka.common.config.provider
> > > > >
> > > > > and update the KIP document accordingly.
> > > > >
> > > > > This seems to make sense in general. But, specifically, in Connect
> > it is
> > > > > desired since we treat ConfigProvider implementations as Connect
> > > > components
> > > > > that are loaded in isolation. Having a package for config providers
> > will
> > > > > allow us to avoid making any assumptions with respect to the name
> of
> > a
> > > > > class that implements `ConfigProvider` and is included in Apache
> > Kafka.
> > > > It
> > > > > will suffice for this class to reside in the package
> > > > > org.apache.kafka.common.config.provider.
> > > > >
> > > > > Let me know if this is a reasonable request and if you agree on
> > amending
> > > > > the KIP description.
> > > > >
> > > > > - Konstantine
> > > > >
> > > > >
> > > > >
> > > > > On Wed, May 16, 2018 at 10:33 AM, Rajini Sivaram <
> > > > rajinisiva...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for the update, Robert. Looks good to me.
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Rajini
> > > > > >
> > > > > > On Wed, May 16, 2018 at 4:43 PM, Robert Yokota <
> rayok...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi Rajini,
> > > > > > >
> > > > > > > Thanks for the excellent feedback!
> > > > > > >
> > > > > > > I've made the API changes that you've requested in the KIP.
> > > > > > >
> > > > > > >
> &

[jira] [Created] (KAFKA-7370) Enhance FileConfigProvider to read a directory

2018-08-31 Thread Robert Yokota (JIRA)
Robert Yokota created KAFKA-7370:


 Summary: Enhance FileConfigProvider to read a directory
 Key: KAFKA-7370
 URL: https://issues.apache.org/jira/browse/KAFKA-7370
 Project: Kafka
  Issue Type: Improvement
  Components: config
Affects Versions: 2.0.0
Reporter: Robert Yokota
Assignee: Robert Yokota


Currently FileConfigProvider can read a Properties file as a set of key-value 
pairs.  This enhancement is to augment FileConfigProvider so that it can also 
read a directory, where the file names are the keys and the corresponding file 
contents are the values.

This will allow for easier integration with secret management systems where 
each secret is often an individual file, such as in Docker and Kubernetes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-06-11 Thread Robert Yokota
Hi Konstantine,

Sounds reasonable!

Thanks,
Robert

On Mon, Jun 11, 2018 at 11:49 AM, Konstantine Karantasis <
konstant...@confluent.io> wrote:

> Hi everyone, after fixing an issue with a regular expression in Connect's
> class loading isolation of the new component type ConfigProvider here:
>
> https://github.com/apache/kafka/pull/5177
>
> I noticed that the new interface ConfigProvider, along with its first
> implementation FileConfigProvider, have been placed in the package:
>
> org.apache.kafka.common.config
>
> This specific package is mentioned in KIP-297 is a few places, but not in
> any code snippets. I'd like to suggest moving the interface and any current
> of future implementation classes in a new package named:
>
> org.apache.kafka.common.config.provider
>
> and update the KIP document accordingly.
>
> This seems to make sense in general. But, specifically, in Connect it is
> desired since we treat ConfigProvider implementations as Connect components
> that are loaded in isolation. Having a package for config providers will
> allow us to avoid making any assumptions with respect to the name of a
> class that implements `ConfigProvider` and is included in Apache Kafka. It
> will suffice for this class to reside in the package
> org.apache.kafka.common.config.provider.
>
> Let me know if this is a reasonable request and if you agree on amending
> the KIP description.
>
> - Konstantine
>
>
>
> On Wed, May 16, 2018 at 10:33 AM, Rajini Sivaram 
> wrote:
>
> > Thanks for the update, Robert. Looks good to me.
> >
> > Regards,
> >
> > Rajini
> >
> > On Wed, May 16, 2018 at 4:43 PM, Robert Yokota 
> wrote:
> >
> > > Hi Rajini,
> > >
> > > Thanks for the excellent feedback!
> > >
> > > I've made the API changes that you've requested in the KIP.
> > >
> > >
> > > > 1. Are we expecting one provider instance with different contexts
> > > > provided to `ConfigProvider.get()`? If we created a different
> provider
> > > > instance for each context, we could deal with scheduling reloads in
> the
> > > > provider implementation?
> > >
> > > Yes, there would be one provider instance.  I've collapsed the
> > > ConfigContext and the ConfigChangeCallback by adding a parameter
> delayMs
> > to
> > > indicate when the change will happen.  When a particular ConfigProvider
> > > retrieves a lease duration along with a key, it can either 1)
> schedule a
> > > background thread to push out the change when it happens (at which time
> > the
> > > delayMs will be 0), or invoke the callback immediately with the lease
> > > duration set as delayMs (of course, in this case the values for the
> keys
> > > will be the old values).  A ConfProvider could be parameterized to do
> one
> > > or the other.
> > >
> > >
> > > > 2. Couldn't ConfigData  be an interface that just returns a map of
> > > > key-value pairs. Providers that return metadata could extend it to
> > > provide
> > > > metadata in a meaningful format instead of Map.
> > >
> > > I've replaced ConfigData with Map as you suggested.
> > >
> > >
> > > > 3. For ZK, we would use ConfigProvider.get() without `keys` to get
> all
> > > > keys in the path. Do we have two get() methods since some providers
> > need
> > > > keys to be specified and some don't? How do we decide which one to
> use?
> > >
> > > The ConfigProvider should be thought of like a Map interface and does
> not
> > > require that one signature of get() be preferred over the other.
> KIP-226
> > > can use get(String path) while Connect will use get(String path,
> > > Set) since it knows which keys it is interested in.
> > >
> > >
> > > A few more updates to the KIP:
> > >
> > > - I've elided the ConfigTransformer implementation as Colin suggested.
> > > - The variable reference now looks like ${provider:[path:]key} where
> the
> > > path is optional.
> > >
> > >
> > > Thanks!
> > > Robert
> > >
> > >
> > >
> > >
> > > On Wed, May 16, 2018 at 4:30 AM, Rajini Sivaram <
> rajinisiva...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi Robert,
> > > >
> > > > Thanks for the KIP updates.
> > > >
> > > > The interfaces look suitable for brokers, with some small changes. If
> > we
&

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-18 Thread Robert Yokota
HI Rajini,

Good questions.

First, if no ConfigProviders are configured, then config values of the form
${vault:mypassword} will remain as is.

Second, I mention in the KIP that if a provider does not have a value for a
given key, the variable will remain unresolved and the final value will be
of the form ${vault:mypassword} still.

If one wants to use a config value ${vault:mypassword}, as well as the
VaultConfigProvider, one can choose to use a different prefix besides
"vault" when referring to the VaultConfigProvider since the prefixes are
arbitrary and specified in a config file.

Finally, if one want to use a config value ${vault:mypassword}, as well as
the VaultConfigProvider, and one wants to use the prefix "vault" and not
something else, then yes, one could use a LiteralConfigProvider as you
described, or even put the ${vault:mypassword} in a different file and use
the FileConfigProvider to pull in the value (since there is only one level
of indirection).

Thanks,
Robert



On Fri, May 18, 2018 at 3:42 AM, Rajini Sivaram <rajinisiva...@gmail.com>
wrote:

> Hi Robert,
>
> A couple of questions:
>
>
>1. Since we always expand config values, don't we also need a way to
>include values that never get expanded? I may want to use
>"${vault:mypassword}" as my literal password without a lookup. Since we
>allow only level of indirection, perhaps all we need is a ConfigProvider
>that uses the string inside, for example: ${literal:${vault:mypassword}}
> ?
>It would avoid having restrictions on what passwords can look like.
>2. What is the behaviour if I specify a password that is
>"${notavault:something}" that matches the config provider syntax, but
> for
>which there is no config provider?
>
>
>
> On Fri, May 18, 2018 at 5:41 AM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > Thanks for addressing this Robert, it's a pretty common user need.
> >
> > First, +1 (binding) generally.
> >
> > Two very minor comments that I think could be clarified but wouldn't
> affect
> > votes:
> >
> > * Let's list in the KIP what package the ConfigProvider,
> > ConfigChangeCallback, ConfigData and ConfigTransformer interfaces are
> > defined in. Very, very minor, but given the aim to possibly reuse
> elsewhere
> > and the fact that it'll likely end up in the common packages might mean
> > devs focused more on the common/core packages will have strong opinions
> > where they should be. I think it'd definitely be good to get input from
> > folks focusing on the broker on where they think it should go since I
> think
> > it would be very natural to extend this to security settings there.
> (Also,
> > I think ConfigData is left out of the list of new interfaces by accident,
> > but I think it's clear what's being added anyway.)
> > * I may have glanced past it, but we're not shipping any ConfigProviders
> > out of the box? This mentions file and vault, but just as examples. Just
> > want to make sure everyone knows up front that this is a pluggable API,
> but
> > you need to add more jars to take advantage of it. I think this is fine
> as
> > I don't think there are truly common secrets provider
> > formats/apis/protocols, just want to make sure it is clear.
> >
> > Thanks,
> > Ewen
> >
> > On Thu, May 17, 2018 at 6:19 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > +1
> > >  Original message From: Magesh Nandakumar <
> > > mage...@confluent.io> Date: 5/17/18  6:05 PM  (GMT-08:00) To:
> > > dev@kafka.apache.org Subject: Re: [VOTE] KIP-297: Externalizing
> Secrets
> > > for Connect Configurations
> > > Thanks Robert, this looks great
> > >
> > > +1 (non-binding)
> > >
> > > On Thu, May 17, 2018 at 5:35 PM, Colin McCabe <cmcc...@apache.org>
> > wrote:
> > >
> > > > Thanks, Robert!
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Colin
> > > >
> > > >
> > > > On Thu, May 17, 2018, at 14:15, Robert Yokota wrote:
> > > > > Hi Colin,
> > > > >
> > > > > I've changed the KIP to have a composite object returned from
> get().
> > > > It's
> > > > > probably the most straightforward option.  Please let me know if
> you
> > > have
> > > > > any other concerns.
> > > > >
> > > > > Thanks,
> > > > > Robert
> > > > >
> > > > > On Thu, May 17, 2018 at 11:44 AM, Robert 

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-18 Thread Robert Yokota
Thanks Ewen!

> * Let's list in the KIP what package the ConfigProvider,
> ConfigChangeCallback, ConfigData and ConfigTransformer interfaces are
> defined in.

Does org.apache.kafka.common.config work for people?

> Also, I think ConfigData is left out of the list of new interfaces by
accident

Good catch, I've added it.

> I may have glanced past it, but we're not shipping any ConfigProviders
> out of the box?

I've updated the KIP to state we intend to provide a FileConfigProvider out
of the box based on a properties file format.

Thanks,
Robert







On Thu, May 17, 2018 at 9:41 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Thanks for addressing this Robert, it's a pretty common user need.
>
> First, +1 (binding) generally.
>
> Two very minor comments that I think could be clarified but wouldn't affect
> votes:
>
> * Let's list in the KIP what package the ConfigProvider,
> ConfigChangeCallback, ConfigData and ConfigTransformer interfaces are
> defined in. Very, very minor, but given the aim to possibly reuse elsewhere
> and the fact that it'll likely end up in the common packages might mean
> devs focused more on the common/core packages will have strong opinions
> where they should be. I think it'd definitely be good to get input from
> folks focusing on the broker on where they think it should go since I think
> it would be very natural to extend this to security settings there. (Also,
> I think ConfigData is left out of the list of new interfaces by accident,
> but I think it's clear what's being added anyway.)
> * I may have glanced past it, but we're not shipping any ConfigProviders
> out of the box? This mentions file and vault, but just as examples. Just
> want to make sure everyone knows up front that this is a pluggable API, but
> you need to add more jars to take advantage of it. I think this is fine as
> I don't think there are truly common secrets provider
> formats/apis/protocols, just want to make sure it is clear.
>
> Thanks,
> Ewen
>
> On Thu, May 17, 2018 at 6:19 PM Ted Yu <yuzhih...@gmail.com> wrote:
>
> > +1
> >  Original message From: Magesh Nandakumar <
> > mage...@confluent.io> Date: 5/17/18  6:05 PM  (GMT-08:00) To:
> > dev@kafka.apache.org Subject: Re: [VOTE] KIP-297: Externalizing Secrets
> > for Connect Configurations
> > Thanks Robert, this looks great
> >
> > +1 (non-binding)
> >
> > On Thu, May 17, 2018 at 5:35 PM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Thanks, Robert!
> > >
> > > +1 (non-binding)
> > >
> > > Colin
> > >
> > >
> > > On Thu, May 17, 2018, at 14:15, Robert Yokota wrote:
> > > > Hi Colin,
> > > >
> > > > I've changed the KIP to have a composite object returned from get().
> > > It's
> > > > probably the most straightforward option.  Please let me know if you
> > have
> > > > any other concerns.
> > > >
> > > > Thanks,
> > > > Robert
> > > >
> > > > On Thu, May 17, 2018 at 11:44 AM, Robert Yokota <rayok...@gmail.com>
> > > wrote:
> > > >
> > > > >
> > > > >
> > > > > Hi Colin,
> > > > >
> > > > > My last response was not that clear, so let me back up and explain
> a
> > > bit
> > > > > more.
> > > > >
> > > > > Some secret managers, such as Vault (and maybe Keywhiz) have the
> > > notion of
> > > > > a lease duration or a TTL for a path.  Every path can have a
> > different
> > > > > TTL.  This is period after which the value of the keys at the given
> > > path
> > > > > may be invalid.  It can be used to indicate a rotation will be
> done.
> > > In
> > > > > the cause of the Vault integration with AWS, Vault will actually
> > > delete the
> > > > > secrets from AWS at the moment the TTL expires.  A TTL could be
> used
> > by
> > > > > other ConfigProviders, such as a FileConfigProvider, to indicate
> that
> > > all
> > > > > the secrets at a given path (file), will be rotated on a regular
> > basis.
> > > > >
> > > > > I would like to expose the TTL in the APIs somewhere.  The TTL can
> be
> > > made
> > > > > available at the time get() is called.  Connect already has a built
> > in
> > > > > ScheduledExecutor, so Connect can just use the TTL to schedule a
> > > Connector
> > > > > restart.  

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-17 Thread Robert Yokota
Hi Colin,

I've changed the KIP to have a composite object returned from get().  It's
probably the most straightforward option.  Please let me know if you have
any other concerns.

Thanks,
Robert

On Thu, May 17, 2018 at 11:44 AM, Robert Yokota <rayok...@gmail.com> wrote:

>
>
> Hi Colin,
>
> My last response was not that clear, so let me back up and explain a bit
> more.
>
> Some secret managers, such as Vault (and maybe Keywhiz) have the notion of
> a lease duration or a TTL for a path.  Every path can have a different
> TTL.  This is period after which the value of the keys at the given path
> may be invalid.  It can be used to indicate a rotation will be done.  In
> the cause of the Vault integration with AWS, Vault will actually delete the
> secrets from AWS at the moment the TTL expires.  A TTL could be used by
> other ConfigProviders, such as a FileConfigProvider, to indicate that all
> the secrets at a given path (file), will be rotated on a regular basis.
>
> I would like to expose the TTL in the APIs somewhere.  The TTL can be made
> available at the time get() is called.  Connect already has a built in
> ScheduledExecutor, so Connect can just use the TTL to schedule a Connector
> restart.  Originally, I had exposed the TTL in a ConfigContext interface
> passed to the get() method.  To reduce the number of APIs, I placed it on
> the onChange() method.  This means at the time of get(), onChange() would
> be called with a TTL.  The Connector's implementation of the callback would
> use onChange() with the TTL to schedule a restart.
>
> If you think this is overloading onChange() too much, I could add the
> ConfigContext back to get():
>
>
> Map<String, String> get(ConfigContext ctx, String path);
>
> public interface ConfigContext {
>
> void willExpire(String path, long ttl);
>
> }
>
>
>
> or I could separate out the TTL method in the callback:
>
>
> public interface ConfigChangeCallback {
>
> void willExpire(String path, long ttl);
>
> void onChange(String path, Map<String, String> values);
> }
>
>
>
> Or we could return a composite object from get():
>
> ConfigData get(String path);
>
> public class ConfigData {
>
>   Map<String, String> data;
>   long ttl;
>
> }
>
>
> Do you have a preference Colin?
>
> Thanks,
> Robert
>
>
> On Thu, May 17, 2018 at 9:27 AM, Colin McCabe <cmcc...@apache.org> wrote:
>
>> Hi Robert,
>>
>> Hmm.  I thought that if you're using ConfigChangeCallback, you are
>> relying on the ConfigProvider to make a callback to you when the
>> configuration has changed.  So isn't that always the "push model" (where
>> the ConfigProvider pushes changes to Connect).  If you want the "pull
>> model" where you initiate updates, you can simply call ConfigProvider#get
>> directly, right?
>>
>> The actual implementation of ConfigProvider subclasses will depend on the
>> type of configuration storage mechanism on the backend.  In the case of
>> Vault, it sounds like we need to have something like a ScheduledExecutor
>> which re-fetches keys after a certain amount of time.
>>
>> As an aside, what does a "lease duration" mean for a configuration key?
>> Does that mean Vault will reject changes to the configuration key if I try
>> to make them within the lease duration?  Or is this like a period after
>> which a password is automatically rotated?
>>
>> On Wed, May 16, 2018, at 22:25, Robert Yokota wrote:
>> > Hi Colin,
>> >
>> > > With regard to delayMs, can’t we just restart the
>> > > Connector when the keys are actually changed?
>> >
>> > Currently the VaultConfigProvider does not find out when values for keys
>> > have changed.  You could do this with a poll model (with a
>> > background thread in the ConfigProvider), but since for each key-value
>> > pair, Vault provides a lease duration stating exactly when a value for a
>> > key will change in the future, this is an alternative model of just
>> passing
>> > the lease duration to the client (in this case the Connector), to allow
>> it
>> > to determine what to do (such as schedule a restart).   This may allow
>> one
>> > to avoid the complexity of figuring out a proper poll interval (with
>> lease
>> > durations of varying periods), or worrying about putting too much load
>> on
>> > the secrets manager by polling too often.
>>
>> Those things are still concerns if the Connector is polling, right?
>> Perhaps the connector poll too often and puts too much l

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-17 Thread Robert Yokota
Hi Colin,

My last response was not that clear, so let me back up and explain a bit
more.

Some secret managers, such as Vault (and maybe Keywhiz) have the notion of
a lease duration or a TTL for a path.  Every path can have a different
TTL.  This is period after which the value of the keys at the given path
may be invalid.  It can be used to indicate a rotation will be done.  In
the cause of the Vault integration with AWS, Vault will actually delete the
secrets from AWS at the moment the TTL expires.  A TTL could be used by
other ConfigProviders, such as a FileConfigProvider, to indicate that all
the secrets at a given path (file), will be rotated on a regular basis.

I would like to expose the TTL in the APIs somewhere.  The TTL can be made
available at the time get() is called.  Connect already has a built in
ScheduledExecutor, so Connect can just use the TTL to schedule a Connector
restart.  Originally, I had exposed the TTL in a ConfigContext interface
passed to the get() method.  To reduce the number of APIs, I placed it on
the onChange() method.  This means at the time of get(), onChange() would
be called with a TTL.  The Connector's implementation of the callback would
use onChange() with the TTL to schedule a restart.

If you think this is overloading onChange() too much, I could add the
ConfigContext back to get():


Map<String, String> get(ConfigContext ctx, String path);

public interface ConfigContext {

void willExpire(String path, long ttl);

}



or I could separate out the TTL method in the callback:


public interface ConfigChangeCallback {

void willExpire(String path, long ttl);

void onChange(String path, Map<String, String> values);
}



Or we could return a composite object from get():

ConfigData get(String path);

public class ConfigData {

  Map<String, String> data;
  long ttl;

}


Do you have a preference Colin?

Thanks,
Robert


On Thu, May 17, 2018 at 9:27 AM, Colin McCabe <cmcc...@apache.org> wrote:

> Hi Robert,
>
> Hmm.  I thought that if you're using ConfigChangeCallback, you are relying
> on the ConfigProvider to make a callback to you when the configuration has
> changed.  So isn't that always the "push model" (where the ConfigProvider
> pushes changes to Connect).  If you want the "pull model" where you
> initiate updates, you can simply call ConfigProvider#get directly, right?
>
> The actual implementation of ConfigProvider subclasses will depend on the
> type of configuration storage mechanism on the backend.  In the case of
> Vault, it sounds like we need to have something like a ScheduledExecutor
> which re-fetches keys after a certain amount of time.
>
> As an aside, what does a "lease duration" mean for a configuration key?
> Does that mean Vault will reject changes to the configuration key if I try
> to make them within the lease duration?  Or is this like a period after
> which a password is automatically rotated?
>
> On Wed, May 16, 2018, at 22:25, Robert Yokota wrote:
> > Hi Colin,
> >
> > > With regard to delayMs, can’t we just restart the
> > > Connector when the keys are actually changed?
> >
> > Currently the VaultConfigProvider does not find out when values for keys
> > have changed.  You could do this with a poll model (with a
> > background thread in the ConfigProvider), but since for each key-value
> > pair, Vault provides a lease duration stating exactly when a value for a
> > key will change in the future, this is an alternative model of just
> passing
> > the lease duration to the client (in this case the Connector), to allow
> it
> > to determine what to do (such as schedule a restart).   This may allow
> one
> > to avoid the complexity of figuring out a proper poll interval (with
> lease
> > durations of varying periods), or worrying about putting too much load on
> > the secrets manager by polling too often.
>
> Those things are still concerns if the Connector is polling, right?
> Perhaps the connector poll too often and puts too much load on Vault.  And
> so forth.  It seems like this problem needs to be solved either way (and
> probably can be solved with reasonable default minimum fetch intervals).
>
> best,
> Colin
>
>
> >  In other words, by adding this
> > one additional parameter, a ConfigProvider can provide both push and pull
> > models to clients, perhaps with an additional configuration parameter to
> > the ConfigProvider to determine which model (push or poll) to use.
> >
> > Thanks,
> > Robert
> >
> > On Wed, May 16, 2018 at 9:56 PM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Thanks, Robert.  With regard to delayMs, can’t we just restart the
> > > Connector when the keys are act

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-16 Thread Robert Yokota
Hi Colin,

> With regard to delayMs, can’t we just restart the
> Connector when the keys are actually changed?

Currently the VaultConfigProvider does not find out when values for keys
have changed.  You could do this with a poll model (with a
background thread in the ConfigProvider), but since for each key-value
pair, Vault provides a lease duration stating exactly when a value for a
key will change in the future, this is an alternative model of just passing
the lease duration to the client (in this case the Connector), to allow it
to determine what to do (such as schedule a restart).   This may allow one
to avoid the complexity of figuring out a proper poll interval (with lease
durations of varying periods), or worrying about putting too much load on
the secrets manager by polling too often.  In other words, by adding this
one additional parameter, a ConfigProvider can provide both push and pull
models to clients, perhaps with an additional configuration parameter to
the ConfigProvider to determine which model (push or poll) to use.

Thanks,
Robert

On Wed, May 16, 2018 at 9:56 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Thanks, Robert.  With regard to delayMs, can’t we just restart the
> Connector when the keys are actually changed?  Or is the concern that
> this would lengthen the effective key rotation time?  Can’t the user
> just configure a slightly shorter key rotation time to counteract
> this concern?
> Regards,
> Colin
>
> On Wed, May 16, 2018, at 19:13, Robert Yokota wrote:
> > Hi Colin,
> >
> > Good questions.
> >
> >
> > > As a clarification about the indirections, what if I have the
> > > connect> configuration key foo set up as ${vault:bar}, and in Vault,
> > have the bar> key set to ${file:baz}?
> > > Does connect get foo as the contents of the baz file?  I would
> > > argue that> it should not (and in general, we shouldn't allow
> ConfigProviders to
> > indirect to other
> > > ConfigProviders) but I don't think it's spelled out right now.
> >
> > I've added a clarification to the KIP that further indirections are
> > not> performed even if the values returned from ConfigProviders have the
> > variable syntax.
> >
> >
> > > What's the behavior when a config key is not found in Vault
> > > (or other> ConfigProvider)?  Does the variable get replaced with the
> empty
> > string, or> with the literal ${vault:whatever} string?
> >
> > It would remain unresolved and still be of the form
> > ${provider:key}.  I've> added a clarification to the KIP.
> >
> >
> > > Do we really need "${provider:[path:]key}", or can it just be
> > ${provider:key}?
> >
> > The path is a separate parameter in the APIs, so I think it's
> > important to> explicitly delineate it in the variable syntax.  For
> example, I
> > currently> have a working VaultConfigProvider prototype and the syntax
> for a
> > Vault key> reference looks like
> >
> > db_password=${vault:secret/staging:mysql_password}
> >
> > I think it's important to standardize how to separate the path
> > from the key> rather than leave it to each ConfigProvider to determine a
> possibly
> > different way.  This will also make it easier to move secrets from one>
> ConfigProvider to another should one choose to do so.
> >
> >
> > > Do we really need delayMs?
> >
> > One of the goals of this KIP is to allow for secrets rotation without>
> having to modify existing connectors.  In the case of the
> > VaultConfigProvider, it knows the lease durations and will be able to>
> schedule a restart of the Connector using an API in the Herder.  The
> > delayMs will simply be passed to the Herder.restartConnector(long
> > delayMs,> String connName, Callback cb) method here:
> >
> > https://github.com/rayokota/kafka/blob/secrets-in-connect-
> configs/connect/runtime/src/main/java/org/apache/kafka/
> connect/runtime/Herder.java#L170>
> >
> > Best,
> > Robert
> >
> >
> >
> > On Wed, May 16, 2018 at 6:16 PM, Colin McCabe
> > <cmcc...@apache.org> wrote:>
> > > Thanks, Robert.  Looks good overall.
> > >
> > > As a clarification about the indirections, what if I have the
> > > connect> > configuration key foo set up as ${vault:bar}, and in Vault,
> have
> > > the bar> > key set to ${file:baz}?  Does connect get foo as the
> contents of
> > > the baz> > file?  I would argue that it should not (and in general, we
> > > shouldn't allow> > ConfigProviders to ind

Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-16 Thread Robert Yokota
Hi Colin,

Good questions.


> As a clarification about the indirections, what if I have the connect
configuration key foo set up as ${vault:bar}, and in Vault, have the bar
key set to ${file:baz}?
> Does connect get foo as the contents of the baz file?  I would argue that
it should not (and in general, we shouldn't allow ConfigProviders to
indirect to other
> ConfigProviders) but I don't think it's spelled out right now.

I've added a clarification to the KIP that further indirections are not
performed even if the values returned from ConfigProviders have the
variable syntax.


> What's the behavior when a config key is not found in Vault (or other
ConfigProvider)?  Does the variable get replaced with the empty string, or
with the literal ${vault:whatever} string?

It would remain unresolved and still be of the form ${provider:key}.  I've
added a clarification to the KIP.


> Do we really need "${provider:[path:]key}", or can it just be
${provider:key}?

The path is a separate parameter in the APIs, so I think it's important to
explicitly delineate it in the variable syntax.  For example, I currently
have a working VaultConfigProvider prototype and the syntax for a Vault key
reference looks like

db_password=${vault:secret/staging:mysql_password}

I think it's important to standardize how to separate the path from the key
rather than leave it to each ConfigProvider to determine a possibly
different way.  This will also make it easier to move secrets from one
ConfigProvider to another should one choose to do so.


> Do we really need delayMs?

One of the goals of this KIP is to allow for secrets rotation without
having to modify existing connectors.  In the case of the
VaultConfigProvider, it knows the lease durations and will be able to
schedule a restart of the Connector using an API in the Herder.  The
delayMs will simply be passed to the Herder.restartConnector(long delayMs,
String connName, Callback cb) method here:

https://github.com/rayokota/kafka/blob/secrets-in-connect-configs/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/Herder.java#L170


Best,
Robert



On Wed, May 16, 2018 at 6:16 PM, Colin McCabe <cmcc...@apache.org> wrote:

> Thanks, Robert.  Looks good overall.
>
> As a clarification about the indirections, what if I have the connect
> configuration key foo set up as ${vault:bar}, and in Vault, have the bar
> key set to ${file:baz}?  Does connect get foo as the contents of the baz
> file?  I would argue that it should not (and in general, we shouldn't allow
> ConfigProviders to indirect to other ConfigProviders) but I don't think
> it's spelled out right now.
>
> What's the behavior when a config key is not found in Vault (or other
> ConfigProvider)?  Does the variable get replaced with the empty string, or
> with the literal ${vault:whatever} string?
>
> Do we really need "${provider:[path:]key}", or can it just be
> ${provider:key}?  It seems like the path can be rolled up into the key.  So
> if you want to put your connect keys under my.connect.path, you ask for
> ${vault:my.connect.path.jdbc.config}, etc.
>
> >// A delayMs of 0 indicates an immediate change; a positive delayMs
> indicates
> >// that a future change is anticipated (such as a lease duration)
> >void onChange(String path, Map<String, String> values, int delayMs);
>
> Do we really need delayMs?  It seems like if you get a callback with
> delayMs set, you don't know what the new values will be, only that an
> update is coming, but not yet here.
>
> best,
> Colin
>
>
> On Wed, May 16, 2018, at 17:05, Robert Yokota wrote:
> > Hello everyone,
> >
> > After a good round of discussions with excellent feedback and no major
> > objections, I would like to start a vote on KIP-297 to externalize
> secrets
> > from Kafka Connect configurations.  My thanks in advance!
> >
> > KIP: <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 297%3A+Externalizing+Secrets+for+Connect+Configurations
> > >
> >
> > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886>
> >
> > Discussion thread: <
> > https://www.mail-archive.com/dev@kafka.apache.org/msg87638.html>
> >
> > Best,
> > Robert
>


[VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-16 Thread Robert Yokota
Hello everyone,

After a good round of discussions with excellent feedback and no major
objections, I would like to start a vote on KIP-297 to externalize secrets
from Kafka Connect configurations.  My thanks in advance!

KIP: <
https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations
>

JIRA: 

Discussion thread: <
https://www.mail-archive.com/dev@kafka.apache.org/msg87638.html>

Best,
Robert


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-16 Thread Robert Yokota
f Map<String, String>.
>3. For ZK, we would use ConfigProvider.get() without `keys` to get all
>keys in the path. Do we have two get() methods since some providers need
>keys to be specified and some don't? How do we decide which one to use?
>
> Thanks,
>
> Rajini
>
>
> On Wed, May 16, 2018 at 2:40 AM, Robert Yokota <rayok...@gmail.com> wrote:
>
> > Thanks, Ron!  I will take a look.
> >
> > Regards,
> > Robert
> >
> > On Tue, May 15, 2018 at 5:59 PM, Ron Dagostino <rndg...@gmail.com>
> wrote:
> >
> > > Hi Robert.  Regarding your comment "use the lease duration to schedule
> a
> > > configuration reload in the future", you might be interested in the
> code
> > > that does refresh for OAuth Bearer Tokens in KIP-255; specifically, the
> > > class
> > > org.apache.kafka.common.security.oauthbearer.internal.expiring.
> > > ExpiringCredentialRefreshingLogin.
> > > The class performs JAAS logins/relogins based on the expiration time
> of a
> > > retrieved expiring credential.  The implementation of that class is
> > > inspired by the code that currently does refresh for Kerberos tickets
> but
> > > is more reusable.  I don't know if you will leverage JAAS for defining
> > how
> > > to go get a credential (you could since you have to provide credentials
> > to
> > > authenticate to the remote systems anyway), but regardless, that class
> > > should be useful at least in some minimal sense if not more than that.
> > See
> > > https://github.com/apache/kafka/pull/4994.
> > >
> > > Ron
> > >
> > > Ron
> > >
> > > On Tue, May 15, 2018 at 8:01 PM, Robert Yokota <rayok...@gmail.com>
> > wrote:
> > >
> > > > Hi Colin,
> > > >
> > > > Thanks for the feedback!
> > > >
> > > >
> > > > > The KIP says that "Vault is very popular and has been described as
> > 'the
> > > > current gold standard in secret management and provisioning'."  I
> think
> > > > this might be a bit too much detail -- we don't really need to
> > > > > favorites, right? :)
> > > >
> > > > I've removed this line :)
> > > >
> > > >
> > > > > I think we should make the substitution part of the generic
> > > configuration
> > > > code, rather than specific to individual ConfigProviders.  We don't
> > > really
> > > > want it to work differently for Vault vs. KeyWhiz vs.
> > > > > AWS secrets, etc. etc.
> > > >
> > > > Yes, the ConfigProviders merely serve up key-value pairs.  A helper
> > class
> > > > like ConfigTransformer would perform the variable substitutions if
> > > desired.
> > > >
> > > >
> > > > > We should also spell out exactly how substitution works.
> > > >
> > > > By one-level of indirection I just meant a simple replacement of
> > > variables
> > > > (which are the indirect references).  So if you have foo=${bar} and
> > > > bar=${baz} and your file contains bar=hello, baz=world, then the
> final
> > > > result would be foo=hello and bar=world.  I've added this example to
> > the
> > > > KIP.
> > > >
> > > > You can see this as the DEFAULT_PATTERN in the ConfigTransformer.
> The
> > > > ConfigTransformer only provides one level of indirection.
> > > >
> > > >
> > > > > We should also spell out how this interacts with KIP-226
> > > configurations.
> > > >
> > > > Yes, I mention at the beginning that KIP-226 could use the
> > ConfigProvider
> > > > but not the ConfigTransformer.
> > > >
> > > >
> > > > > Maybe a good generic interface would be like this:
> > > >
> > > > I've added the subscription APIs but would like to keep the other
> APIs
> > > as I
> > > > will need them for integration with Vault.  With Vault I obtain the
> > lease
> > > > duration at the time the key is obtained, so at that time I would
> want
> > to
> > > > use the lease duration to schedule a configuration reload in the
> > future.
> > > > This is similar to how the integration between Vault and the Spring
> > > > Framework works.   Also, the lease duration would be included in the
> > > > metadata map vs. 

Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-15 Thread Robert Yokota
Thanks, Ron!  I will take a look.

Regards,
Robert

On Tue, May 15, 2018 at 5:59 PM, Ron Dagostino <rndg...@gmail.com> wrote:

> Hi Robert.  Regarding your comment "use the lease duration to schedule a
> configuration reload in the future", you might be interested in the code
> that does refresh for OAuth Bearer Tokens in KIP-255; specifically, the
> class
> org.apache.kafka.common.security.oauthbearer.internal.expiring.
> ExpiringCredentialRefreshingLogin.
> The class performs JAAS logins/relogins based on the expiration time of a
> retrieved expiring credential.  The implementation of that class is
> inspired by the code that currently does refresh for Kerberos tickets but
> is more reusable.  I don't know if you will leverage JAAS for defining how
> to go get a credential (you could since you have to provide credentials to
> authenticate to the remote systems anyway), but regardless, that class
> should be useful at least in some minimal sense if not more than that.  See
> https://github.com/apache/kafka/pull/4994.
>
> Ron
>
> Ron
>
> On Tue, May 15, 2018 at 8:01 PM, Robert Yokota <rayok...@gmail.com> wrote:
>
> > Hi Colin,
> >
> > Thanks for the feedback!
> >
> >
> > > The KIP says that "Vault is very popular and has been described as 'the
> > current gold standard in secret management and provisioning'."  I think
> > this might be a bit too much detail -- we don't really need to
> > > favorites, right? :)
> >
> > I've removed this line :)
> >
> >
> > > I think we should make the substitution part of the generic
> configuration
> > code, rather than specific to individual ConfigProviders.  We don't
> really
> > want it to work differently for Vault vs. KeyWhiz vs.
> > > AWS secrets, etc. etc.
> >
> > Yes, the ConfigProviders merely serve up key-value pairs.  A helper class
> > like ConfigTransformer would perform the variable substitutions if
> desired.
> >
> >
> > > We should also spell out exactly how substitution works.
> >
> > By one-level of indirection I just meant a simple replacement of
> variables
> > (which are the indirect references).  So if you have foo=${bar} and
> > bar=${baz} and your file contains bar=hello, baz=world, then the final
> > result would be foo=hello and bar=world.  I've added this example to the
> > KIP.
> >
> > You can see this as the DEFAULT_PATTERN in the ConfigTransformer.  The
> > ConfigTransformer only provides one level of indirection.
> >
> >
> > > We should also spell out how this interacts with KIP-226
> configurations.
> >
> > Yes, I mention at the beginning that KIP-226 could use the ConfigProvider
> > but not the ConfigTransformer.
> >
> >
> > > Maybe a good generic interface would be like this:
> >
> > I've added the subscription APIs but would like to keep the other APIs
> as I
> > will need them for integration with Vault.  With Vault I obtain the lease
> > duration at the time the key is obtained, so at that time I would want to
> > use the lease duration to schedule a configuration reload in the future.
> > This is similar to how the integration between Vault and the Spring
> > Framework works.   Also, the lease duration would be included in the
> > metadata map vs. the data map.  Finally, I need an additional "path" or
> > "bucket" parameter, which is used by Vault to indicate which set of
> > key-values are to be retrieved.
> >
> >
> > > With regard to ConfigTransformer: do we need to include all this code
> in
> > the KIP?  Seems like an implementation detail.
> >
> > I use the ConfigTransformer to show how the pattern ${provider:key} is
> > defined and how the substitution only involves one level of indirection.
> > If you feel it does not add anything to the text, I can remove it.
> >
> >
> > > Is there a way to avoid this couping?
> >
> > I'd have to look into it and get back to you.  However, I assume that the
> > answer is not relevant for this KIP :)
> >
> >
> > Thanks,
> > Robert
> >
> >
> >
> >
> >
> > On Tue, May 15, 2018 at 4:04 PM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Hi Robert,
> > >
> > > Thanks for posting this.  In the past we've been kind of reluctant to
> add
> > > more complexity to configuration.  I think Connect does have a clear
> need
> > > for this kind of functionality, though.  As you mention, Connect
> > integrates
> > > w

Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-15 Thread Robert Yokota
ibe to notifications
> about changes to certain parameters.  Maybe a good generic interface would
> be like this:
>
>  > public interface ConfigProvider extends Closeable {
> >  // batched get is potentially more efficient
>  > Map<String, String> get(Collection keys);
> >
> >// The ConfigProvider is responsible for making this callback
> whenever the key changes.
> >// Some ConfigProviders may want to have a background thread with a
> configurable update interval.
>  > void subscribe(String key, ConfigurationChangeCallback callback);
> >
> >// Inverse of subscribe
>  > void unsubscribe(String key);
> >
> >// Close all subscriptions and clean up all resources
>  > void close();
>  > }
>  >
>  > interface ConfigurationChangeCallback {
>  > void onChange(String key, String value);
>  > }
>
> With regard to ConfigTransformer: do we need to include all this code in
> the KIP?  Seems like an implementation detail.
>
> > Other connectors such as the S3 connector are tightly coupled with a
> particular secret manager, and may
> > wish to handle rotation on their own.
>
> Is there a way to avoid this couping?  Seems like some users might want to
> use their own secret manager here.
>
> best,
> Colin
>
>
> On Wed, May 9, 2018, at 16:32, Robert Yokota wrote:
> > Hi Magesh,
> >
> > I updated the KIP with a link to a PR for a working prototype.  The
> > prototype does not yet use the Connect plugin machinery for class loader
> > isolation, but should give you an idea of what the final implementation
> > will look like.  Here is the link:
> > https://github.com/apache/kafka/pull/4990/files.
> >
> > I also added an example of a FileConfigProvider to the KIP.
> >
> > Thanks,
> > Robert
> >
> > On Wed, May 9, 2018 at 10:04 AM, Robert Yokota <rayok...@gmail.com>
> wrote:
> >
> > > Hi Magesh,
> > >
> > > Thanks for the feedback!
> > >
> > > I will put together a PR to demonstrate what the implementation might
> look
> > > like, as well as a reference FileConfigProvider.
> > >
> > > 1.  The delayMs for a (potentially) scheduled reload is determined by
> the
> > > ConfigProvider.  For example, a (hypothetical) VaultConfigProvider,
> upon
> > > contacting Vault for a particular secret, might also obtain a lease
> > > duration indicating that the secret expires in 1 hour. The
> > > VaultConfigProvider could then call scheduleConfigReload with delayMs
> set
> > > to 360ms (1 hour).  This would cause the Connector to restart in an
> > > hour, forcing it to reload the configs and re-resolve all indirect
> > > references.
> > >
> > > 2. Yes, the start() methods in SourceTask and SinkTask would get the
> > > configs with all the indirect references resolved.   Those config()
> methods
> > > are for Connectors that want to get the latest configs (with all
> indirect
> > > references re-resolved) at some time after start().  For example, if a
> Task
> > > encountered some security exception because a secret expired, it could
> call
> > > config() to get the config with the latest values.  This is assuming
> that
> > > the Task can gracefully recover from the security exception.
> > >
> > > 3. Yes, that is up to the ConfigProvider implementation and is out of
> > > scope.  If the ConfigProvider also needs some kind of secrets or other
> > > data, it could possibly be passed in through the param properties
> > > ("config.providers.vault.param.auth=/run/myauth").  For example Docker
> > > might generate the auth info for Vault in an in-memory tmpfs file that
> > > could then be passed as a param.
> > >
> > > Thanks,
> > > Robert
> > >
> > >
> > >
> > > On Tue, May 8, 2018 at 10:10 PM, Magesh Nandakumar <
> mage...@confluent.io>
> > > wrote:
> > >
> > >> Hi Robert,
> > >>
> > >> Thanks for the KIP. I think, this will be a great addition to the
> > >> framework. I think, will be great if the KIP can elaborate a little
> bit
> > >> more on how implementations would look like with an example.
> > >> Also, would be good to provide a reference implementation as well.
> > >>
> > >> The other questions I had were
> > >>
> > >> 1.  How would the framework get the delayMs for void
> scheduleConfigReload(
> > >> long delayMs);
> > >> 2. Would the start methods in SourceTask and SinkTask get the configs
> with
> > >> all the indirect references resolved. If so, trying to understand
> > >> the intent of the config() in SourceTaskContext and the
> SinkTaskContext
> > >> 3. What if the provider itself needs some kind of secrets to be
> configured
> > >> to connect to it? I assume that's out of scope for this proposal but
> > >> wanted
> > >> to clarify it.
> > >>
> > >> Thanks
> > >> Magesh
> > >>
> > >> On Tue, May 8, 2018 at 1:52 PM, Robert Yokota <rayok...@gmail.com>
> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I would like to start a discussion for KIP-297 to externalize
> secrets
> > >> from
> > >> > Kafka Connect configurations.  Any feedback is appreciated.
> > >> > <
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > 297%3A+Externalizing+Secrets+for+Connect+Configurations
> > >> > >
> > >> >
> > >> > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886>
> > >> >
> > >> > Thanks in advance,
> > >> > Robert
> > >> >
> > >>
> > >
> > >
>


[jira] [Created] (KAFKA-6894) Cannot access GlobalKTable store from KStream.transform()

2018-05-10 Thread Robert Yokota (JIRA)
Robert Yokota created KAFKA-6894:


 Summary: Cannot access GlobalKTable store from KStream.transform()
 Key: KAFKA-6894
 URL: https://issues.apache.org/jira/browse/KAFKA-6894
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.1.0
Reporter: Robert Yokota
Assignee: Robert Yokota


I was trying to access a store from a {{GlobalKTable}} in 
{{KStream.transform()}}, but I got the following error:

{code}
org.apache.kafka.streams.errors.TopologyException: Invalid topology: StateStore 
globalStore is not added yet.
at 
org.apache.kafka.streams.processor.internals.InternalTopologyBuilder.connectProcessorAndStateStore(InternalTopologyBuilder.java:716)
at 
org.apache.kafka.streams.processor.internals.InternalTopologyBuilder.connectProcessorAndStateStores(InternalTopologyBuilder.java:615)
at 
org.apache.kafka.streams.kstream.internals.KStreamImpl.transform(KStreamImpl.java:521)
{code}

I was able to make a change to 
{{InternalTopologyBuilder.connectProcessorAndState}} to allow me to access the 
global store from {{KStream.transform()}}.  I will submit a PR for review.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-09 Thread Robert Yokota
Hi Magesh,

I updated the KIP with a link to a PR for a working prototype.  The
prototype does not yet use the Connect plugin machinery for class loader
isolation, but should give you an idea of what the final implementation
will look like.  Here is the link:
https://github.com/apache/kafka/pull/4990/files.

I also added an example of a FileConfigProvider to the KIP.

Thanks,
Robert

On Wed, May 9, 2018 at 10:04 AM, Robert Yokota <rayok...@gmail.com> wrote:

> Hi Magesh,
>
> Thanks for the feedback!
>
> I will put together a PR to demonstrate what the implementation might look
> like, as well as a reference FileConfigProvider.
>
> 1.  The delayMs for a (potentially) scheduled reload is determined by the
> ConfigProvider.  For example, a (hypothetical) VaultConfigProvider, upon
> contacting Vault for a particular secret, might also obtain a lease
> duration indicating that the secret expires in 1 hour. The
> VaultConfigProvider could then call scheduleConfigReload with delayMs set
> to 360ms (1 hour).  This would cause the Connector to restart in an
> hour, forcing it to reload the configs and re-resolve all indirect
> references.
>
> 2. Yes, the start() methods in SourceTask and SinkTask would get the
> configs with all the indirect references resolved.   Those config() methods
> are for Connectors that want to get the latest configs (with all indirect
> references re-resolved) at some time after start().  For example, if a Task
> encountered some security exception because a secret expired, it could call
> config() to get the config with the latest values.  This is assuming that
> the Task can gracefully recover from the security exception.
>
> 3. Yes, that is up to the ConfigProvider implementation and is out of
> scope.  If the ConfigProvider also needs some kind of secrets or other
> data, it could possibly be passed in through the param properties
> ("config.providers.vault.param.auth=/run/myauth").  For example Docker
> might generate the auth info for Vault in an in-memory tmpfs file that
> could then be passed as a param.
>
> Thanks,
> Robert
>
>
>
> On Tue, May 8, 2018 at 10:10 PM, Magesh Nandakumar <mage...@confluent.io>
> wrote:
>
>> Hi Robert,
>>
>> Thanks for the KIP. I think, this will be a great addition to the
>> framework. I think, will be great if the KIP can elaborate a little bit
>> more on how implementations would look like with an example.
>> Also, would be good to provide a reference implementation as well.
>>
>> The other questions I had were
>>
>> 1.  How would the framework get the delayMs for void scheduleConfigReload(
>> long delayMs);
>> 2. Would the start methods in SourceTask and SinkTask get the configs with
>> all the indirect references resolved. If so, trying to understand
>> the intent of the config() in SourceTaskContext and the SinkTaskContext
>> 3. What if the provider itself needs some kind of secrets to be configured
>> to connect to it? I assume that's out of scope for this proposal but
>> wanted
>> to clarify it.
>>
>> Thanks
>> Magesh
>>
>> On Tue, May 8, 2018 at 1:52 PM, Robert Yokota <rayok...@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I would like to start a discussion for KIP-297 to externalize secrets
>> from
>> > Kafka Connect configurations.  Any feedback is appreciated.
>> > <
>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 297%3A+Externalizing+Secrets+for+Connect+Configurations
>> > >
>> >
>> > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886>
>> >
>> > Thanks in advance,
>> > Robert
>> >
>>
>
>


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-09 Thread Robert Yokota
Hi Ron,

Thanks for the pointer!  I'll take a look at KIP 269.  I will also put
together a PR to demonstrate what the implementation might look like.

Thanks,
Robert

On Wed, May 9, 2018 at 6:44 AM, Ron Dagostino <rndg...@gmail.com> wrote:

> Hi Robert.  You may find KIP-269: Substitution Within Configuration Values
> (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+
> 269+Substitution+Within+Configuration+Values) to be interesting.  I don't
> know if it is relevant here -- I agree with Magesh, more concrete details
> would help, including to determine if KIP 269 is potentially adaptable to
> fit -- but I thought you should know about it in case you were not aware of
> it and it does turn out to be useful.  Note that I will soon be marking KIP
> 269 "Inactive" with respect to the core Kafka implementation, but that
> doesn't mean the ideas (or even the code) couldn't be used in the Connect
> framework if it was determined that would be appropriate.  I hope you find
> it helpful.
>
> Ron
>
>
> On Wed, May 9, 2018 at 1:10 AM, Magesh Nandakumar <mage...@confluent.io>
> wrote:
>
> > Hi Robert,
> >
> > Thanks for the KIP. I think, this will be a great addition to the
> > framework. I think, will be great if the KIP can elaborate a little bit
> > more on how implementations would look like with an example.
> > Also, would be good to provide a reference implementation as well.
> >
> > The other questions I had were
> >
> > 1.  How would the framework get the delayMs for void
> scheduleConfigReload(
> > long delayMs);
> > 2. Would the start methods in SourceTask and SinkTask get the configs
> with
> > all the indirect references resolved. If so, trying to understand
> > the intent of the config() in SourceTaskContext and the SinkTaskContext
> > 3. What if the provider itself needs some kind of secrets to be
> configured
> > to connect to it? I assume that's out of scope for this proposal but
> wanted
> > to clarify it.
> >
> > Thanks
> > Magesh
> >
> > On Tue, May 8, 2018 at 1:52 PM, Robert Yokota <rayok...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > I would like to start a discussion for KIP-297 to externalize secrets
> > from
> > > Kafka Connect configurations.  Any feedback is appreciated.
> > > <
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 297%3A+Externalizing+Secrets+for+Connect+Configurations
> > > >
> > >
> > > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886>
> > >
> > > Thanks in advance,
> > > Robert
> > >
> >
>


Re: [DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-09 Thread Robert Yokota
Hi Magesh,

Thanks for the feedback!

I will put together a PR to demonstrate what the implementation might look
like, as well as a reference FileConfigProvider.

1.  The delayMs for a (potentially) scheduled reload is determined by the
ConfigProvider.  For example, a (hypothetical) VaultConfigProvider, upon
contacting Vault for a particular secret, might also obtain a lease
duration indicating that the secret expires in 1 hour. The
VaultConfigProvider could then call scheduleConfigReload with delayMs set
to 360ms (1 hour).  This would cause the Connector to restart in an
hour, forcing it to reload the configs and re-resolve all indirect
references.

2. Yes, the start() methods in SourceTask and SinkTask would get the
configs with all the indirect references resolved.   Those config() methods
are for Connectors that want to get the latest configs (with all indirect
references re-resolved) at some time after start().  For example, if a Task
encountered some security exception because a secret expired, it could call
config() to get the config with the latest values.  This is assuming that
the Task can gracefully recover from the security exception.

3. Yes, that is up to the ConfigProvider implementation and is out of
scope.  If the ConfigProvider also needs some kind of secrets or other
data, it could possibly be passed in through the param properties
("config.providers.vault.param.auth=/run/myauth").  For example Docker
might generate the auth info for Vault in an in-memory tmpfs file that
could then be passed as a param.

Thanks,
Robert



On Tue, May 8, 2018 at 10:10 PM, Magesh Nandakumar <mage...@confluent.io>
wrote:

> Hi Robert,
>
> Thanks for the KIP. I think, this will be a great addition to the
> framework. I think, will be great if the KIP can elaborate a little bit
> more on how implementations would look like with an example.
> Also, would be good to provide a reference implementation as well.
>
> The other questions I had were
>
> 1.  How would the framework get the delayMs for void scheduleConfigReload(
> long delayMs);
> 2. Would the start methods in SourceTask and SinkTask get the configs with
> all the indirect references resolved. If so, trying to understand
> the intent of the config() in SourceTaskContext and the SinkTaskContext
> 3. What if the provider itself needs some kind of secrets to be configured
> to connect to it? I assume that's out of scope for this proposal but wanted
> to clarify it.
>
> Thanks
> Magesh
>
> On Tue, May 8, 2018 at 1:52 PM, Robert Yokota <rayok...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to start a discussion for KIP-297 to externalize secrets
> from
> > Kafka Connect configurations.  Any feedback is appreciated.
> > <
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 297%3A+Externalizing+Secrets+for+Connect+Configurations
> > >
> >
> > JIRA: <https://issues.apache.org/jira/browse/KAFKA-6886>
> >
> > Thanks in advance,
> > Robert
> >
>


[DISCUSS] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-08 Thread Robert Yokota
Hi,

I would like to start a discussion for KIP-297 to externalize secrets from
Kafka Connect configurations.  Any feedback is appreciated.
<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations
>

JIRA: 

Thanks in advance,
Robert


[jira] [Created] (KAFKA-6886) Externalize Secrets for Kafka Connect Configurations

2018-05-08 Thread Robert Yokota (JIRA)
Robert Yokota created KAFKA-6886:


 Summary: Externalize Secrets for Kafka Connect Configurations
 Key: KAFKA-6886
 URL: https://issues.apache.org/jira/browse/KAFKA-6886
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Robert Yokota
Assignee: Robert Yokota
 Fix For: 2.0.0


Kafka Connect's connector configurations have plaintext passwords, and Connect 
stores these in cleartext either on the filesystem (for standalone mode) or in 
internal topics (for distributed mode). 

Connect should not store or transmit cleartext passwords in connector 
configurations. Secrets in stored connector configurations should be allowed to 
be replaced with references to values stored in external secret management 
systems. Connect should provide an extension point for adding customized 
integrations, as well as provide a file-based extension as an example. Second, 
a Connect runtime should be allowed to be configured to use one or more of 
these extensions, and allow connector configurations to use placeholders that 
will be resolved by the runtime before passing the complete connector 
configurations to connectors. This will allow existing connectors to not see 
any difference in the configurations that Connect provides to them at startup. 
And third, Connect's API should be changed to allow a connector to obtain the 
latest connector configuration at any time.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-6407) Sink task metrics are the same for all connectors

2018-02-09 Thread Robert Yokota (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Yokota resolved KAFKA-6407.
--
Resolution: Duplicate

> Sink task metrics are the same for all connectors
> -
>
> Key: KAFKA-6407
> URL: https://issues.apache.org/jira/browse/KAFKA-6407
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 1.0.0
>Reporter: Alexander Koval
>Priority: Minor
>
> I have a lot of sink connectors inside a distributed worker. When I tried to 
> save metrics to graphite I discovered all task metrics are identical.
> {code}
> $>get -b 
> kafka.connect:type=sink-task-metrics,connector=prom-by-catalog-company,task=0 
> sink-record-read-total
> #mbean = 
> kafka.connect:type=sink-task-metrics,connector=prom-by-catalog-company,task=0:
> sink-record-read-total = 228744.0;
> $>get -b 
> kafka.connect:type=sink-task-metrics,connector=prom-kz-catalog-product,task=0 
> sink-record-read-total
> #mbean = 
> kafka.connect:type=sink-task-metrics,connector=prom-kz-catalog-product,task=0:
> sink-record-read-total = 228744.0;
> $>get -b 
> kafka.connect:type=sink-task-metrics,connector=prom-ru-catalog-company,task=0 
> sink-record-read-total
> #mbean = 
> kafka.connect:type=sink-task-metrics,connector=prom-ru-catalog-company,task=0:
> sink-record-read-total = 228744.0;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-2925) NullPointerException if FileStreamSinkTask is stopped before initialization finishes

2018-02-02 Thread Robert Yokota (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Yokota resolved KAFKA-2925.
--
Resolution: Cannot Reproduce

I wasn't able to reproduce the NPE and by reviewing the code it doesn't seem 
possible any longer.  Closing this as cannot reproduce.

> NullPointerException if FileStreamSinkTask is stopped before initialization 
> finishes
> 
>
> Key: KAFKA-2925
> URL: https://issues.apache.org/jira/browse/KAFKA-2925
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Robert Yokota
>Priority: Minor
>
> If a FileStreamSinkTask is stopped too quickly after a distributed herder 
> rebalances work, it can result in cleanup happening without start() ever 
> being called:
> {quote}
> Sink task org.apache.kafka.connect.runtime.WorkerSinkTask@f9ac651 was stopped 
> before completing join group. Task initialization and start is being skipped 
> (org.apache.kafka.connect.runtime.WorkerSinkTask:150)
> {quote}
> This is actually a bit weird since stop() is still called so resources 
> allocated in the constructor can be cleaned up, but possibly unexpected that 
> stop() will be called without start() ever being called.
> Because the code in FileStreamSinkTask's stop() method assumes start() has 
> been called, it can result in a NullPointerException because it assumes the 
> PrintStream is already initialized.
> The easy fix is to check for nulls before closing. However, we should 
> probably also consider whether the current possibly sequence of events is 
> confusing and if we shoud not invoke stop() and make it clear in the SInkTask 
> interface that you should only initialize stuff in the constructor that won't 
> need any manual cleanup later.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Please add me to the contributor list

2018-02-01 Thread Robert Yokota
Hi,

Can someone please add me to the contributor list?   I want to assign
KAFKA-2925 to myself so I can close it out.

Thanks,
Robert Yokota