Re: KIP-28 does not allow Processor to specify partition of output message

2015-10-14 Thread Randall Hauch
Created https://issues.apache.org/jira/browse/KAFKA-2649 and attached a PR with 
the proposed change.

Thanks!


On October 14, 2015 at 3:12:34 AM, Guozhang Wang (wangg...@gmail.com) wrote:

Thanks!

On Tue, Oct 13, 2015 at 9:34 PM, Randall Hauch <rha...@gmail.com> wrote:
Ok, cool. I agree we want something simple.  I'll create an issue and create a 
pull request with a proposal. Look for it tomorrow. 

On Oct 13, 2015, at 10:25 PM, Guozhang Wang <wangg...@gmail.com> wrote:

I see your point. Yeah I think it is a good way to add a Partitioner into 
addSink(...) but the Partitioner interface in producer is a bit overkill:

"partition(String topic, Object key, byte[] keyBytes, Object value, byte[] 
valueBytes, Cluster cluster)"

whereas for us we only want to partition on (K key, V value).

Perhaps we should add a new Partitioner interface in Kafka Streams?

Guozhang

On Tue, Oct 13, 2015 at 6:38 PM, Randall Hauch <rha...@gmail.com> wrote:
This overrides the partitioning logic for all topics, right? That means I have 
to explicitly call the default partitioning logic for all topics except those 
that my Producer forwards. I'm guessing the best way to do this is by extending 
org.apache.kafka.clients.producer.internals.DefaultPartitioner. Of course, with 
multiple sinks in my topology, I have to put all of the partitioning logic 
inside a single class.

What would you think about adding an overloaded TopologyBuilder.addSink(…) 
method that takes a Partitioner (or, better yet, a smaller functional 
interface)? The resulting SinkProcessor could use that Partitioner instance to 
set the partition number. That'd be super convenient for users, would keep the 
logic where it belongs (where the topology defines the sinks), and, best of 
all, the implementations won't have to worry about any other topics, such as 
those used by stores, metrics, or other sinks.
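
To sketch what such a smaller interface might look like (this block is
illustrative, not from the thread; Kafka Streams did later add a very similar
org.apache.kafka.streams.processor.StreamPartitioner):

    public interface StreamPartitioner<K, V> {
        /**
         * Determine the partition for a record with the given key and value,
         * or return null to fall back to the producer's default partitioning.
         */
        Integer partition(K key, V value, int numPartitions);
    }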

Best regards,

Randall


On October 13, 2015 at 8:09:41 PM, Guozhang Wang (wangg...@gmail.com) wrote:

Hi Randall,

You can try to set the partitioner class as
ProducerConfig.PARTITIONER_CLASS_CONFIG in the StreamsConfig, its interface
can be found in

org.apache.kafka.clients.producer.Partitioner

Let me know if it works for you.

Guozhang

On Tue, Oct 13, 2015 at 10:59 AM, Randall Hauch <rha...@gmail.com> wrote:

> The new streams API added with KIP-28 is great. I’ve been using it on a
> prototype for a few weeks, and I’m looking forward to it being included in
> 0.9.0. However, at the moment, a Processor implementation is not able to
> specify the partition number when it outputs messages.
>
> I’d be happy to log a JIRA and create a PR to add it to the API, but
> without knowing all of the history I’m wondering if leaving it out of the
> API was intentional.
>
> Thoughts?
>
> Best regards,
>
> Randall Hauch
>



--
-- Guozhang



--
-- Guozhang



--
-- Guozhang


Re: KIP-28 does not allow Processor to specify partition of output message

2015-10-13 Thread Randall Hauch
Ok, cool. I agree we want something simple.  I'll create an issue and create a 
pull request with a proposal. Look for it tomorrow. 

> On Oct 13, 2015, at 10:25 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
> I see your point. Yeah I think it is a good way to add a Partitioner into 
> addSink(...) but the Partitioner interface in producer is a bit overkill:
> 
> "partition(String topic, Object key, byte[] keyBytes, Object value, byte[] 
> valueBytes, Cluster cluster)"
> 
> whereas for us we only want to partition on (K key, V value).
> 
> Perhaps we should add a new Partitioner interface in Kafka Streams?
> 
> Guozhang
> 
>> On Tue, Oct 13, 2015 at 6:38 PM, Randall Hauch <rha...@gmail.com> wrote:
>> This overrides the partitioning logic for all topics, right? That means I 
>> have to explicitly call the default partitioning logic for all topics except 
>> those that my Producer forwards. I'm guessing the best way to do this is by 
>> extending org.apache.kafka.clients.producer.internals.DefaultPartitioner. Of 
>> course, with multiple sinks in my topology, I have to put all of the 
>> partitioning logic inside a single class.
>> 
>> What would you think about adding an overloaded TopologyBuilder.addSink(…) 
>> method that takes a Partitioner (or, better yet, a smaller functional 
>> interface)? The resulting SinkProcessor could use that Partitioner instance 
>> to set the partition number. That'd be super convenient for users, would 
>> keep the logic where it belongs (where the topology defines the sinks), and, 
>> best of all, the implementations won't have to worry about any other topics, 
>> such as those used by stores, metrics, or other sinks.
>> 
>> Best regards,
>> 
>> Randall
>> 
>> 
>>> On October 13, 2015 at 8:09:41 PM, Guozhang Wang (wangg...@gmail.com) wrote:
>>> 
>>> Hi Randall, 
>>> 
>>> You can try to set the partitioner class as 
>>> ProducerConfig.PARTITIONER_CLASS_CONFIG in the StreamsConfig, its interface 
>>> can be found in 
>>> 
>>> org.apache.kafka.clients.producer.Partitioner 
>>> 
>>> Let me know if it works for you. 
>>> 
>>> Guozhang 
>>> 
>>> On Tue, Oct 13, 2015 at 10:59 AM, Randall Hauch <rha...@gmail.com> wrote: 
>>> 
>>> > The new streams API added with KIP-28 is great. I’ve been using it on a 
>>> > prototype for a few weeks, and I’m looking forward to it being included 
>>> > in 
>>> > 0.9.0. However, at the moment, a Processor implementation is not able to 
>>> > specify the partition number when it outputs messages. 
>>> > 
>>> > I’d be happy to log a JIRA and create a PR to add it to the API, but 
>>> > without knowing all of the history I’m wondering if leaving it out of the 
>>> > API was intentional. 
>>> > 
>>> > Thoughts? 
>>> > 
>>> > Best regards, 
>>> > 
>>> > Randall Hauch 
>>> > 
>>> 
>>> 
>>> 
>>> -- 
>>> -- Guozhang
> 
> 
> 
> -- 
> -- Guozhang


Re: KIP-28 does not allow Processor to specify partition of output message

2015-10-14 Thread Randall Hauch
It absolutely is important that the partitioning logic for a single topic be 
the same across an entire cluster. IOW, if a topology has a single sink, then 
no matter where that topology is run in the cluster, it had better use the same 
partitioning logic. I would argue that when the partitioning logic varies from 
the default logic, it’s far better to encapsulate it within the topology’s 
definition, and adding it to the sink is a very easy way to do this (and very 
natural for the developer using Kafka Streams).

However, centralizing the partitioning logic for all streams is certainly not 
ideal, primarily because different topics will likely need to be partitioned in 
different ways. This is especially true for stateful stream processing, which 
depends on messages with the same key going to the same processor instance that 
owns that keyed data. IOW, the partitioning logic used by a producer is 
strongly informed by how the *downstream stateful consumers* are 
organized/clustered. It gets far more complicated when considering built-in 
topics used by offset management, state storage, and metrics. 

The bottom line is that *different* topics will likely need to be partitioned 
differently.

On October 14, 2015 at 12:57:37 PM, Yasuhiro Matsuda 
(yasuhiro.mats...@gmail.com) wrote:

A partitioning scheme should be a cluster-wide thing. Letting each sink 
have a different partitioning scheme does not make sense to me. A 
partitioning scheme is not specific to a stream job, a task, or a sink. I 
think specifying it at the sink level is more error-prone. 

If a user wants to customize a partitioning scheme, he/she will also want to 
manage it in some central place, maybe a code repo or a jar file. All 
applications must use the same logic; otherwise the data will be messed up. 
Thus, a single class representing all partitioning logic is not a bad thing 
at all. (Code-organization-wise, all the logic does not necessarily need to 
live in a single class, of course.) 


On Wed, Oct 14, 2015 at 8:47 AM, Randall Hauch <rha...@gmail.com> wrote:  

> Created https://issues.apache.org/jira/browse/KAFKA-2649 and attached a  
> PR with the proposed change.  
>  
> Thanks!  
>  
>  
> On October 14, 2015 at 3:12:34 AM, Guozhang Wang (wangg...@gmail.com)  
> wrote:  
>  
> Thanks!  
>  
> On Tue, Oct 13, 2015 at 9:34 PM, Randall Hauch <rha...@gmail.com> wrote:  
> Ok, cool. I agree we want something simple. I'll create an issue and  
> create a pull request with a proposal. Look for it tomorrow.  
>  
> On Oct 13, 2015, at 10:25 PM, Guozhang Wang <wangg...@gmail.com> wrote:  
>  
> I see your point. Yeah I think it is a good way to add a Partitioner into  
> addSink(...) but the Partitioner interface in producer is a bit overkill:  
>  
> "partition(String topic, Object key, byte[] keyBytes, Object value, byte[]  
> valueBytes, Cluster cluster)"  
>  
> whereas for us we only want to partition on (K key, V value).  
>  
> Perhaps we should add a new Partitioner interface in Kafka Streams?  
>  
> Guozhang  
>  
> On Tue, Oct 13, 2015 at 6:38 PM, Randall Hauch <rha...@gmail.com> wrote:  
> This overrides the partitioning logic for all topics, right? That means I  
> have to explicitly call the default partitioning logic for all topics  
> except those that my Producer forwards. I'm guessing the best way to do this  
> is by extending org.apache.kafka.clients.producer.internals.DefaultPartitioner.  
> Of course, with multiple sinks in my topology, I have to put all of the  
> partitioning logic inside a single class.  
>  
> What would you think about adding an overloaded TopologyBuilder.addSink(…)  
> method that takes a Partitioner (or, better yet, a smaller functional  
> interface)? The resulting SinkProcessor could use that Partitioner instance  
> to set the partition number. That'd be super convenient for users, would  
> keep the logic where it belongs (where the topology defines the sinks), and,  
> best of all, the implementations won't have to worry about any other topics,  
> such as those used by stores, metrics, or other sinks.  
>  
> Best regards,  
>  
> Randall  
>  
>  
> On October 13, 2015 at 8:09:41 PM, Guozhang Wang (wangg...@gmail.com)  
> wrote:  
>  
> Hi Randall,  
>  
> You can try to set the partitioner class as  
> ProducerConfig.PARTITIONER_CLASS_CONFIG in the StreamsConfig, its interface  
> can be found in  
>  
> org.apache.kafka.clients.producer.Partitioner  
>  
> Let me know if it works for you.  
>  
> Guozhang  
>  
> On Tue, Oct 13, 2015 at 10:59 AM, Randall Hauch <rha...@gmail.com> wrote:  
>  
> > The new streams API added with KIP-28 is great. I’ve been using it on a  
> > prototype for a few weeks, and I’m looking forward to it being included  
> in  

KIP-28 does not allow Processor to specify partition of output message

2015-10-13 Thread Randall Hauch
The new streams API added with KIP-28 is great. I’ve been using it on a
prototype for a few weeks, and I’m looking forward to it being included in
0.9.0. However, at the moment, a Processor implementation is not able to
specify the partition number when it outputs messages.

I’d be happy to log a JIRA and create a PR to add it to the API, but
without knowing all of the history I’m wondering if leaving it out of the
API was intentional.

Thoughts?

Best regards,

Randall Hauch


Re: KIP-28 does not allow Processor to specify partition of output message

2015-10-13 Thread Randall Hauch
This overrides the partitioning logic for all topics, right? That means I have 
to explicitly call the default partitioning logic for all topics except those 
that my Producer forwards. I'm guessing the best way to do this is by extending 
org.apache.kafka.clients.producer.internals.DefaultPartitioner. Of course, with 
multiple sinks in my topology, I have to put all of the partitioning logic 
inside a single class.

What would you think about adding an overloaded TopologyBuilder.addSink(…) 
method that takes a Partitioner (or, better yet, a smaller functional 
interface)? The resulting SinkProcessor could use that Partitioner instance to 
set the partition number. That'd be super convenient for users, would keep the 
logic where it belongs (where the topology defines the sinks), and, best of 
all, the implementations won't have to worry about any other topics, such as 
those used by stores, metrics, or other sinks.

Best regards,

Randall


On October 13, 2015 at 8:09:41 PM, Guozhang Wang (wangg...@gmail.com) wrote:

Hi Randall,  

You can try to set the partitioner class as  
ProducerConfig.PARTITIONER_CLASS_CONFIG in the StreamsConfig, its interface  
can be found in  

org.apache.kafka.clients.producer.Partitioner  

Let me know if it works for you.  

Guozhang  

On Tue, Oct 13, 2015 at 10:59 AM, Randall Hauch <rha...@gmail.com> wrote:  

> The new streams API added with KIP-28 is great. I’ve been using it on a  
> prototype for a few weeks, and I’m looking forward to it being included in  
> 0.9.0. However, at the moment, a Processor implementation is not able to  
> specify the partition number when it outputs messages.  
>  
> I’d be happy to log a JIRA and create a PR to add it to the API, but  
> without knowing all of the history I’m wondering if leaving it out of the  
> API was intentional.  
>  
> Thoughts?  
>  
> Best regards,  
>  
> Randall Hauch  
>  



--  
-- Guozhang  


Additional Kafka Connect schema logical types?

2016-03-04 Thread Randall Hauch
I’m working on a Kafka Connect connector that reads a MySQL binlog to provide 
near real-time change data capture, and I also plan connectors for other 
DBMSes. The problem is that I’m not able to map all of the MySQL data types — 
or even all of the standard JDBC types — to Kafka Connect Schemas without 
resorting to complex Schemas that radically increase the footprint of messages.

Specifically, I’d like my connectors to be able to use the following “logical” 
types:

- Bits: A set of bits of arbitrary length, corresponding to java.util.BitSet. 
See [1] for code.
- IsoTime: An ISO8601 time that includes the time zone, corresponding to 
Java 8's java.time.OffsetTime that represents a time with the offset from 
UTC/Greenwich, and that has a well-defined ordering and thus is more suitable 
for persistent storage. See [2] for code.
- IsoTimestamp: An ISO8601 timestamp that includes the time zone, corresponding 
to Java 8's java.time.OffsetDateTime that represents an instant with the offset 
from UTC/Greenwich, and that has a well-defined ordering and thus is more 
suitable for persistent storage. See [3] for code.

These are very similar to the 4 built-in logical types (Decimal, Date, Time, 
and Timestamp). These logical types are much akin to aliases for a primitive 
type (typically BYTES), and their use within a Schema includes semantics that 
would not be there by just using the corresponding primitive.
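
To make that pattern concrete, here is a sketch of how such a logical type
could mirror the built-in Decimal: a named alias for a primitive BYTES schema
plus static conversion methods (the io.example.data name below is
hypothetical):

    import java.util.BitSet;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;
    import org.apache.kafka.connect.errors.DataException;

    public class Bits {
        public static final String LOGICAL_NAME = "io.example.data.Bits";

        // The schema is just BYTES with a name marking it as a logical type.
        public static SchemaBuilder builder() {
            return SchemaBuilder.bytes().name(LOGICAL_NAME).version(1);
        }

        public static Schema schema() {
            return builder().build();
        }

        // Convert the logical value to its primitive representation.
        public static byte[] fromLogical(Schema schema, BitSet value) {
            if (!LOGICAL_NAME.equals(schema.name())) {
                throw new DataException("Schema is not a Bits logical type");
            }
            return value.toByteArray();
        }

        // Recover the logical value from its primitive representation.
        public static BitSet toLogical(Schema schema, byte[] value) {
            if (!LOGICAL_NAME.equals(schema.name())) {
                throw new DataException("Schema is not a Bits logical type");
            }
            return BitSet.valueOf(value);
        }
    }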

Unfortunately, Kafka Connect is not currently able to support custom logical 
types. Sure, you can create them, but neither the JsonConverter nor any of the 
other Converters will know how to serialize or deserialize them.

One option is for Kafka Connect to add these, but this is sort of a 
never-ending battle. And, since Kafka is not yet on Java 8, supporting 
OffsetTime and OffsetDateTime would be problematic.

Perhaps a better option is to support custom logical types, where each logical 
type must be based upon a single primitive type and must define a class that 
knows how to serialize and deserialize the logical type from the primitive 
type. The Converters, once modified, could look for the referenced class and 
use its serdes logic as needed.

A couple of points:

1) Any source connector that is producing a record with these logical types 
would obviously have to have the logical type’s class available on the 
classpath. That doesn’t seem a difficult requirement to satisfy.

2) Any consumer or sink connector that is consuming records with these values 
needs the logical type's class to be able to work with it. This doesn't seem 
too horrible, especially if the logical type class(es) are nicely separated 
into separate JARs. However, if the consumer doesn't have the logical type 
class, then its local Converter would just deserialize to the corresponding 
primitive value (e.g., byte[], int, long, float, String, etc.) — is this 
sufficient if the consumer or sink connector is simply passing the value 
along?

3) There are a couple of ways the logical type’s Schema object could reference 
its class. The 4 built-ins use the convention that the name corresponds to the 
name of the class, though I suspect this is largely just a technique to 
guarantee a unique name. However, at this time there is no interface or base 
class for logical types, so something would have to be changed to allow for 
easy invocation of the serdes methods. An alternative might be to add to 
“Schema” an optional “serdes” field that references the name of the class that 
implements a serdes interface; this is probably cleaner, though it does 
increase the verbosity of the Schema object.
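
As a purely hypothetical illustration of that alternative, the serdes
interface referenced by such a "serdes" field might look something like this
(no such interface exists in Kafka Connect today):

    public interface LogicalTypeSerde<L> {
        /** The logical type's schema, based on a single primitive type. */
        Schema schema();

        /** Convert a logical value to its primitive representation. */
        Object toPrimitive(L logicalValue);

        /** Recover the logical value from its primitive representation. */
        L fromPrimitive(Object primitiveValue);
    }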


Thoughts?

Randall Hauch

[1] 
https://github.com/debezium/debezium/blob/74c5adcc8d30afaa221bbdbecad3bb6f6febbaa5/debezium-core/src/main/java/io/debezium/data/Bits.java
[2] 
https://github.com/debezium/debezium/blob/74c5adcc8d30afaa221bbdbecad3bb6f6febbaa5/debezium-core/src/main/java/io/debezium/data/IsoTime.java
[3] 
https://github.com/debezium/debezium/blob/74c5adcc8d30afaa221bbdbecad3bb6f6febbaa5/debezium-core/src/main/java/io/debezium/data/IsoTimestamp.java
 




Re: Kafka Connect connector monitoring tasks via offset storage reader

2016-05-24 Thread Randall Hauch



On May 24, 2016 at 4:28:41 PM, Liquan Pei (liquan...@gmail.com) wrote:

Hi Randall, 

> This is interesting. Essentially we can track the progress by getting data 
> from offset storage. What are the use cases you have in mind that use the 
> offsets of source partitions? I can imagine comparing the source offsets 
> for the new data and already-delivered data and making some decisions such 
> as task reconfiguration, etc.

Yes, exactly. I can think of two use cases:

1. Upon startup, using previously committed offsets to identify new vs. 
   existing source partitions, and using this to more intelligently distribute 
   the source partitions across tasks.
2. Tracking the progress of tasks by reading the committed offsets, and 
   signaling for task reconfiguration when task(s) have reached some 
   predetermined point.
> One concern is that offsets read from OffsetStorageReader may be stale and 
> we may end up making decisions not on the latest data. In general, I think 
> the question is what we want to do once we know that the data up to a given 
> source offset has been delivered to Kafka.

Would they really be stale during connector startup? Aren't they accurate in 
this case, enabling the connector to make intelligent decisions about task 
configurations?

However, if the connector and tasks are running then, yes, the only guarantee 
about offsets read from storage is that the offsets were committed, so tasks 
have _at least_ recorded those offsets but may have done more.

> From the API perspective, do we want to expose the OffsetStorageReader or 
> just add a method to return the source offsets? Note that this is only 
> relevant to source connectors; I'm not sure whether or not it makes sense 
> to create SourceConnectorContext and SinkConnectorContext.

Yes, this probably is only related to source connectors, and defining 
SourceConnectorContext and SinkConnectorContext may be the appropriate way to 
go forward. Changing the type of the ‘context’ field in the Connector abstract 
class or moving that to the appropriate subclasses would break binary classfile 
compatibility with earlier versions (meaning a newer version of Kafka Connect 
could not use a connector compiled with an older Kafka Connect library). But 
that may not matter, since the `Connector` interface is still annotated with 
“@InterfaceStability.Unstable” in the 0.10.0.0 tag [1] and in the trunk branch 
[2]. In fact, it may be useful to define those ConnectorContext subtypes sooner 
rather than later to allow binary classfile backward compatibility later on.

Best regards,

Randall

[1] 
https://github.com/apache/kafka/blob/0.10.0/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java

[2] 
https://github.com/apache/kafka/blob/trunk/connect/api/src/main/java/org/apache/kafka/connect/connector/Connector.java



Thanks, 
Liquan 

On Tue, May 24, 2016 at 1:31 PM, Randall Hauch <rha...@gmail.com> wrote: 

> I have a need for one of my SourceConnector implementations to configure a 
> bunch of tasks and, when those are all “done”, request a task 
> reconfiguration so that it can run a single task. Think: many tasks to make 
> snapshot of database tables, then when those are completed reconfigure 
> itself so that it then started _one_ task to read the transaction log. 
> 
> Unfortunately, I can’t figure out a way for the connector to “monitor” the 
> progress of its tasks, especially when those tasks are distributed across 
> the cluster. The only way I can think of to get around this is to have my 
> connector start *one* task that performs the snapshot and then starts 
> reading the transaction log. Unfortunately, that means to parallelize the 
> snapshotting work, the task would need to manage its own threads. That’s 
> possible, but undesirable for many reasons, not the least of which is that 
> the work can’t be distributed as multiple tasks amongst the cluster of 
> Kafka Connect workers. 
> 
> On the other hand, a simple enhancement to Kafka Connect would make this 
> very easy: add to the ConnectorContext a method that returned the 
> OffsetStorageReader. The connector could start a thread to periodically 
> poll the offsets for various partitions, and effectively watch the progress 
> of the tasks. Not only that, the connector’s 'taskConfigs(int)’ method 
> could use the OffsetStorageReader to read previously-recorded offsets to 
> more intelligently configure its tasks. This seems very straightforward, 
> backward compatible, and non-intrusive. 
> 
> Is there any interest in this? If so, I can create an issue and work on a 
> pull request. 
> 
> Best regards, 
> 
> Randall Hauch 




-- 
Liquan Pei 
Software Engineer, Confluent Inc 


Kafka Connect connector monitoring tasks via offset storage reader

2016-05-24 Thread Randall Hauch
I have a need for one of my SourceConnector implementations to configure a 
bunch of tasks and, when those are all “done”, request a task reconfiguration 
so that it can run a single task. Think: many tasks to make snapshot of 
database tables, then when those are completed reconfigure itself so that it 
then started _one_ task to read the transaction log.

Unfortunately, I can’t figure out a way for the connector to “monitor” the 
progress of its tasks, especially when those tasks are distributed across the 
cluster. The only way I can think of to get around this is to have my connector 
start *one* task that performs the snapshot and then starts reading the 
transaction log. Unfortunately, that means to parallelize the snapshotting 
work, the task would need to manage its own threads. That’s possible, but 
undesirable for many reasons, not the least of which is that the work can’t be 
distributed as multiple tasks amongst the cluster of Kafka Connect workers.

On the other hand, a simple enhancement to Kafka Connect would make this very 
easy: add to the ConnectorContext a method that returned the 
OffsetStorageReader. The connector could start a thread to periodically poll 
the offsets for various partitions, and effectively watch the progress of the 
tasks. Not only that, the connector’s 'taskConfigs(int)’ method could use the 
OffsetStorageReader to read previously-recorded offsets to more intelligently 
configure its tasks. This seems very straightforward, backward compatible, and 
non-intrusive.
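
To sketch the idea (assuming the proposed accessor existed; the
"snapshotComplete" offset field below is purely hypothetical), the connector
could run something like this periodically on a background thread:

    import java.util.Collection;
    import java.util.Map;
    import org.apache.kafka.connect.storage.OffsetStorageReader;

    public class SnapshotProgressMonitor implements Runnable {
        private final OffsetStorageReader offsetReader;
        private final Collection<Map<String, String>> sourcePartitions;
        private final Runnable onSnapshotComplete;

        public SnapshotProgressMonitor(OffsetStorageReader offsetReader,
                                       Collection<Map<String, String>> sourcePartitions,
                                       Runnable onSnapshotComplete) {
            this.offsetReader = offsetReader;
            this.sourcePartitions = sourcePartitions;
            this.onSnapshotComplete = onSnapshotComplete;
        }

        @Override
        public void run() {
            // Read the last committed offset for every source partition ...
            Map<Map<String, String>, Map<String, Object>> offsets =
                    offsetReader.offsets(sourcePartitions);
            // ... and if every partition has recorded the hypothetical
            // "snapshot done" marker, signal the connector, which could then
            // call context.requestTaskReconfiguration().
            boolean allDone = sourcePartitions.stream()
                    .map(offsets::get)
                    .allMatch(o -> o != null && Boolean.TRUE.equals(o.get("snapshotComplete")));
            if (allDone) {
                onSnapshotComplete.run();
            }
        }
    }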

Is there any interest in this? If so, I can create an issue and work on a pull 
request.

Best regards,

Randall Hauch

Re: [DISCUSS] KIP-131 : Add access to OffsetStorageReader from SourceConnector

2017-08-15 Thread Randall Hauch
Sorry it's taken me so long to come back to this.

Have you considered creating a `SourceConnectorContext` interface that
extends `ConnectorContext` and that adds the method to access the offset
storage? This would very closely match the existing `SourceTaskContext`.
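
For concreteness, the interface itself could be as simple as this sketch (the
method name is borrowed from the existing SourceTaskContext; the final KIP may
of course differ):

    public interface SourceConnectorContext extends ConnectorContext {
        /**
         * Get the OffsetStorageReader for reading offsets previously
         * committed by this connector's tasks.
         */
        OffsetStorageReader offsetStorageReader();
    }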

`SourceConnector` implementations could always cast the `context` field in
the superclass to `SourceConnectorContext`, but perhaps a slightly better
way to do this is to add the following method to the `SourceConnector`
class:


    public SourceConnectorContext context() {
        return (SourceConnectorContext) context;
    }


Now, `SourceConnector` implementations can either do the cast themselves or
use this additional method to obtain the correctly cast context.

In fact, it might be good to do this for `SinkConnector` as well, and we
could even add a `context()` method in the `Connector` interface, since
subinterfaces can change the return type to be a subtype of that returned
by the interface:

ConnectorContext context();

One advantage of this approach is that `SourceConnectorContext` and
`SinkConnectorContext` remain interfaces. Another is not adding new methods
to `SourceConnector` that implementers would need to learn they should not
override or implement. A third is that now we have a
`SourceConnectorContext` and `SinkConnectorContext` to which we can add
more methods if needed, and they are very similar to `SourceTaskContext`
and `SinkTaskContext`.

Thoughts?

On Wed, Apr 5, 2017 at 3:59 PM, Florian Hussonnois 
wrote:

> Hi All,
>
> Is there any feedback regarding that KIP ?
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+
> OffsetStorageReader+from+SourceConnector
>
> Thanks,
>
> 2017-03-14 22:51 GMT+01:00 Florian Hussonnois :
>
> > Hi Matthias,
> >
> > Sorry I didn't know this page. Ths KIP has been added to it.
> >
> > Thanks,
> >
> > 2017-03-13 21:30 GMT+01:00 Matthias J. Sax :
> >
> >> Can you please add the KIP to this table:
> >>
> >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+
> >> Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
> >>
> >> Thanks,
> >>
> >>  Matthias
> >>
> >>
> >> On 3/7/17 1:24 PM, Florian Hussonnois wrote:
> >> > Hi all,
> >> >
> >> > I've created a new KIP to add access to OffsetStorageReader from
> >> > SourceConnector
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+
> >> Add+access+to+OffsetStorageReader+from+SourceConnector
> >> >
> >> > Thanks.
> >> >
> >>
> >>
> >
> >
> > --
> > Florian HUSSONNOIS
> >
>
>
>
> --
> Florian HUSSONNOIS
>


Re: Kafka Connect and Partitions

2017-04-28 Thread Randall Hauch
The source connector creates a SourceRecord object and can set a number of
fields, including the message's key and value, the Kafka topic name and, if
desired, the Kafka topic partition number. If the connector does set the
topic partition to a non-null value, then that's the partition to which
Kafka Connect will write the message; otherwise, the configured partitioner
(e.g., your custom partitioner) used by the Kafka Connect producer will
choose/compute the partition based purely upon the key and value byte
arrays. Note that if the connector doesn't set the topic partition number
and no special producer partitioner is specified, the default hash-based
Kafka partitioner will be used.
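
As a minimal sketch of those two options (the topic name, source partition
and offset fields, and values below are made up for illustration):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;

    public class PartitioningOptions {
        public static void examples() {
            Map<String, ?> sourcePartition = Collections.singletonMap("file", "foo.txt");
            Map<String, ?> sourceOffset = Collections.singletonMap("position", 42L);

            // Option 1: the connector picks the partition itself by passing a
            // non-null partition number to the SourceRecord constructor.
            SourceRecord explicit = new SourceRecord(
                    sourcePartition, sourceOffset, "my-topic", 0,
                    Schema.STRING_SCHEMA, "my-key",
                    Schema.STRING_SCHEMA, "my-value");

            // Option 2: leave the partition unset; the producer's partitioner
            // (default or custom) then chooses the partition based on the
            // serialized key bytes.
            SourceRecord keyed = new SourceRecord(
                    sourcePartition, sourceOffset, "my-topic",
                    Schema.STRING_SCHEMA, "my-key",
                    Schema.STRING_SCHEMA, "my-value");
        }
    }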

So, the connector can certainly set the topic partition number, and it may
be easier to do it there since the connector actually has the key and
values before they are serialized. But no matter what, the connector is the
only thing that sets the message key in the source record.

BTW, the SourceRecord's "source position" and "source offset" are actually
the connector-defined information about the source and where the connector
has read in that source. Don't confuse these with the topic name or topic
partition number.

Hope that helps.

Randall

On Fri, Apr 28, 2017 at 7:15 AM,  wrote:

> Hi Gwen,
>
> Having added a custom partitioner (via the producer.partitioner.class
> property in worker.properties) that simply randomly selects a partition,
> the data is now written evenly across all the partitions :)
>
> The root of my confusion regarding why the default partitioner writes all
> data to the same partition is that I don't understand how the SourceRecords
> returned in the source task poll() method are used by the partitioner.  The
> data that is passed to the partitioner includes a key Object (which is an
> empty byte array - presumably this is a bad idea!), and a value Object
> (which is a non-empty byte array):
>
> @Override
> public int partition(String topic, Object key, byte[] keyBytes, Object
> value, byte[] valueBytes, Cluster cluster) {
> System.out.println(String.format(
> "### PARTITION key[%s][%s][%d] value[%s][%s][%d]",
> key, key.getClass().getSimpleName(), keyBytes.length,
> value, value.getClass().getSimpleName(),
> valueBytes.length));
>
> =>
> ### PARTITION key[[B@584f599f][byte[]][0] value[[B@73cc0cd8][byte[]][236]
>
> However, I don't understand how the above key and value are derived from
> the SourceRecord attributes which, in my application's case, is as follows:
>
> events.add(new SourceRecord(
> offsetKey(filename),
> offsetValue(++recordIndex),
> topicName,
> Schema.BYTES_SCHEMA,
> line));
> System.out.println(String.format(
> "### PARTITION SourceRecord key[%s] value[%s]
> topic[%s] schema[%s], data[%s]",
> offsetKey(filename), offsetValue(recordIndex),
> topicName, Schema.BYTES_SCHEMA, line));
>
> =>
> ### PARTITION SourceRecord key[{_taskFiles=e:\a\b\c\d.ser}]
> value[{_position=1}] topic[Topic1] schema[Schema{BYTES}], data[{"field1":
> value1,  …, "fieldN": valueN}]
>
> In worker.properties I use the key.converter and value.converter
> properties to apply an Avro converter to the data written to Kafka.  Hence,
> I assume, the byte[]  format of the key and the value.  Though I don't
> understand why the key is empty and this, presumably, is why all data is
> mapped to the same Kafka partition.
>
> Could you explain how the SourceRecord is used to derive the partition key,
> please?  Can you see from the above summary why the partition key is null?
> It defeats me :(
>
> Have a good weekend, thanks,
>
> David
>
> -Original Message-
> From: Gwen Shapira [mailto:g...@confluent.io]
> Sent: 27 April 2017 17:44
> To: dev@kafka.apache.org
> Subject: Re: Kafka Connect and Partitions
>
> That's great! So we tracked this down to the source connector not properly
> partitioning data.
>
> Do you set both key and value? It sounds a bit like maybe all your records
> have the exact same key, which means they all get hashed to the same
> partition. Can you check that?
>
> On Thu, Apr 27, 2017 at 3:22 AM,  wrote:
>
> > Hi Gwen,
> >
> > Many thanks for your much appreciated offer to help with this.
> >
> > In answer to your questions:
> > * Are you writing a connector or trying to use an existing one?
> > I'm writing a new source/sink connector pipeline: folderToTopics piped
> > into topicsToFolders.
> > * Is the connector reading from the topic you think you are reading?
> > Yes
> > * Do you actually have 4 tasks? Are they all running? Are there errors?
> > Yes, Yes, No (see log output below)
> > * What happens if you stop the only task doing the work?
> > I'm not 

Re: [DISCUSS] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-08 Thread Randall Hauch
Yes, that's the approach I'm suggesting and that is mentioned in the KIP. I
also propose that the distributed configuration provided in the examples
set the replication factor to one but include a relevant comment.
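
For example, the relevant portion of a distributed worker configuration per
this KIP would look something like the following (a replication factor of 1 is
only for a single-broker development cluster; note there is deliberately no
config.storage.partitions property, since the config topic must have exactly
one partition):

    # Internal topics used by the Kafka Connect distributed worker.
    config.storage.topic=connect-configs
    config.storage.replication.factor=1

    offset.storage.topic=connect-offsets
    offset.storage.replication.factor=1
    offset.storage.partitions=25

    status.storage.topic=connect-status
    status.storage.replication.factor=1
    status.storage.partitions=5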

On Mon, May 8, 2017 at 11:14 PM, BigData dev <bigdatadev...@gmail.com>
wrote:

> So, when the Kafka cluster has fewer than 3 brokers and the user has not
> set the replication configuration, it will throw an error telling the user
> to correct the configuration according to their setup? Is this the approach
> you are suggesting here?
>
>
>
> On Mon, May 8, 2017 at 7:13 PM, Randall Hauch <rha...@gmail.com> wrote:
>
> > One of the "Rejected Alternatives" was to do something "smarter" by
> > automatically reducing the replication factor when the cluster size is
> > smaller than the replication factor. However, this is extremely
> > unintuitive, and in rare cases (e.g., during a partial outage) might even
> > result in internal topics being created with too small of a replication
> > factor. And defaulting to 1 is certainly bad for production use cases, so
> > that's not an option, either.
> >
> > While defaulting to 3 and failing if the cluster doesn't have 3 nodes is
> a
> > bit harsher than I'd like, it does appear to be the safer option: an
> error
> > message (with instructions on how to correct) is better than
> inadvertently
> > setting the replication factor too small and not knowing about it until
> it
> > is too late.
> >
> > On Mon, May 8, 2017 at 6:12 PM, BigData dev <bigdatadev...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I liked the KIP, as it will avoid so many errors which users can make
> > during
> > > setup.
> > > I have 1 questions here.
> > > 1. As default replication factor is set to 3, but if Kafka cluster is
> > setup
> > > for one node, then the user needs to override the default configuration,
> > > till then topics will not be created.
> > > So, this is the behavior we want to give?
> > >
> > > On Mon, May 8, 2017 at 2:25 PM, Konstantine Karantasis <
> > > konstant...@confluent.io> wrote:
> > >
> > > > Thanks a lot for the KIP Randall. This improvement should simplify
> both
> > > > regular deployments and testing!
> > > >
> > > > A minor comment. Maybe it would be nice to add a note about why
> there's
> > > no
> > > > need for the property: config.storage.partitions
> > > > I'm mentioning this for the sake of completeness, in case someone
> > notices
> > > > this slight asymmetry with respect to the newly introduced config
> > > > properties.
> > > >
> > > > This is by no means a blocking comment.
> > > >
> > > > Thanks,
> > > > Konstantine
> > > >
> > > > On Fri, May 5, 2017 at 7:18 PM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >
> > > > > Thanks, Gwen.
> > > > >
> > > > > Switching to low-priority is a great idea.
> > > > >
> > > > > The default value for the replication factor configuration is 3,
> > since
> > > > > that makes sense and is safe for production. Using the default
> values
> > > in
> > > > > the example would mean it could only be run against a Kafka cluster
> > > with
> > > > a
> > > > > minimum of 3 nodes. I propose overriding the example's replication
> > > factor
> > > > > configurations to be 1 so that the examples could be run on any
> sized
> > > > > cluster.
> > > > >
> > > > > The rejected alternatives mentions why the implementation doesn't
> try
> > > to
> > > > > be too smart by calculating the replication factor.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > > > > On May 5, 2017, at 8:02 PM, Gwen Shapira <g...@confluent.io>
> > wrote:
> > > > > >
> > > > > > Looks great to me :)
> > > > > >
> > > > > > Just one note - configurations have levels (which reflect in the
> > > docs)
> > > > -
> > > > > I
> > > > > > suggest putting the whole thing as LOW. Most users will never
> need
> > to
> > > > > worry
> > > > > > about these. For same reason I recommend leaving them out of the
>

[DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-05-22 Thread Randall Hauch
Hi, all.

We recently added the ability for Kafka Connect to create *internal* topics
using the new AdminClient, but it still would be great if Kafka Connect
could do this for new topics that result from source connector records.
I've outlined an approach to do this in "KIP-158 Kafka Connect should allow
source connectors to set topic-specific settings for new topics".

https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics

Please take a look and provide feedback. Thanks!

Best regards,

Randall


Re: Kafka Connect: Too much restarting with a SourceConnector with a dynamic set of tasks

2017-05-22 Thread Randall Hauch
You're not doing anything wrong, but I suspect you're requesting task
reconfiguration more frequently than was originally envisioned, which means
that the current implementation is not optimal for your case.

I'm not sure how much effort is required to implement this new behavior.
The logic for the standalone worker is pretty straightforward, but the
logic for the distributed worker is going to be much more involved. But we
also need to be careful about changing existing behavior, since it's not
hard to imagine connectors that might expect that all tasks be restarted
when there are any changes to the task configurations. If there's any
potential that this is the case, we'd have to be sure to keep the existing
behavior as the default but to somehow enable the new behavior if desired.

One possibility is to add an overloaded requestTaskReconfiguration(boolean
changedOnly) that specifies whether only changed tasks should be
reconfigured. This way the existing requestTaskReconfiguration() method
could be changed to call requestTaskReconfiguration(false), and then the
implementation has to deal with this.
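
A sketch of that hypothetical overload (the boolean variant does not exist in
the current ConnectorContext API; other existing methods are omitted):

    public interface ConnectorContext {
        // Existing method: requests that all tasks be stopped and restarted
        // with newly generated configurations.
        void requestTaskReconfiguration();

        // Hypothetical overload: when changedOnly is true, only tasks whose
        // configurations actually changed would be restarted; unchanged
        // tasks would keep running.
        void requestTaskReconfiguration(boolean changedOnly);
    }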

But again, the bigger challenge is to implement this new behavior in the
DistributedHerder. OTOH, perhaps it's not as complicated as I might guess.



On Tue, May 16, 2017 at 4:57 AM, Per Steffensen  wrote:

> Hi
>
> Kafka (Connect) 0.10.2.1
>
> I am writing my own SourceConnector. It will communicate with a remote
> server, and continuously calculate the set of tasks that has to be running.
> Each task also makes a connection to the remote server from which it will
> get its data to forward to Kafka.
>
> When the SourceConnector realizes that the set of tasks has to be
> modified, it makes sure taskConfigs-method will return config for the new
> complete set of tasks (likely including tasks that already existed before,
> probably some new tasks, and maybe some of the existing tasks will no
> longer be included). After that the SourceConnector calls
> context.requestTaskReconfiguration. This results in the current instance
> of my SourceConnector and all existing/running tasks gets stopped, a new
> instance of my SourceConnector gets created and all tasks (those that
> existed before and new ones) are started.
>
> It all works nicely, but because my SourceConnector and my SourceTasks has
> to (re)establish connection and (re)initialize the streaming of data, and
> because my set of tasks changes fairly often, and because it very very
> often contains tasks that were also in the set before the change, I end up
> having lots of stop/start of tasks that really just ought to continue
> running.
>
> Any plans on making this more delta-ish, so that when doing a
> requestTaskReconfiguration
> * Only tasks that were not already in the task-config-set before the
> requestTaskConfiguration are started
> * Only tasks that were in the task-config-set before the
> requestTaskConfiguration, but not in the set after, are stopped
> * Tasks that are in the task-config-set both before and after
> requestTaskConfiguration, are just allowed to keep running, without
> restarting
> * Not so important: Do not create a new instance of the SourceConnector,
> just because it has a changed task-config-set
>
> Or am I doing something wrong in my SourceConnector? Are there a different
> way that I should maintain a dynamic set of tasks?
>
> Thanks!!!
>
> Regards, Per Steffensen
>
>


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-05-23 Thread Randall Hauch
Thanks for the quick feedback, Mathieu. Yes, the first configuration rule
whose regex matches will be applied, and no other rules will be used. I've
updated the KIP to try to make this more clear, but let me know if it's
still not clear.

Best regards,

Randall

On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hi Randall,
>
> Awesome, very much looking forward to this.
>
> It isn't 100% clear from the KIP how multiple config-based rules would
> be applied; it looks like the first configuration rule whose regex
> matches the topic name will be used, and no other rules will be
> applied.  Is that correct?  (I wasn't sure if it might cascade
> together multiple matching rules...)
>
> Looks great,
>
> Mathieu
>
>
> On Mon, May 22, 2017 at 1:43 PM, Randall Hauch <rha...@gmail.com> wrote:
> > Hi, all.
> >
> > We recently added the ability for Kafka Connect to create *internal*
> topics
> > using the new AdminClient, but it still would be great if Kafka Connect
> > could do this for new topics that result from source connector records.
> > I've outlined an approach to do this in "KIP-158 Kafka Connect should
> allow
> > source connectors to set topic-specific settings for new topics".
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-158%3A+Kafka+Connect+should+allow+source+connectors+to+set+topic-specific+settings+for+new+topics
> >
> > Please take a look and provide feedback. Thanks!
> >
> > Best regards,
> >
> > Randall
>


Re: [DISCUSS] KIP-146: Classloading Isolation in Connect

2017-05-01 Thread Randall Hauch
Very nice work, Konstantine. Conflicting connector dependencies are indeed a
big issue that makes it hard to manage installed connectors.

I do like Gwen's idea about removing the 'module.isolation.enabled'
property. However, I would have anticipated always using classpath
isolation for *only* those components registered under the module path and
not really for anything else already on the normal classpath. So, people
could continue to place custom connector JARs onto the classpath, though
this would become deprecated in favor of installing custom connector JARs /
modules via the module path. This keeps configuration simple, gives people
time to migrate, but lets people that need classpath isolation use it to
install a variety of connectors each with their dependencies that
potentially conflict with other components.

The challenge is whether there should be a default for 'module.path'.
Ideally there would be so that users know where they can install their
connectors. However, I suspect that this might be difficult to do unless it
can make use of system properties such as "${kafka.home}" so that relative
directories can be specified.

A few other questions/comments:

1) Does the KIP have to specify how components / modules are installed,
discovered, or recognized by Kafka Connect? Or perhaps the KIP needs to
just specify the semantics of the file system module path (e.g., the
directories below those specified in the module path are to be unique and
identify an installed component).

2) Will the module classloader filtering also have to exclude Kafka Connect
dependencies? The only one that I can think of is the SLF4J API, which
can't be loaded from the module's classloader if the connector is to send
its log messages to the same logging system.

3) Rather than specify filtering, would it be a bit more flexible to simply
say that the implementation will need to ensure that Java, Kafka Connect,
and other third-party APIs (e.g., the SLF4J API) will not be loaded from the
module classloaders? It'd be better to avoid specifying how it will be done,
just in case the implementation needs to evolve or use a different technique
(e.g., load the Java and public Kafka Connect APIs via one classloader that
is reused and that always appears before the module classloader, while Kafka
Connect implementation JARs appear after the component's classloader).

4) Perhaps to address #2 and #3 above, the KIP could explicitly specify the
classloader order for a deployed connector. For example, 'java',
'kafka-connect-apis', 'connector-module', 'smt-module-1', ...,
'kafka-connect-impls', where 'connector-module' is the classloader for the
(first) module where the connector is found, 'smt-module-1' is the
classloader for the (first) module where the first SMT class is found (if
specified and found in a separate module), 'smt-module-2' is the classloader
for the module of the second SMT, and so on. Might also need to say that the
KIP does not specify how the implementation will pick the module if a
specified class is found in more than one module.
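
To make points 2-4 concrete, a filtering module classloader might look roughly
like the following sketch (the class name and the exact prefix list are
illustrative, not the KIP's proposed implementation):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ModuleClassLoader extends URLClassLoader {
        // Packages that must always come from the parent (worker) classloader.
        private static final String[] PARENT_FIRST = {
                "java.", "javax.", "org.apache.kafka.connect.", "org.slf4j."};

        public ModuleClassLoader(URL[] moduleUrls, ClassLoader parent) {
            super(moduleUrls, parent);
        }

        @Override
        protected synchronized Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            for (String prefix : PARENT_FIRST) {
                if (name.startsWith(prefix)) {
                    return super.loadClass(name, resolve);  // parent-first
                }
            }
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    loaded = findClass(name);  // module (child) first
                } catch (ClassNotFoundException e) {
                    return super.loadClass(name, resolve);  // fall back to parent
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }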

Thoughts?

Randall

On Mon, May 1, 2017 at 6:43 AM, Gwen Shapira  wrote:

> Hi Konstantine,
>
> Thank you so much for driving this! The connector classpath mess is driving
> me nuts (or worse, driving me to use Docker).
>
> I like the proposal for micro-benchmarks to test the context switching
> overhead.
>
> I have a difficult time figuring out the module.isolation.enabled.
> Especially with a default to false. I can't think of a reason that anyone
> will not want classpath isolation. "No! I want my connectors to mess up
> each other's dependencies" said no one ever.
>
> So it looks like this is mostly for upgrade purposes? Because the initial
> upgrade will not have the module.path set and therefore classpath isolation
> will simply not work by default?
>
> In that case, why don't we simply use the existence of a non-empty
> module.path as an indicator of whether isolation should work or not? Seems
> simpler and more intuitive to me.
>
> Thanks!
>
> Gwen
>
>
>
>
>
> On Sat, Apr 29, 2017 at 9:16 AM, Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > * Because of KIP number collision, please disregard my previous KIP
> > announcement and use this thread for discussion instead *
> >
> >
> > Hi everyone,
> >
> > we aim to address dependency conflicts in Kafka Connect soon by applying
> > class loading isolation.
> >
> > Feel free to take a look at KIP-146 here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-146+-+Classloading+Isolation+in+Connect
> >
> > which describes minimal required changes to public interfaces and the
> > general implementation approach.
> >
> > This is a much wanted feature for Kafka Connect. Your feedback is highly
> > appreciated.
> >
> > -Konstantine
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow 

Re: Kafka Connect and Partitions

2017-05-02 Thread Randall Hauch
Hi, David.

Excellent. I'm glad that you've solved the puzzle.
Best regards,

Randall

On Tue, May 2, 2017 at 9:18 AM, <david.frank...@bt.com> wrote:

> Hi Gwen/Randall,
>
> I think I've finally understood, more or less, how partitioning relates to
> SourceRecords.
>
> Because I was using the SourceRecord constructor that doesn't provide
> values for key and key schema, the key is null.  The DefaultPartitioner
> appears to map null to a constant value rather than round-robin across all
> of the partitions :(
> SourceRecord(Map<String, ?> sourcePartition, Map<String, ?>
> sourceOffset, String topic, Schema valueSchema, Object value)
>
> Another SourceRecord constructor enables the partition to be specified but
> I'd prefer not to use this as I don't want to couple the non-Kafka source
> side to Kafka by making it aware of topic partitions - this would also
> presumably involve coupling it to the Cluster so that the number of
> partitions in a topic can be determined :(
> SourceRecord(Map<String, ?> sourcePartition, Map<String, ?>
> sourceOffset, String topic, Integer partition, Schema keySchema, Object
> key, Schema valueSchema, Object value)
>
> Instead, if I use the SourceRecord constructor that also takes arguments
> for the key and key schema (making them take the same values as the value
> and value schema in my application), then the custom partitioner /
> producer.partitioner.class property is not required and the data is
> distributed across the partitions :)
> SourceRecord(Map<String, ?> sourcePartition, Map<String, ?>
> sourceOffset, String topic, Schema keySchema, Object key,
> Schema valueSchema, Object value)
>
> Many thanks once again for your guidance.  I think this puzzle is now
> solved :)
> Best wishes,
> David
>
> -Original Message-
> From: Randall Hauch [mailto:rha...@gmail.com]
> Sent: 28 April 2017 16:08
> To: dev@kafka.apache.org
> Subject: Re: Kafka Connect and Partitions
>
> The source connector creates a SourceRecord object and can set a number of
> fields, including the message's key and value, the Kafka topic name and, if
> desired, the Kafka topic partition number. If the connector does set the
> topic partition to a non-null value, then that's the partition to which
> Kafka Connect will write the message; otherwise, the configured partitioner
> (e.g., your custom partitioner) used by the Kafka Connect producer will
> choose/compute the partition based purely upon the key and value byte
> arrays. Note that if the connector doesn't set the topic partition number
> and no special producer partitioner is specified, the default hash-based
> Kafka partitioner will be used.
>
> So, the connector can certainly set the topic partition number, and it may
> be easier to do it there since the connector actually has the key and
> values before they are serialized. But no matter what, the connector is the
> only thing that sets the message key in the source record.
>
> BTW, the SourceRecord's "source position" and "source offset" are actually
> the connector-defined information about the source and where the connector
> has read in that source. Don't confuse these with the topic name or topic
> partition number.
>
> Hope that helps.
>
> Randall
>
> On Fri, Apr 28, 2017 at 7:15 AM, <david.frank...@bt.com> wrote:
>
> > Hi Gwen,
> >
> > Having added a custom partitioner (via the producer.partitioner.class
> > property in worker.properties) that simply randomly selects a partition,
> > the data is now written evenly across all the partitions :)
> >
> > The root of my confusion regarding why the default partitioner writes all
> > data to the same partition is that I don't understand how the
> SourceRecords
> > returned in the source task poll() method are used by the partitioner.
> The
> > data that is passed to the partitioner includes a key Object (which is an
> > empty byte array - presumably this is a bad idea!), and a value Object
> > (which is a non-empty byte array):
> >
> > @Override
> > public int partition(String topic, Object key, byte[] keyBytes,
> Object
> > value, byte[] valueBytes, Cluster cluster) {
> > System.out.println(String.format(
> > "### PARTITION key[%s][%s][%d] value[%s][%s][%d]",
> > key, key.getClass().getSimpleName(), keyBytes.length,
> > value, value.getClass().getSimpleName(),
> > valueBytes.length));
> >
> > =>
> > ### PARTITION key[[B@584f599f][byte[]][0] value[[B@73cc0cd8][byte[]][
> 236]
> >
> > Howe

Please give me permission to create/edit Kafka Improvement Proposals

2017-05-05 Thread Randall Hauch
See subject. I'm a contributor to Kafka Connect. Thanks!

Randall


[VOTE] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-08 Thread Randall Hauch
Hi, everyone.

Given the simple and non-controversial nature of the KIP, I would like to
start the voting process for KIP-154: Add Kafka Connect configuration
properties for creating internal topics:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-154+Add+Kafka+Connect+configuration+properties+for+creating+internal+topics

The vote will run for a minimum of 72 hours.

Thanks,

Randall


[DISCUSS] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-05 Thread Randall Hauch
Hi, all.

I've been working on KAFKA-4667 to change the distributed worker of Kafka
Connect to look for the topics used to store connector and task
configurations, offsets, and status, and if those topics do not exist, to
create them using the new AdminClient. To make this as useful as possible
and to minimize the need to still manually create the topics, I propose
adding several new distributed worker configurations to specify the
partitions and replication factor for these topics, and have outlined them
in "KIP-154 Add Kafka Connect configuration properties for creating
internal topics".

https://cwiki.apache.org/confluence/display/KAFKA/KIP-154+Add+Kafka+Connect+configuration+properties+for+creating+internal+topics

Please take a look and provide feedback. Thanks!

Best regards,

Randall


Re: [DISCUSS] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-05 Thread Randall Hauch
Thanks, Gwen. 

Switching to low-priority is a great idea.

The default value for the replication factor configuration is 3, since that 
makes sense and is safe for production. Using the default values in the example 
would mean it could only be run against a Kafka cluster with a minimum of 3 
nodes. I propose overriding the example's replication factor configurations to 
be 1 so that the examples could be run on any sized cluster.

The rejected alternatives mentions why the implementation doesn't try to be too 
smart by calculating the replication factor.

Best regards, 

Randall

> On May 5, 2017, at 8:02 PM, Gwen Shapira <g...@confluent.io> wrote:
> 
> Looks great to me :)
> 
> Just one note - configurations have levels (which reflect in the docs) - I
> suggest putting the whole thing as LOW. Most users will never need to worry
> about these. For same reason I recommend leaving them out of the example
> config files - we already have issues with users playing with configs
> without understanding what they are doing and not liking the results.
> 
>> On Fri, May 5, 2017 at 3:42 PM, Randall Hauch <rha...@gmail.com> wrote:
>> 
>> Hi, all.
>> 
>> I've been working on KAFKA-4667 to change the distributed worker of Kafka
>> Connect to look for the topics used to store connector and task
>> configurations, offsets, and status, and if those topics do not exist, to
>> create them using the new AdminClient. To make this as useful as possible
>> and to minimize the need to still manually create the topics, I propose
>> adding several new distributed worker configurations to specify the
>> partitions and replication factor for these topics, and have outlined them
>> in "KIP-154 Add Kafka Connect configuration properties for creating
>> internal topics".
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> 154+Add+Kafka+Connect+configuration+properties+for+
>> creating+internal+topics
>> 
>> Please take a look and provide feedback. Thanks!
>> 
>> Best regards,
>> 
>> Randall
>> 
> 
> 
> 
> -- 
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>


Re: [DISCUSS] KIP-154 Add Kafka Connect configuration properties for creating internal topics

2017-05-08 Thread Randall Hauch
One of the "Rejected Alternatives" was to do something "smarter" by
automatically reducing the replication factor when the cluster size is
smaller than the replication factor. However, this is extremely
unintuitive, and in rare cases (e.g., during a partial outage) might even
result in internal topics being created with too small of a replication
factor. And defaulting to 1 is certainly bad for production use cases, so
that's not an option, either.

While defaulting to 3 and failing if the cluster doesn't have 3 nodes is a
bit harsher than I'd like, it does appear to be the safer option: an error
message (with instructions on how to correct) is better than inadvertently
setting the replication factor too small and not knowing about it until it
is too late.

On Mon, May 8, 2017 at 6:12 PM, BigData dev <bigdatadev...@gmail.com> wrote:

> Hi,
> I liked the KIP, as it will avoid so many errors which users can make during
> setup.
> I have 1 questions here.
> 1. As default replication factor is set to 3, but if Kafka cluster is setup
> for one node, then the user needs to override the default configuration,
> till then topics will not be created.
> So, this is the behavior we want to give?
>
> On Mon, May 8, 2017 at 2:25 PM, Konstantine Karantasis <
> konstant...@confluent.io> wrote:
>
> > Thanks a lot for the KIP Randall. This improvement should simplify both
> > regular deployments and testing!
> >
> > A minor comment. Maybe it would be nice to add a note about why there's
> no
> > need for the property: config.storage.partitions
> > I'm mentioning this for the sake of completeness, in case someone notices
> > this slight asymmetry with respect to the newly introduced config
> > properties.
> >
> > This is by no means a blocking comment.
> >
> > Thanks,
> > Konstantine
> >
> > On Fri, May 5, 2017 at 7:18 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Thanks, Gwen.
> > >
> > > Switching to low-priority is a great idea.
> > >
> > > The default value for the replication factor configuration is 3, since
> > > that makes sense and is safe for production. Using the default values
> in
> > > the example would mean it could only be run against a Kafka cluster
> with
> > a
> > > minimum of 3 nodes. I propose overriding the example's replication
> factor
> > > configurations to be 1 so that the examples could be run on any sized
> > > cluster.
> > >
> > > The "Rejected Alternatives" section explains why the implementation doesn't
> > > try to be too smart by calculating the replication factor.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> > > > On May 5, 2017, at 8:02 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > Looks great to me :)
> > > >
> > > > Just one note - configurations have levels (which reflect in the
> docs)
> > -
> > > I
> > > > suggest putting the whole thing as LOW. Most users will never need to
> > > worry
> > > > about these. For same reason I recommend leaving them out of the
> > example
> > > > config files - we already have issues with users playing with configs
> > > > without understanding what they are doing and not liking the results.
> > > >
> > > >> On Fri, May 5, 2017 at 3:42 PM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >>
> > > >> Hi, all.
> > > >>
> > > >> I've been working on KAFKA-4667 to change the distributed worker of
> > > Kafka
> > > >> Connect to look for the topics used to store connector and task
> > > >> configurations, offsets, and status, and if those tasks do not exist
> > to
> > > >> create them using the new AdminClient. To make this as useful as
> > > possible
> > > >> and to minimize the need to still manually create the topics, I
> > propose
> > > >> adding several new distributed worker configurations to specify the
> > > >> partitions and replication factor for these topics, and have
> outlined
> > > them
> > > >> in "KIP-154 Add Kafka Connect configuration properties for creating
> > > >> internal topics".
> > > >>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >> 154+Add+Kafka+Connect+configuration+properties+for+
> > > >> creating+internal+topics
> > > >>
> > > >> Please take a look and provide feedback. Thanks!
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Randall
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > *Gwen Shapira*
> > > > Product Manager | Confluent
> > > > 650.450.2760 | @gwenshap
> > > > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > > > <http://www.confluent.io/blog>
> > >
> >
>


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-11 Thread Randall Hauch
Thanks, Ewen. Comments inline below.

On Mon, Sep 11, 2017 at 5:46 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Randall,
>
> A couple of questions:
>
> * Some metrics don't seem to have unique names? e.g.
> source-record-produce-rate and source-record-produce-total seem like they
> are duplicated. Looks like maybe just an oversight that the second ones
> should be changed from "produce" to "write".
>

Nice catch. You are correct - should be "write" instead of "produce". I
will correct.


> * I think there's a stray extra character in a couple of
> places: kafka.connect:type=source-task-metrics,name=source-
> record-produce-total,worker=([-.\w]+)l,connector=([-.\w]+),task=([\d]+)
> has an extra char after the worker name.
>

Thanks. Removed in 2 places.


> * Are the produce totals actually useful given rebalancing would cancel
> them out anyway? Doesn't seem like you could do much with them.
>

Yes, the totals would be since the last rebalance. Maybe that isn't that
useful. Might be better to capture the offsets and lag as Roger was
suggesting. Thoughts?


> * Why do transformations get their own metric but not converters? And are
> we concerned at all about the performance impact of getting such fine
> grained info? Getting current time isn't free and we've seen before that we
> ended up w/ accidental performance regressions as we tried to check it too
> frequently to enforce timeouts fine grained in the producer (iirc).
> Batching helps w/ this, but on the consumer side, a max.poll.records=1
> setting could put you in a bad place, especially since transforms might be
> very lightweight (or nothing) and converters are expected to be relatively
> cheap as well.
>

We could remove the read, transform, and put time-based metrics for sink
tasks, and the poll, transform, and write time-based metrics for source
tasks. Can/should they be replaced with anything else?


> * If we include the worker id everywhere and don't have metrics without
> that included, isn't that a pain for users that dump this data into some
> other system? They have to know which worker the connector/task is
> currently on *or* need to do extra work to merge the metrics from across
> machines. Including versions with the worker ID can make sense for
> completeness and accuracy (e.g. technically there are still very slim risks
> of having a task running twice due to zombies), but it seems like bad
> usability for the common case.
>

Part of the reason was also to help identify where each of the metrics came
from, but per the next comment this may not be as useful, either.
So remove the worker ID in all the task and connector metric names? What
about the worker metrics?


> * Is aggregating things like source record rate at the (worker, connector)
> level really useful since you're just going to need to do additional
> aggregation anyway once you've collected metrics across all workers? I'd
> rather add a smaller number of metrics w/ clear use cases than just try to
> be exhaustive and then have to maintain stuff that nobody actually uses.
>

Yes, the connector aggregate metrics are maybe not as useful if you also
have to aggregate them from different workers. Removing them probably also
reduces the risk of them being misinterpreted.


> * You have status for connectors but not for tasks. Any reason why? Seems
> like it'd make sense to expose both, especially since users generally care
> about task status more than connector status (not many connectors actually
> run a monitoring thread.)
>

Ack.


> * Is number of tasks for each connector a useful metric? Not sure whether
> someone would find this useful or not. Probably not for alerts, but might
> be useful to be able to check it via your metrics dashboard.
>

Seems like it might be useful, at least in terms of tracking the number of
tasks over time. Might not be as useful for connectors that have relatively
static tasks, but it would be more interesting/useful for connectors that
create tasks dynamically and periodically request task reconfigurations.


> * Same questions re: granularity of sink tasks/connectors timing and
> whether the connectors need all the roll-ups of individual (worker, task)
> values to (worker, connector) level.
>

I'm fine with taking out the aggregates to keep things simple and prevent
misunderstanding.


> * If we expose the who the worker currently thinks is leader, it might also
> make sense to expose the underlying epoch. Not actually sure if we expose
> that for the consumer today, but it's an indicator of who is properly up to
> date.
>

Ack.


> * Why worker-level offset commit stats? It's not clear to me that these are
> useful without considering the specific connector.
>

So would they make more sense on the tasks? Again, on the 

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-11 Thread Randall Hauch
Based on Roger and Ewen's feedback, I removed the aggregate metrics as they
would be difficult to make use of without extra work. This simplified
things a great deal, and I took the opportunity to reorganize the groups of
metrics. Also, based upon Ewen's concerns regarding measuring
times/durations, I removed all time-related metrics except for the offset
commits and rebalances, which are infrequent enough to warrant the capture
of percentiles. Roger asked about capturing batch size metrics for source
and sink tasks, and offset lag metrics for sink tasks. Finally, Ewen
pointed out that all count/total metrics are only valid since the most
recent rebalance and are therefore less meaningful, so those were removed as well.

On Mon, Sep 11, 2017 at 6:50 PM, Randall Hauch <rha...@gmail.com> wrote:

> Thanks, Ewen. Comments inline below.
>
> On Mon, Sep 11, 2017 at 5:46 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
>> Randall,
>>
>> A couple of questions:
>>
>> * Some metrics don't seem to have unique names? e.g.
>> source-record-produce-rate and source-record-produce-total seem like they
>> are duplicated. Looks like maybe just an oversight that the second ones
>> should be changed from "produce" to "write".
>>
>
> Nice catch. You are correct - should be "write" instead of "produce". I
> will correct.
>
>
>> * I think there's a stray extra character in a couple of
>> places: kafka.connect:type=source-task-metrics,name=source-record-
>> produce-total,worker=([-.\w]+)l,connector=([-.\w]+),task=([\d]+)
>> has an extra char after the worker name.
>>
>
> Thanks. Removed in 2 places.
>
>
>> * Are the produce totals actually useful given rebalancing would cancel
>> them out anyway? Doesn't seem like you could do much with them.
>>
>
> Yes, the totals would be since the last rebalance. Maybe that isn't that
> useful. Might be better to capture the offsets and lag as Roger was
> suggesting. Thoughts?
>
>
>> * Why do transformations get their own metric but not converters? And are
>> we concerned at all about the performance impact of getting such fine
>> grained info? Getting current time isn't free and we've seen before that
>> we
>> ended up w/ accidental performance regressions as we tried to check it too
>> frequently to enforce timeouts fine grained in the producer (iirc).
>> Batching helps w/ this, but on the consumer side, a max.poll.records=1
>> setting could put you in a bad place, especially since transforms might be
>> very lightweight (or nothing) and converters are expected to be relatively
>> cheap as well.
>>
>
> We could remove the read, transform, and put time-based metrics for sink
> tasks, and the poll, transform, and write time-based metrics for source
> tasks. Can/should they be replaced with anything else?
>
>
>> * If we include the worker id everywhere and don't have metrics without
>> that included, isn't that a pain for users that dump this data into some
>> other system? They have to know which worker the connector/task is
>> currently on *or* need to do extra work to merge the metrics from across
>> machines. Including versions with the worker ID can make sense for
>> completeness and accuracy (e.g. technically there are still very slim
>> risks
>> of having a task running twice due to zombies), but it seems like bad
>> usability for the common case.
>>
>
> Part of the reason was also to help identify where each of the metrics
> came from, but per the next comment this may not be as useful, either.
> So remove the worker ID in all the task and connector metric names? What
> about the worker metrics?
>
>
>> * Is aggregating things like source record rate at the (worker, connector)
>> level really useful since you're just going to need to do additional
>> aggregation anyway once you've collected metrics across all workers? I'd
>> rather add a smaller number of metrics w/ clear use cases than just try to
>> be exhaustive and then have to maintain stuff that nobody actually uses.
>>
>
> Yes, the connector aggregate metrics are maybe not as useful if you also
> have to aggregate them from different workers. Removing them probably also
> reduces the risk of them being misinterpreted.
>
>
>> * You have status for connectors but not for tasks. Any reason why? Seems
>> like it'd make sense to expose both, especially since users generally care
>> about task status more than connector status (not many connectors actually
>> run a monitoring thread.)
>>
>
> Ack.
>
>
>> * Is number of tasks for each connector a useful metric? Not sure whether
>> s

Re: [VOTE] KIP-196: Add metrics to Kafka Connect framework

2017-09-14 Thread Randall Hauch
On the advice of Ismael, I made a few minor changes to a few of the metrics
to adhere to the new pattern of `-rate` and `-total` metric pairs defined
in KIP-187 [1]:

1. Several of the "-rate" metrics were paired with a "-count" metric; the
"-count" metrics were renamed to "-total".
2. Two "-rate" metrics were missing a corresponding total metric, so a "-total"
metric was added.
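
For example, after this change a source task's poll metrics come in pairs
along these lines (illustrative of the naming pattern; see the KIP for the
full list):

    source-record-poll-rate    average per-second rate over the sample window
    source-record-poll-total   cumulative count since the task started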

Please respond if you have any concerns or objections to this minor change.

Best regards,

Randall



[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-187+-+Add+cumulative+count+metric+for+all+Kafka+rate+metrics

On Wed, Sep 13, 2017 at 9:53 PM, Randall Hauch <rha...@gmail.com> wrote:

> The KIP has passed with three binding +1 votes (Gwen, Sriram, Jason) and
> no -1 or +0 votes.
>
> Thanks to everyone for the feedback.
>
> On Tue, Sep 12, 2017 at 2:48 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
>> +1. Thanks for the KIP.
>>
>> On Tue, Sep 12, 2017 at 12:42 PM, Sriram Subramanian <r...@confluent.io>
>> wrote:
>>
>> > +1
>> >
>> > On Tue, Sep 12, 2017 at 12:41 PM, Gwen Shapira <g...@confluent.io>
>> wrote:
>> >
>> > > My +1 remains :)
>> > >
>> > > On Tue, Sep 12, 2017 at 12:31 PM Randall Hauch <rha...@gmail.com>
>> wrote:
>> > >
>> > > > The KIP was modified (most changes due to reorganization of
>> metrics).
>> > > Feel
>> > > > free to re-vote if you dislike the changes.
>> > > >
>> > > > On Mon, Sep 11, 2017 at 8:40 PM, Sriram Subramanian <
>> r...@confluent.io>
>> > > > wrote:
>> > > >
>> > > > > +1
>> > > > >
>> > > > > On Mon, Sep 11, 2017 at 2:56 PM, Gwen Shapira <g...@confluent.io>
>> > > wrote:
>> > > > >
>> > > > > > +1
>> > > > > >
>> > > > > > Thanks for this. Can't wait for more complete monitoring for
>> > Connect.
>> > > > > >
>> > > > > > On Mon, Sep 11, 2017 at 7:40 AM Randall Hauch <rha...@gmail.com
>> >
>> > > > wrote:
>> > > > > >
>> > > > > > > I'd like to start the vote on KIP-196 to add metrics to the
>> Kafka
>> > > > > Connect
>> > > > > > > framework so the worker processes can be measured. Details are
>> > > here:
>> > > > > > >
>> > > > > > >
>> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > > > > > 196%3A+Add+metrics+to+Kafka+Connect+framework
>> > > > > > >
>> > > > > > > Thanks, and best regards.
>> > > > > > >
>> > > > > > > Randall
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Re: [VOTE] KIP-196: Add metrics to Kafka Connect framework

2017-09-13 Thread Randall Hauch
The KIP has passed with three binding +1 votes (Gwen, Sriram, Jason) and no
-1 or +0 votes.

Thanks to everyone for the feedback.

On Tue, Sep 12, 2017 at 2:48 PM, Jason Gustafson <ja...@confluent.io> wrote:

> +1. Thanks for the KIP.
>
> On Tue, Sep 12, 2017 at 12:42 PM, Sriram Subramanian <r...@confluent.io>
> wrote:
>
> > +1
> >
> > On Tue, Sep 12, 2017 at 12:41 PM, Gwen Shapira <g...@confluent.io>
> wrote:
> >
> > > My +1 remains :)
> > >
> > > On Tue, Sep 12, 2017 at 12:31 PM Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > The KIP was modified (most changes due to reorganization of metrics).
> > > Feel
> > > > free to re-vote if you dislike the changes.
> > > >
> > > > On Mon, Sep 11, 2017 at 8:40 PM, Sriram Subramanian <
> r...@confluent.io>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Mon, Sep 11, 2017 at 2:56 PM, Gwen Shapira <g...@confluent.io>
> > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks for this. Can't wait for more complete monitoring for
> > Connect.
> > > > > >
> > > > > > On Mon, Sep 11, 2017 at 7:40 AM Randall Hauch <rha...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > I'd like to start the vote on KIP-196 to add metrics to the
> Kafka
> > > > > Connect
> > > > > > > framework so the worker processes can be measured. Details are
> > > here:
> > > > > > >
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 196%3A+Add+metrics+to+Kafka+Connect+framework
> > > > > > >
> > > > > > > Thanks, and best regards.
> > > > > > >
> > > > > > > Randall
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-10-04 Thread Randall Hauch
Oops. Yes, I meant “replication factor”. 

> On Oct 4, 2017, at 7:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> Randall:
> bq. AdminClient currently allows changing the replication factory.
> 
> By 'replication factory' did you mean 'replication factor' ?
> 
> Cheers
> 
>> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch <rha...@gmail.com> wrote:
>> 
>> Currently the KIP's scope is only topics that don't yet exist, and we have
>> be cognizant of race conditions between tasks of the same connector. I
>> think it is worthwhile to consider whether the KIP's scope should expand to
>> also address *existing* partitions, though it may not be appropriate to
>> have as much control when changing the topic settings for an existing
>> topic. For example, changing the number of partitions (which the KIP
>> considers a "topic-specific setting" even though technically it is not)
>> shouldn't be done blindly due to the partitioning impacts, and IIRC you
>> can't reduce them (which we could verify before applying). Also, I don't
>> think the AdminClient currently allows changing the replication factory. I
>> think changing the topic configs is less problematic both from what makes
>> sense for connectors to verify/change and from what the AdminClient
>> supports.
>> 
>> Even if we decide that it's not appropriate to change the settings on an
>> existing topic, I do think it's advantageous to at least notify the
>> connector (or task) prior to the first record sent to a given topic so that
>> the connector can fail or issue a warning if it doesn't meet its
>> requirements.
>> 
>> Best regards,
>> 
>> Randall
>> 
>> On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
>> steph...@simplemachines.com.au> wrote:
>> 
>>> Hi Randall,
>>> 
>>> Thanks for the KIP. I like it
>>> What happens when the target topic is already created but the configs do
>>> not match?
>>> i.e. wrong RF, num partitions, or missing / additional configs? Will you
>>> attempt to apply the necessary changes or throw an error?
>>> 
>>> Thanks!
>>> Stephane
>>> 
>>> 
>>> On 24/5/17, 5:59 am, "Mathieu Fenniak" <mathieu.fenn...@replicon.com>
>>> wrote:
>>> 
>>>    Ah, yes, I see you highlighted a part that should've made this clear
>>>    to me on the first read. :-)  Much clearer now!
>>> 
>>>By the way, enjoyed your Debezium talk in NYC.
>>> 
>>>Looking forward to this Kafka Connect change; it will allow me to
>>>remove a post-deployment tool that I hacked together for the purpose
>>>of ensuring auto-created topics have the right config.
>>> 
>>>Mathieu
>>> 
>>> 
>>>On Tue, May 23, 2017 at 11:38 AM, Randall Hauch <rha...@gmail.com>
>>> wrote:
>>>> Thanks for the quick feedback, Mathieu. Yes, the first
>> configuration
>>> rule
>>>> whose regex matches will be applied, and no other rules will be
>>> used. I've
>>>> updated the KIP to try to make this more clear, but let me know if
>>> it's
>>>> still not clear.
>>>> 
>>>> Best regards,
>>>> 
>>>> Randall
>>>> 
>>>> On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
>>>> mathieu.fenn...@replicon.com> wrote:
>>>> 
>>>>> Hi Randall,
>>>>> 
>>>>> Awesome, very much looking forward to this.
>>>>> 
>>>>> It isn't 100% clear from the KIP how multiple config-based rules
>>> would
>>>>> be applied; it looks like the first configuration rule whose regex
>>>>> matches the topic name will be used, and no other rules will be
>>>>> applied.  Is that correct?  (I wasn't sure if it might cascade
>>>>> together multiple matching rules...)
>>>>> 
>>>>> Looks great,
>>>>> 
>>>>> Mathieu
>>>>> 
>>>>> 
>>>>> On Mon, May 22, 2017 at 1:43 PM, Randall Hauch <rha...@gmail.com>
>>> wrote:
>>>>>> Hi, all.
>>>>>> 
>>>>>> We recently added the ability for Kafka Connect to create
>>> *internal*
>>>>> topics
>>>>>> using the new AdminClient, but it still would be great if Kafka
>>> Connect
>>>>>> could do this for new topics that result from source connector
>>> records.
>>>>>> I've outlined an approach to do this in "KIP-158 Kafka Connect
>>> should
>>>>> allow
>>>>>> source connectors to set topic-specific settings for new
>> topics".
>>>>>> 
>>>>>> *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 158%3A+Kafka+Connect+should+allow+source+connectors+to+
>>>>> set+topic-specific+settings+for+new+topics
>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 158%3A+Kafka+Connect+should+allow+source+connectors+to+
>>>>> set+topic-specific+settings+for+new+topics>*
>>>>>> 
>>>>>> Please take a look and provide feedback. Thanks!
>>>>>> 
>>>>>> Best regards,
>>>>>> 
>>>>>> Randall
>>>>> 
>>> 
>>> 
>>> 
>>> 
>> 


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2017-10-04 Thread Randall Hauch
Currently the KIP's scope is only topics that don't yet exist, and we have
to be cognizant of race conditions between tasks of the same connector. I
think it is worthwhile to consider whether the KIP's scope should expand to
also address *existing* partitions, though it may not be appropriate to
have as much control when changing the topic settings for an existing
topic. For example, changing the number of partitions (which the KIP
considers a "topic-specific setting" even though technically it is not)
shouldn't be done blindly due to the partitioning impacts, and IIRC you
can't reduce them (which we could verify before applying). Also, I don't
think the AdminClient currently allows changing the replication factory. I
think changing the topic configs is less problematic both from what makes
sense for connectors to verify/change and from what the AdminClient
supports.
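
For reference, creating a brand-new topic with specific settings is already
straightforward with the AdminClient; here is a minimal sketch, where the
topic name, counts, and config values are all illustrative:

    import java.util.Collections;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    // Create one topic with 5 partitions, replication factor 3, and one
    // topic-specific config override. Error handling is omitted, and
    // adminClientProps is assumed to hold the worker's client settings.
    AdminClient adminClient = AdminClient.create(adminClientProps);
    NewTopic topic = new NewTopic("my-topic", 5, (short) 3)
            .configs(Collections.singletonMap("cleanup.policy", "compact"));
    adminClient.createTopics(Collections.singleton(topic)).all().get();

It's modifying the settings of an *existing* topic where the API is more
limited, as noted above.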

Even if we decide that it's not appropriate to change the settings on an
existing topic, I do think it's advantageous to at least notify the
connector (or task) prior to the first record sent to a given topic so that
the connector can fail or issue a warning if it doesn't meet its
requirements.

Best regards,

Randall

On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi Randall,
>
> Thanks for the KIP. I like it
> What happens when the target topic is already created but the configs do
> not match?
> i.e. wrong RF, num partitions, or missing / additional configs? Will you
> attempt to apply the necessary changes or throw an error?
>
> Thanks!
> Stephane
>
>
> On 24/5/17, 5:59 am, "Mathieu Fenniak" <mathieu.fenn...@replicon.com>
> wrote:
>
> Ah, yes, I see you highlighted a part that should've made this clear
> to me on the first read. :-)  Much clearer now!
>
> By the way, enjoyed your Debezium talk in NYC.
>
> Looking forward to this Kafka Connect change; it will allow me to
> remove a post-deployment tool that I hacked together for the purpose
> of ensuring auto-created topics have the right config.
>
> Mathieu
>
>
> On Tue, May 23, 2017 at 11:38 AM, Randall Hauch <rha...@gmail.com>
> wrote:
> > Thanks for the quick feedback, Mathieu. Yes, the first configuration
> rule
> > whose regex matches will be applied, and no other rules will be
> used. I've
> > updated the KIP to try to make this more clear, but let me know if
> it's
> > still not clear.
> >
> > Best regards,
> >
> > Randall
> >
> > On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com> wrote:
> >
> >> Hi Randall,
> >>
> >> Awesome, very much looking forward to this.
> >>
> >> It isn't 100% clear from the KIP how multiple config-based rules
> would
> >> be applied; it looks like the first configuration rule whose regex
> >> matches the topic name will be used, and no other rules will be
> >> applied.  Is that correct?  (I wasn't sure if it might cascade
> >> together multiple matching rules...)
> >>
> >> Looks great,
> >>
> >> Mathieu
> >>
> >>
> >> On Mon, May 22, 2017 at 1:43 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> >> > Hi, all.
> >> >
> >> > We recently added the ability for Kafka Connect to create
> *internal*
> >> topics
> >> > using the new AdminClient, but it still would be great if Kafka
> Connect
> >> > could do this for new topics that result from source connector
> records.
> >> > I've outlined an approach to do this in "KIP-158 Kafka Connect
> should
> >> allow
> >> > source connectors to set topic-specific settings for new topics".
> >> >
> >> > *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 158%3A+Kafka+Connect+should+allow+source+connectors+to+
> >> set+topic-specific+settings+for+new+topics
> >> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> 158%3A+Kafka+Connect+should+allow+source+connectors+to+
> >> set+topic-specific+settings+for+new+topics>*
> >> >
> >> > Please take a look and provide feedback. Thanks!
> >> >
> >> > Best regards,
> >> >
> >> > Randall
> >>
>
>
>
>


Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2017-10-16 Thread Randall Hauch
The broker's configuration options are "listeners" (plural) and
"listener.security.protocol.map". I agree that following the pattern set
by the broker is better, so these are really good ideas. However, at this
point I don't see a need for the "listener.security.protocol.map", which
for the broker must be set if the listener name is not a security protocol.
Can we not simply just allow "HTTP" and "HTTPS" as the names of the
listeners (rather than the broker's "PLAINTEXT", "SSL", etc.)? If so, then
for example "listeners" might be set to "http://myhost:8081,
https://myhost:80", which seems to work out nicely without needing listener
names other than security protocols.

I also like using the worker's SSL and SASL security configs by default if
"https" is included in the listener, but allowing the overriding of this
via other additional properties. Here, I'm not a big fan of
"listeners.name.https.*" prefix, which I think is pretty verbose, but I
could see "listener.https.*" as a prefix. This allows us to add other
security protocols at some point, if that ever becomes necessary.

+1 for continuing down this road. Nice work.
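
To make that concrete, a worker configuration using this scheme might look
like the following (ports and paths are illustrative, and the
"listener.https." prefix is only the naming suggested above, which may change
in the final KIP):

    listeners=http://myhost:8081,https://myhost:8443
    listener.https.ssl.keystore.location=/var/private/ssl/connect.keystore.jks
    listener.https.ssl.keystore.password=keystore-password
    listener.https.ssl.key.password=key-password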

On Mon, Oct 16, 2017 at 9:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> +1 to this proposal.
>
> On Mon, Oct 16, 2017 at 7:49 AM, Jakub Scholz <ja...@scholz.cz> wrote:
>
> > I was having some more thoughts about it. We can simply take over what
> > Kafka broker implements for the listeners:
> > - We can take over the "listener" and "listener.security.protocol.map"
> > options to define multiple REST listeners and the security protocol they
> > should use
> > - The HTTPS interface will by default use the default configuration
> options
> > ("ssl.keystore.localtion" etc.). But if desired, the values can be
> > overridden for a given listener (again, as in the Kafka broker's
> > "listener.name.<name>.ssl.keystore.location")
> >
> > This should address both issues raised. But before I incorporate it into
> > the KIP, I would love to get some feedback if this sounds OK. Please let
> me
> > know what do you think ...
> >
> > Jakub
> >
> > On Sun, Oct 15, 2017 at 12:23 AM, Jakub Scholz <ja...@scholz.cz> wrote:
> >
> > > I agree, adding both HTTP and HTTPS is not complicated. I just didn't
> > > see the use case for it. But I can add it. Would you add just support for a
> > > single HTTP and single HTTPS interface? Or do you see some value even
> in
> > > allowing more than 2 interfaces (for example one HTTP and two HTTPS
> with
> > > different configuration)? It could be done similarly to how the Kafka
> > > broker does it through the "listener" configuration parameter with
> comma
> > > separated list. What do you think?
> > >
> > > As for the "rest" prefix - if we remove it, some of the same
> > configuration
> > > options are already used today as the option for connecting from Kafka
> > > Connect to Kafka broker. So I'm not sure we should mix them. I can
> > > definitely imagine some cases where the client SSL configuration will
> not
> > > be the same as the REST HTTPS configuration. That is why I added the
> > > prefix. If we remove the prefix, how would you deal with this?
> > >
> > > On Fri, Oct 13, 2017 at 6:25 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > >> Also, do we need these properties to be preceded with `rest`? I'd
> argue
> > >> that we're just configuring the worker's SSL information, and that the
> > >> REST
> > >> API would just use that. If we added another non-REST API, we'd want
> to
> > >> use
> > >> the same security configuration.
> > >>
> > >> It's not that complicated in Jetty to support both "http" and "https"
> > >> simultaneously, so IMO we should add that from the beginning.
> > >>
> > >> On Fri, Oct 13, 2017 at 9:34 AM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > >>
> > >> > It'd be useful to specify the default values for the configuration
> > >> > properties.
> > >> >
> > >> > On Tue, Oct 10, 2017 at 2:53 AM, Jakub Scholz <ja...@scholz.cz>
> > wrote:
> > >> >
> > >> >> FYI: Based on Ewen's suggestion from the related JIRA, I added a
> > >> >> clarification to the KIP that it doesn't do anything around
> > >> authorization
> > >> >> /
> > >> >> ACLs. While authorization / ACLs would be for sure valuable
> feature I
> > >> >> would
> > >> >> prefer to leave it for different KIP.
> > >> >>
> > >> >> Jakub
> > >> >>
> > >> >> On Mon, Oct 9, 2017 at 5:25 PM, Jakub Scholz <ja...@scholz.cz>
> > wrote:
> > >> >>
> > >> >> > Hi,
> > >> >> >
> > >> >> > I would like to start a discussion about KIP-208: Add SSL support
> > to
> > >> >> Kafka
> > >> >> > Connect REST interface (https://cwiki.apache.org/
> > >> >> > confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+
> > >> >> > Kafka+Connect+REST+interface).
> > >> >> >
> > >> >> > I think this would be useful feature to improve the security of
> > Kafka
> > >> >> > Connect.
> > >> >> >
> > >> >> > Thanks & Regards
> > >> >> > Jakub
> > >> >> >
> > >> >>
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>


Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2017-10-13 Thread Randall Hauch
Also, do we need these properties to be preceded with `rest`? I'd argue
that we're just configuring the worker's SSL information, and that the REST
API would just use that. If we added another non-REST API, we'd want to use
the same security configuration.

It's not that complicated in Jetty to support both "http" and "https"
simultaneously, so IMO we should add that from the beginning.

On Fri, Oct 13, 2017 at 9:34 AM, Randall Hauch <rha...@gmail.com> wrote:

> It'd be useful to specify the default values for the configuration
> properties.
>
> On Tue, Oct 10, 2017 at 2:53 AM, Jakub Scholz <ja...@scholz.cz> wrote:
>
>> FYI: Based on Ewen's suggestion from the related JIRA, I added a
>> clarification to the KIP that it doesn't do anything around authorization
>> /
>> ACLs. While authorization / ACLs would be for sure valuable feature I
>> would
>> prefer to leave it for different KIP.
>>
>> Jakub
>>
>> On Mon, Oct 9, 2017 at 5:25 PM, Jakub Scholz <ja...@scholz.cz> wrote:
>>
>> > Hi,
>> >
>> > I would like to start a discussion about KIP-208: Add SSL support to
>> Kafka
>> > Connect REST interface (https://cwiki.apache.org/
>> > confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+
>> > Kafka+Connect+REST+interface).
>> >
>> > I think this would be useful feature to improve the security of Kafka
>> > Connect.
>> >
>> > Thanks & Regards
>> > Jakub
>> >
>>
>
>


Re: Kafka Connect Source questions

2017-10-13 Thread Randall Hauch
On Thu, Oct 12, 2017 at 1:59 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> Hi,
>
>
>
> I had a look at the Connect Source Worker code and have two questions:
> When a Source Task commits offsets, does it perform compaction /
> optimisation before sending off? E.g.  I read from 1 source partition, and
> I read 1000 messages. Will the offset flush send 1000 messages to the
> offset storage, or just 1 (the last one)?
>

Just the latest. As each message is processed, the offset is put into the
map keyed by the source partition. So multiple records with the
same/equivalent source partition will store only the latest offset. When
the offsets are committed, the contents of the map are written to Kafka.
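
In other words, the bookkeeping amounts to something like the following
simplified sketch (not the actual WorkerSourceTask code; the class, method,
and helper names are illustrative):

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.connect.source.SourceRecord;

    class OffsetBookkeeping {
        private final Map<Map<String, ?>, Map<String, ?>> pending = new HashMap<>();

        void onRecordProcessed(SourceRecord record) {
            // A later record for the same source partition replaces the
            // earlier entry, so only the latest offset per partition remains.
            pending.put(record.sourcePartition(), record.sourceOffset());
        }

        void commitOffsets() {
            // Each flush therefore writes at most one offset per source partition.
            pending.forEach(this::writeToOffsetStorage);
            pending.clear();
        }

        private void writeToOffsetStorage(Map<String, ?> partition, Map<String, ?> offset) {
            // Hand off to the offset backing store (omitted in this sketch).
        }
    }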


> I don’t really understand why WorkerSourceTask is trying to flush
> outstanding messages before committing the offsets? (cf
> https://github.com/apache/kafka/blob/trunk/connect/
> runtime/src/main/java/org/apache/kafka/connect/runtime/
> WorkerSourceTask.java#L328 ).
> I would believe that committing the offsets would just commit the offsets
> for the messages we know for sure have been flushed at the moment the
> commit is requested. That would remove one massive timeout from happening
> if the source task pulls a lot of message and the producer is overwhelmed /
> can’t complete the message flush in the 5 seconds of timeout.
>

The offsets are written to a different topic than the records, and that
means they may be on different brokers. If the task does not wait for the
records to be flushed/written to the destination topics, it is possible the
offsets could be written successfully *before* the records have all been
written and acknowledged. If that is the case and the producer fails or
times out, then the task will have written offsets corresponding to records
that haven't yet been actually written. If the task fails, upon restart it
would begin where the offsets specify, and that means you'd have skipped
over some data.

The worker's "offset.flush.timeout.ms" does default to 5 seconds, but if
that's not sufficient then you can increase it.

I think one reason why the offset writer doesn't just flush the offsets for
the records that have been written so far is that the record
acknowledgements may arrive out of order relative to what the source
task produced. So even though a source task might produce two records with
offsets A and B, respectively, the producer may actually write the record
with offset B first (e.g., the record with offset A may be retried). If
this were the case and the offsets were flushed before other records were
written, then offset storage would record that offset A is the most
recently written. Upon restart, the connector would re-produce the record
for offset B. The current approach eliminates this, by ensuring that the
most recently generated offset is actually what is written to offset
storage.

Hope this helps.


>
> Thanks a lot for the responses. I may open JIRAs based on the answers of
> the questions, if that would help bring some performance improvements.
>
>
>
> Stephane
>
>


Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2017-10-13 Thread Randall Hauch
It'd be useful to specify the default values for the configuration
properties.

On Tue, Oct 10, 2017 at 2:53 AM, Jakub Scholz  wrote:

> FYI: Based on Ewen's suggestion from the related JIRA, I added a
> clarification to the KIP that it doesn't do anything around authorization /
> ACLs. While authorization / ACLs would be for sure valuable feature I would
> prefer to leave it for different KIP.
>
> Jakub
>
> On Mon, Oct 9, 2017 at 5:25 PM, Jakub Scholz  wrote:
>
> > Hi,
> >
> > I would like to start a discussion about KIP-208: Add SSL support to
> Kafka
> > Connect REST interface (https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-208%3A+Add+SSL+support+to+
> > Kafka+Connect+REST+interface).
> >
> > I think this would be useful feature to improve the security of Kafka
> > Connect.
> >
> > Thanks & Regards
> > Jakub
> >
>


Re: [DISCUSS] KIP-197: Include Connector type in Connector REST API

2017-09-08 Thread Randall Hauch
Hi, Ted.

Thanks for creating this KIP and for working on the implementation. The
proposal looks great for the "/connectors/{name}" endpoint, but there are
several others that we need to consider as well so that the responses are
all consistent. In particular, look at "/connectors/{name}/status" and
"/connectors/{name}/config" (at least) that include the connector
information.

Best regards,

Randall

On Fri, Sep 8, 2017 at 12:00 PM, Ted Yu  wrote:

> Hi,
> Please take a look at:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 197+Connect+REST+API+should+include+the+connector+type+
> when+describing+a+connector
>
> Thanks
>


Re: [DISCUSS] KIP-197: Include Connector type in Connector REST API

2017-09-08 Thread Randall Hauch
Hi, Ted. Thanks for the quick turn around.

I didn't remember that the response to the "/connectors/{name}/config" is
the actual configuration, so we probably shouldn't add "type" there. Sorry
about that.

And in the "/connector/{name}/status" response, I wonder if it would be
better to embed the "type" within the "connector" field, which corresponds
to adding the type to the `ConnectorStateInfo` (just like you did to the
`ConnectorInfo` class). WDYT?
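
For concreteness, with that embedding the status response might look roughly
like this (purely illustrative; whether "type" belongs inside "connector" is
exactly the question):

    {
      "name": "my-connector",
      "connector": {
        "state": "RUNNING",
        "worker_id": "worker-1:8083",
        "type": "source"
      },
      "tasks": [
        { "id": 0, "state": "RUNNING", "worker_id": "worker-1:8083" }
      ]
    }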

Best regards,

Randall

On Fri, Sep 8, 2017 at 1:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Thanks for the reminder, Randall.
>
> I have modified the KIP to include these two endpoints.
>
> On Fri, Sep 8, 2017 at 11:00 AM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Hi, Ted.
> >
> > Thanks for creating this KIP and for working on the implementation. The
> > proposal looks great for the "/connectors/{name}" endpoint, but there are
> > several others that we need to consider as well so that the responses are
> > all consistent. In particular, look at "/connectors/{name}/status" and
> > "/connectors/{name}/config" (at least) that include the connector
> > information.
> >
> > Best regards,
> >
> > Randall
> >
> > On Fri, Sep 8, 2017 at 12:00 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Hi,
> > > Please take a look at:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 197+Connect+REST+API+should+include+the+connector+type+
> > > when+describing+a+connector
> > >
> > > Thanks
> > >
> >
>


Re: [DISCUSS] KIP-197: Include Connector type in Connector REST API

2017-09-08 Thread Randall Hauch
One more thing. I wonder if we should use lowercase constants for the types
rather than mixed case: "sink" and "source" (rather than "Sink" and
"Source"). Thoughts?

On Fri, Sep 8, 2017 at 6:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Updated the KIP accordingly.
>
> Cheers
>
> On Fri, Sep 8, 2017 at 3:57 PM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Hi, Ted. Thanks for the quick turn around.
> >
> > I didn't remember that the response to the "/connectors/{name}/config" is
> > the actual configuration, so we probably shouldn't add "type" there.
> Sorry
> > about that.
> >
> > And in the "/connector/{name}/status" response, I wonder if it would be
> > better to embed the "type" within the "connector" field, which
> corresponds
> > to adding the type to the `ConnectorStateInfo` (just like you did to the
> > `ConnectorInfo` class). WDYT?
> >
> > Best regards,
> >
> > Randall
> >
> > On Fri, Sep 8, 2017 at 1:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Thanks for the reminder, Randall.
> > >
> > > I have modified the KIP to include these two endpoints.
> > >
> > > On Fri, Sep 8, 2017 at 11:00 AM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > Hi, Ted.
> > > >
> > > > Thanks for creating this KIP and for working on the implementation.
> The
> > > > proposal looks great for the "/connectors/{name}" endpoint, but there
> > are
> > > > several others that we need to consider as well so that the responses
> > are
> > > > all consistent. In particular, look at "/connectors/{name}/status"
> and
> > > > "/connectors/{name}/config" (at least) that include the connector
> > > > information.
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > > > On Fri, Sep 8, 2017 at 12:00 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > Please take a look at:
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 197+Connect+REST+API+should+include+the+connector+type+
> > > > > when+describing+a+connector
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-197: Include Connector type in Connector REST API

2017-09-08 Thread Randall Hauch
Looks good to me! Thanks again!

I say go ahead and ask for a vote in a new thread.

Randall

On Fri, Sep 8, 2017 at 6:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Lowercase constants are generated by the enum.
>
> Updated KIP again.
>
> On Fri, Sep 8, 2017 at 4:08 PM, Randall Hauch <rha...@gmail.com> wrote:
>
> > One more thing. I wonder if we should use lowercase constants for the
> types
> > rather than mixed case: "sink" and "source" (rather than "Sink" and
> > "Source"). Thoughts?
> >
> > On Fri, Sep 8, 2017 at 6:02 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > Updated the KIP accordingly.
> > >
> > > Cheers
> > >
> > > On Fri, Sep 8, 2017 at 3:57 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > Hi, Ted. Thanks for the quick turn around.
> > > >
> > > > I didn't remember that the response to the
> "/connectors/{name}/config"
> > is
> > > > the actual configuration, so we probably shouldn't add "type" there.
> > > Sorry
> > > > about that.
> > > >
> > > > And in the "/connector/{name}/status" response, I wonder if it would
> be
> > > > better to embed the "type" within the "connector" field, which
> > > corresponds
> > > > to adding the type to the `ConnectorStateInfo` (just like you did to
> > the
> > > > `ConnectorInfo` class). WDYT?
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > > > On Fri, Sep 8, 2017 at 1:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > >
> > > > > Thanks for the reminder, Randall.
> > > > >
> > > > > I have modified the KIP to include these two endpoints.
> > > > >
> > > > > On Fri, Sep 8, 2017 at 11:00 AM, Randall Hauch <rha...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi, Ted.
> > > > > >
> > > > > > Thanks for creating this KIP and for working on the
> implementation.
> > > The
> > > > > > proposal looks great for the "/connectors/{name}" endpoint, but
> > there
> > > > are
> > > > > > several others that we need to consider as well so that the
> > responses
> > > > are
> > > > > > all consistent. In particular, look at
> "/connectors/{name}/status"
> > > and
> > > > > > "/connectors/{name}/config" (at least) that include the connector
> > > > > > information.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Randall
> > > > > >
> > > > > > On Fri, Sep 8, 2017 at 12:00 PM, Ted Yu <yuzhih...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > > Please take a look at:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > 197+Connect+REST+API+should+include+the+connector+type+
> > > > > > > when+describing+a+connector
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [VOTE] KIP-197: Include Connector type in Connector REST API

2017-09-08 Thread Randall Hauch
+1 (non-binding)

Randall

On Fri, Sep 8, 2017 at 6:32 PM, Ted Yu  wrote:

> Hi,
> Please take a look at the following and cast your vote:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 197+Connect+REST+API+should+include+the+connector+type+
> when+describing+a+connector
>
> Thanks
>


Re: [VOTE] KIP-131 - Add access to OffsetStorageReader from SourceConnector

2017-09-09 Thread Randall Hauch
Gwen,

I've had more time to look into the code. First, the OffsetStorageReader
JavaDoc says: "OffsetStorageReader provides access to the offset storage
used by sources. This can be used by connectors to determine offsets to
start consuming data from. This is most commonly used during initialization
of a task, but can also be used during runtime, e.g. when reconfiguring a
task."

Second, this KIP allows SourceConnector implementations to access the
OffsetStorageReader, but the SourceConnector methods are *only* invoked
around lifecycle changes, which is precisely when using the
OffsetStorageReader is appropriate per the comment above.
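
For example, a connector might read previously committed offsets when it
starts or is reconfigured. A minimal sketch, assuming the
SourceConnectorContext accessor discussed in the KIP (the partition key and
file name below are illustrative):

    import java.util.Collections;
    import java.util.Map;
    import org.apache.kafka.connect.storage.OffsetStorageReader;

    // Inside a SourceConnector implementation:
    @Override
    public void start(Map<String, String> props) {
        OffsetStorageReader reader = context().offsetStorageReader();
        Map<String, Object> offset =
                reader.offset(Collections.singletonMap("filename", "input.txt"));
        // The offset is null on the very first run; otherwise it can be used
        // to compute the initial task configurations.
    }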

In summary, I don't think there is any concern about the
OffsetStorageReader being used inappropriately by the SourceConnector
implementations.

Randall

On Fri, Sep 8, 2017 at 9:46 AM, Gwen Shapira <g...@confluent.io> wrote:

> Basically, you are saying that the part where the comment says: "Offset
> data should only be read during startup or reconfiguration of a task."
> is incorrect? because the API extension allows reading offset data at any
> point in the lifecycle, right?
>
> On Fri, Sep 8, 2017 at 5:18 AM Florian Hussonnois <fhussonn...@gmail.com>
> wrote:
>
> > Hi Shapira,
> >
> > We only expose the OffsetStorageReader to connectors, and it relies on
> > the KafkaOffsetBackingStore. The store continuously consumes offsets from
> > Kafka, so I think we can't have stale data.
> >
> >
> > Le 8 sept. 2017 06:13, "Randall Hauch" <rha...@gmail.com> a écrit :
> >
> > > The KIP and PR expose the OffsetStorageReader, which is already exposed
> > to
> > > the tasks. The OffsetStorageWriter is part of the implementation, but
> was
> > > not and is not exposed through the API.
> > >
> > > > On Sep 7, 2017, at 9:04 PM, Gwen Shapira <g...@confluent.io> wrote:
> > > >
> > > > I just re-read the code for the OffsetStorageWriter, and ran into
> this
> > > > comment:
> > > >
> > > > * Note that this only provides write functionality. This is
> > > > intentional to ensure stale data is
> > > > * never read. Offset data should only be read during startup or
> > > > reconfiguration of a task. By
> > > > * always serving those requests by reading the values from the
> backing
> > > > store, we ensure we never
> > > > * accidentally use stale data. (One example of how this can occur: a
> > > > task is processing input
> > > > * partition A, writing offsets; reconfiguration causes partition A to
> > > > be reassigned elsewhere;
> > > > * reconfiguration causes partition A to be reassigned to this node,
> > > > but now the offset data is out
> > > > * of date). Since these offsets are created and managed by the
> > > > connector itself, there's no way
> > > > * for the offset management layer to know which keys are "owned" by
> > > > which tasks at any given
> > > > * time.
> > > >
> > > >
> > > > I can't figure out how the KIP avoids the stale-reads problem
> explained
> > > here.
> > > >
> > > > Can you talk me through it? I'm cancelling my vote since right now
> > > > exposing this interface sounds risky and misleading.
> > > >
> > > >
> > > > Gwen
> > > >
> > > >
> > > >> On Thu, Sep 7, 2017 at 5:04 PM Gwen Shapira <g...@confluent.io>
> > wrote:
> > > >>
> > > >> +1 (binding)
> > > >>
> > > >> Looking forward to see how connector implementations use this in
> > > practice
> > > >> :)
> > > >>
> > > >>> On Thu, Sep 7, 2017 at 3:49 PM Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >>>
> > > >>> I'd like to open the vote for KIP-131:
> > > >>>
> > > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+
> > > Add+access+to+OffsetStorageReader+from+SourceConnector
> > > >>>
> > > >>> Thanks to Florian for submitting the KIP and the implementation,
> and
> > to
> > > >>> everyone else that helped review.
> > > >>>
> > > >>> Best regards,
> > > >>>
> > > >>> Randall
> > > >>>
> > > >>
> > >
> >
>


[VOTE] KIP-199: Add Kafka Connect Offset Tool

2017-09-11 Thread Randall Hauch
I'd like to start the vote on KIP-199 to add a command line tool that will
allow Connect operators to read, modify, and update source connector
offsets. Details are here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-199%3A+Add+Kafka+Connect+offset+tool

Thanks, and best regards.

Randall


[VOTE] KIP-196: Add metrics to Kafka Connect framework

2017-09-11 Thread Randall Hauch
I'd like to start the vote on KIP-196 to add metrics to the Kafka Connect
framework so the worker processes can be measured. Details are here:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework

Thanks, and best regards.

Randall


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-11 Thread Randall Hauch
Thanks for reviewing. Responses inline below.

On Mon, Sep 11, 2017 at 11:22 AM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Randall,
>
> Thank you for the KIP.  This should improve visibility greatly.  I had a
> few questions/ideas for more metrics.
>
>
>1. What's the relationship between the worker state and the connector
>status?  Does the 'paused' status at the Connector level include the
> time
>that worker is 'rebalancing'?
>

The worker state metric simply reports whether the worker is running or
rebalancing. This state is independent of how many connectors are
deployed/running/paused. During a rebalance, the connectors are being
stopped and restarted but are effectively not running.


>2. Are the "Source Connector" metrics like record rate an aggregation of
>the "Source Task" metrics?
>

Yes.


>   - How much value is there is monitoring at the "Source Connector"
>   level (other than status) if the number of constituent tasks may
> change
>   over time?
>

The task metrics allow you to know whether the tasks are evenly loaded and
each making progress. The aggregate connector metrics tell you how much
work has been performed by all the tasks in that worker. Both are useful
IMO.


>   - I'm imagining that it's most useful to collect metrics at the task
>   level as the task-level metrics should be stable regardless of tasks
>   shifting to different workers
>

Correct, this is where the most value is because it is the most fine
grained.


>   - If so, can we duplicate the Connector Status down at the task level
>  so that all important metrics can be tracked by task?
>

Possibly. The challenge is that the threads running the tasks are blocked
when a connector is paused.


>  3. For the Sink Task metrics
>   - Can we add offset lag and timestamp lag on commit?
>  - After records are flushed/committed
> - what is the diff between the record timestamps and commit
> time (histogram)?  this is a measure of end-to-end pipeline
> latency
> - what is the diff between record offsets and latest offset of
> their partition at commit time (histogram)? this is a
> measure of whether
> this particular task is keeping up
>

Yeah, possibly. Will have to compare with the consumer metrics to see what
we can get.
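
Spelling out what Roger describes, at commit time the two measures would be
computed per record along these lines (a sketch; the names are illustrative):

    pipelineLatencyMs = commitTimeMs - record.timestamp();          // end-to-end delay
    offsetLag         = partitionEndOffset - record.kafkaOffset();  // is this task keeping up?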


>  - How about flush error rate?  Assuming the sink connectors are
>   using retries, it would be helpful to know how many errors they're
> seeing
>

We could add a metric to track how many times the framework receives a
retry exception and then retries, but the connectors may also do this on
their own.


>   - Can we tell at the framework level how many records were inserted
>   vs updated vs deleted?
>

No, there's no distinction in the Connect framework.


>   - Batching stats
>  - Histogram of flush batch size
>  - Counts of flush trigger method (time vs max batch size)
>

Should be able to add these.


>
> Cheers,
>
> Roger
>
> On Sun, Sep 10, 2017 at 8:45 AM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Thanks, Gwen.
> >
> > That's a great idea, so I've changed the KIP to add those metrics. I've
> > also made a few other changes:
> >
> >
> >1. The context of all metrics is limited to the activity within the
> >worker. This wasn't clear before, so I changed the motivation and
> metric
> >descriptions to explicitly state this.
> >2. Added the worker ID to all MBean attributes. In addition to
> hopefully
> >making this same scope obvious from within JMX or other metric
> reporting
> >system. This is also similar to how the Kafka producer and consumer
> > metrics
> >include the client ID in their MBean attributes. Hopefully this does
> not
> >negatively impact or complicate how external reporting systems'
> > aggregate
> >metrics from multiple workers.
> >3. Stated explicitly that aggregating metrics across workers was out
> of
> >scope of this KIP.
> >4. Added metrics to report the connector class and version for both
> sink
> >and source connectors.
> >
> > Check this KIP's history for details of these changes.
> >
> > Please let me know if you have any other suggestions. I hope to start the
> > voting soon!
> >
> > Best regards,
> >
> > Randall
> >
> > On Thu, Sep 7, 2017 at 9:35 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > Thanks for the KIP, Randall. Those are badly needed!
> > >
> > > Can we have two metrics 

Re: [DISCUSS] KIP-131 : Add access to OffsetStorageReader from SourceConnector

2017-09-05 Thread Randall Hauch
Thanks for taking into account my suggestions/concerns. I had a few very
minor suggestions on the PR regarding documentation, but overall everything
looks great to me.

I'd encourage anyone else to review
https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+OffsetStorageReader+from+SourceConnector
and the PR, and to then provide feedback here. If we hear nothing, we'll
submit for a vote.

Best regards,

Randall

On Thu, Aug 24, 2017 at 12:38 PM, Florian Hussonnois <fhussonn...@gmail.com>
wrote:

> Hi Randall,
>
> Thank you for your answer.
>
> I will update the KIP and the PR with your last approach which sounds
> better.
>
> Thanks.
>
> Le 16 août 2017 00:53, "Randall Hauch" <rha...@gmail.com> a écrit :
>
> Sorry it's taken me so long to come back to this.
>
> Have you considered creating a `SourceConnectorContext` interface that
> extends `ConnectorContext` and that adds the method to access the offset
> storage? This would very closely match the existing `SourceTaskContext`.
>
> `SourceConnector` implementations could always cast the `context` field in
> the superclass to `SourceConnectorContext`, but perhaps a slightly better
> way to do this is to add the following method to the `SourceConnector`
> class:
>
>
> public SourceConnectorContext context() {
>     return (SourceConnectorContext) context;
> }
>
>
> Now, `SourceConnector` implementations can either cast themselves or use
> this additional method to obtain the correctly cast context.
>
> In fact, it might be good to do this for `SinkConnector` as well, and we
> could even add a `context()` method in the `Connector` interface, since
> subinterfaces can change the return type to be a subtype of that returned
> by the interface:
>
> ConnectorContext context();
>
> One advantage of this approach is that `SourceConnectorContext` and
> `SinkConnectorContext` remain interfaces. Another is not adding new method
> to `SourceConnector` that implementers may need to learn that they should
> not override or implement them. A third is that now we have a
> `SourceConnectorContext` and `SinkConnectorContext` to which we can add
> more methods if needed, and they are very similar to `SourceTaskContext`
> and `SinkTaskContext`.
>
> Thoughts?
>
> On Wed, Apr 5, 2017 at 3:59 PM, Florian Hussonnois <fhussonn...@gmail.com>
> wrote:
>
> > Hi All,
> >
> > Is there any feedback regarding that KIP ?
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 131+-+Add+access+to+
> > OffsetStorageReader+from+SourceConnector
> >
> > Thanks,
> >
> > 2017-03-14 22:51 GMT+01:00 Florian Hussonnois <fhussonn...@gmail.com>:
> >
> > > Hi Matthias,
> > >
> > > Sorry I didn't know this page. Ths KIP has been added to it.
> > >
> > > Thanks,
> > >
> > > 2017-03-13 21:30 GMT+01:00 Matthias J. Sax <matth...@confluent.io>:
> > >
> > >> Can you please add the KIP to this table:
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+
> > >> Improvement+Proposals#KafkaImprovementProposals-KIPsunderdiscussion
> > >>
> > >> Thanks,
> > >>
> > >>  Matthias
> > >>
> > >>
> > >> On 3/7/17 1:24 PM, Florian Hussonnois wrote:
> > >> > Hi all,
> > >> >
> > >> > I've created a new KIP to add access to OffsetStorageReader from
> > >> > SourceConnector
> > >> >
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+
> > >> Add+access+to+OffsetStorageReader+from+SourceConnector
> > >> >
> > >> > Thanks.
> > >> >
> > >>
> > >>
> > >
> > >
> > > --
> > > Florian HUSSONNOIS
> > >
> >
> >
> >
> > --
> > Florian HUSSONNOIS
> >
>


[VOTE] KIP-131 - Add access to OffsetStorageReader from SourceConnector

2017-09-07 Thread Randall Hauch
I'd like to open the vote for KIP-131:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+OffsetStorageReader+from+SourceConnector

Thanks to Florian for submitting the KIP and the implementation, and to
everyone else that helped review.

Best regards,

Randall


[DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-07 Thread Randall Hauch
Hi everyone.

I've created a new KIP to add metrics to the Kafka Connect framework:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-196%3A+Add+metrics+to+Kafka+Connect+framework

The KIP approval deadline is looming, so if you're interested in Kafka
Connect metrics please review and provide feedback as soon as possible. I'm
interested not only in whether the metrics are sufficient and appropriate,
but also in whether the MBean naming conventions are okay.

Best regards,

Randall


Re: [VOTE] KIP-131 - Add access to OffsetStorageReader from SourceConnector

2017-09-07 Thread Randall Hauch
The KIP and PR expose the OffsetStorageReader, which is already exposed to the 
tasks. The OffsetStorageWriter is part of the implementation, but was not and 
is not exposed thru the API. 
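
For illustration, a source connector could use it during startup roughly like
this (a sketch only; the partition keys are hypothetical, not from the PR):

    // inside SourceConnector.start(Map<String, String> props):
    Map<String, Object> lastOffset = context().offsetStorageReader()
            .offset(Collections.singletonMap("filename", "a"));
    // e.g., use lastOffset to decide how to divide work among the tasks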

> On Sep 7, 2017, at 9:04 PM, Gwen Shapira <g...@confluent.io> wrote:
> 
> I just re-read the code for the OffsetStorageWriter, and ran into this
> comment:
> 
> * Note that this only provides write functionality. This is
> intentional to ensure stale data is
> * never read. Offset data should only be read during startup or
> reconfiguration of a task. By
> * always serving those requests by reading the values from the backing
> store, we ensure we never
> * accidentally use stale data. (One example of how this can occur: a
> task is processing input
> * partition A, writing offsets; reconfiguration causes partition A to
> be reassigned elsewhere;
> * reconfiguration causes partition A to be reassigned to this node,
> but now the offset data is out
> * of date). Since these offsets are created and managed by the
> connector itself, there's no way
> * for the offset management layer to know which keys are "owned" by
> which tasks at any given
> * time.
> 
> 
> I can't figure out how the KIP avoids the stale-reads problem explained here.
> 
> Can you talk me through it? I'm cancelling my vote since right now
> exposing this interface sounds risky and misleading.
> 
> 
> Gwen
> 
> 
>> On Thu, Sep 7, 2017 at 5:04 PM Gwen Shapira <g...@confluent.io> wrote:
>> 
>> +1 (binding)
>> 
>> Looking forward to see how connector implementations use this in practice
>> :)
>> 
>>> On Thu, Sep 7, 2017 at 3:49 PM Randall Hauch <rha...@gmail.com> wrote:
>>> 
>>> I'd like to open the vote for KIP-131:
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-131+-+Add+access+to+OffsetStorageReader+from+SourceConnector
>>> 
>>> Thanks to Florian for submitting the KIP and the implementation, and to
>>> everyone else that helped review.
>>> 
>>> Best regards,
>>> 
>>> Randall
>>> 
>> 


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Randall Hauch
Regarding the existing rebalance metrics under
"kafka.connect:type=connect-coordinator-metrics", I think we should just
plan on reusing them rather than duplicating them.

On Tue, Sep 12, 2017 at 5:06 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Requests are generally substantial batches of data, you are not guaranteed
> that for the processing batches both because source connectors can hand you
> batches of whatever size they want and consumer's max.poll.records can be
> overridden.
>
> Both SMTs and converters are a concern because they can both be relatively
> cheap such that just checking the time in between them could possibly dwarf
> the cost of applying them.
>
> Also, another thought re: rebalance metrics: we are already getting some
> info via AbstractCoordinator and those actually provide a bit more detail
> in some ways (e.g. join & sync vs the entire rebalance). Not sure if we
> want to effectively duplicate some info so it can all be located under
> Connect names or rely on the existing metrics for some of these.
>
> -Ewen
>
> On Tue, Sep 12, 2017 at 2:05 PM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
> > Ewen,
> >
> > I don't know the details of the perf concern.  How is it that the Kafka
> > broker can keep latency stats per request without suffering too much
> > performance?  Maybe SMTs are the only concern b/c they are per-message.
> If
> > so, let's remove those and keep timing info for everything else like
> > flushes, which are batch-based.
> >
> >
> > On Tue, Sep 12, 2017 at 1:32 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > On Tue, Sep 12, 2017 at 10:55 AM, Gwen Shapira <g...@confluent.io>
> > wrote:
> > >
> > > > Ewen, you gave a nice talk at Kafka Summit where you warned about the
> > > > danger of SMTs that slow down the data pipe. If we don't provide the
> > time
> > > > metrics, how will users know when their SMTs are causing performance
> > > > issues?
> > > >
> > >
> > > Metrics aren't the only way to gain insight about performance and
> always
> > > measuring this even when it's not necessarily being used may not make
> > > sense. SMT authors are much better off starting out with a JMH or
> similar
> > > benchmark. What I was referring to in the talk is more about
> > understanding
> > > that the processing for SMTs is entirely synchronous and that means
> > certain
> > > classes of operations will just generally be a bad idea, e.g. anything
> > that
> > > goes out over the network to another service. You don't even really
> need
> > > performance info to determine that that type of transformation will
> cause
> > > problems.
> > >
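> > > A minimal JMH harness for an SMT could look like this (a sketch; the
> > > transformation and record are just examples, not from any particular
> > > connector):
> > >
> > > import java.util.Collections;
> > > import org.apache.kafka.connect.data.*;
> > > import org.apache.kafka.connect.source.SourceRecord;
> > > import org.apache.kafka.connect.transforms.MaskField;
> > > import org.openjdk.jmh.annotations.*;
> > >
> > > @State(Scope.Benchmark)
> > > public class SmtBenchmark {
> > >     private final MaskField.Value<SourceRecord> smt = new MaskField.Value<>();
> > >     private SourceRecord record;
> > >
> > >     @Setup
> > >     public void setup() {
> > >         smt.configure(Collections.singletonMap("fields", "secret"));
> > >         Schema schema = SchemaBuilder.struct()
> > >                 .field("secret", Schema.STRING_SCHEMA).build();
> > >         record = new SourceRecord(null, null, "topic", 0, schema,
> > >                 new Struct(schema).put("secret", "s3cr3t"));
> > >     }
> > >
> > >     @Benchmark
> > >     public SourceRecord measureApply() {
> > >         return smt.apply(record);
> > >     }
> > > }
> > >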
> > > But my point wasn't that timing info isn't useful. It's that we know
> that
> > > getting timestamps is pretty expensive and we'll already be doing so
> > > elsewhere (e.g. if a source record doesn't include a timestamp). For
> some
> > > use cases such as ByteArrayConverter + no SMTs + lightweight processing
> > > (e.g. just gets handed to a background thread that deals with sending
> the
> > > data), it wouldn't be out of the question that adding 4 or so more
> calls
> > to
> > > get timestamps could become a bottleneck. Since I don't know if it
> would
> > > but we have definitely seen the issue come up before, I would be
> > > conservative in adding the metrics unless we had some numbers showing
> it
> > > doesn't matter or doesn't matter much.
> > >
> > > In general, I don't think metrics that require always-on measurement
> are
> > a
> > > good way to get fine grained performance information. Instrumenting
> > > different phases that imply different types of performance problems can
> > be
> > > helpful (e.g. "processing time" that should be CPU/memory throughput
> > bound
> > > vs. "send time" that, at least for many connectors, is more likely to
> be
> > IO
> > > bound), but if you want finer-grained details, you probably either want
> > > something that can be toggled on/off temporarily or just use a tool
> > that's
> > > really designed for the job, i.e. a profiler like perf.
> > >
> > > -Ewen
> > >
> > >
> > > >
> > > > Gwen
> > > >
> > > > On Mon, Sep 11, 2017 at 7:50 PM Ewen Cheslack-Postava <
> > e...@confluent.io
> > > >
> > > > wrote:

[DISCUSS] KIP-199: Add Kafka Connect offset reset tool

2017-09-10 Thread Randall Hauch
Hi all,

KIP-199 [1] describes a new tool that will allow Connect operators to read,
update, and remove offsets stored by Connect runtime. This capability has
been often asked for by Connect users. The proposal is simple but flexible.
Please review and add feedback.

Best regards,

Randall

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-199%3A+Add+Kafka+Connect+offset+reset+tool


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-10 Thread Randall Hauch
Thanks, Gwen.

That's a great idea, so I've changed the KIP to add those metrics. I've
also made a few other changes:


   1. The context of all metrics is limited to the activity within the
   worker. This wasn't clear before, so I changed the motivation and metric
   descriptions to explicitly state this.
   2. Added the worker ID to all MBean attributes, which hopefully makes this
   per-worker scope obvious from within JMX or other metric reporting systems.
   This is also similar to how the Kafka producer and consumer metrics include
   the client ID in their MBean attributes. Hopefully this does not negatively
   impact or complicate how external reporting systems aggregate metrics from
   multiple workers; see the example after this list.
   3. Stated explicitly that aggregating metrics across workers was out of
   scope of this KIP.
   4. Added metrics to report the connector class and version for both sink
   and source connectors.
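
For example, a task metric MBean name would now follow a pattern like this
(illustrative; see the KIP for the exact names):

    kafka.connect:type=source-task-metrics,worker=([-.\w]+),connector=([-.\w]+),task=([\d]+)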

Check this KIP's history for details of these changes.

Please let me know if you have any other suggestions. I hope to start the
voting soon!

Best regards,

Randall

On Thu, Sep 7, 2017 at 9:35 PM, Gwen Shapira <g...@confluent.io> wrote:

> Thanks for the KIP, Randall. Those are badly needed!
>
> Can we have two metrics with record rate per task? One before SMT and one
> after?
> We can have cases where we read 5000 rows from JDBC but write 5 to Kafka,
> or read 5000 records from Kafka and write 5 due to filtering. I think its
> important to know both numbers.
>
>
> Gwen
>
> On Thu, Sep 7, 2017 at 7:50 PM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Hi everyone.
> >
> > I've created a new KIP to add metrics to the Kafka Connect framework:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 196%3A+Add+metrics+to+Kafka+Connect+framework
> >
> > The KIP approval deadline is looming, so if you're interested in Kafka
> > Connect metrics please review and provide feedback as soon as possible.
> I'm
> > interested not only in whether the metrics are sufficient and
> appropriate,
> > but also in whether the MBean naming conventions are okay.
> >
> > Best regards,
> >
> > Randall
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
>


Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

2017-09-10 Thread Randall Hauch
Thanks for the comments! Specific comments inline below.

Regards,

Randall

On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> bq. connector restart and the next message
>
> The last part of the sentence seems to be incomplete.
>

Fixed.


> bq. command line tool called kafka-connect-source-offset-reset.sh
>
> From the description, the tool does more than resetting (e.g. deleting).
> How about calling it kafka-connect-source-offset-tool.sh
>

None of the other tools or scripts in the Kafka distribution have "tool" in
the name, so how about "kafka-connect-source-offsets.sh"?


>
> bq. bin/kafka-connect-source-offset-reset.sh
> --config=my-worker-config.properties --export
>
> From the table above, export mode is mentioned as required. However, either
> --export or --import is required.
> Better note this in the KIP.
>

Addressed in a few places. Hopefully it is more clear now.


>
> bq. but will remove the offsets for the partition with file "b"
>
> Please move the sample JSON below the above description.
>

Done.


>
> Cheers
>
> On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Hi all,
> >
> > KIP-199 [1] describes a new tool that will allow Connect operators to
> read,
> > update, and remove offsets stored by Connect runtime. This capability has
> > been often asked for by Connect users. The proposal is simple but
> flexible.
> > Please review and add feedback.
> >
> > Best regards,
> >
> > Randall
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 199%3A+Add+Kafka+Connect+offset+reset+tool
> >
>


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Randall Hauch
On Tue, Sep 12, 2017 at 10:36 AM, Roger Hoover <roger.hoo...@gmail.com>
wrote:

> Randall/Ewen,
>
> I think the timing info is still useful even if it's measured since the
> last rebalance.  How else do you know where time is being spent?
>

I think Ewen's concern (correct me if I'm wrong) is that measuring
time-based metrics might result in excessive performance degradation,
especially when batch sizes are small.
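
The pattern in question wraps each per-record stage in two clock reads,
roughly like this (an illustrative sketch; `time` and `transformSensor` are
assumed names, not framework code):

    long start = time.nanoseconds();                      // clock read #1
    R transformed = transformation.apply(record);
    transformSensor.record(time.nanoseconds() - start);   // clock read #2

With small batches, those extra reads are amortized over very few records.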


>
> The use case for seeing the batch size is that you generally have two knobs
> to configure - max batch size and max wait time.  The batch size metrics
> would tell you how full your batches are based on your current linger time
> so you can adjust the config.
>

It does seem that batch sizes are useful, and the KIP includes these
("batch-size-max" and "batch-size-avg").


>
> Cheers,
>
> Roger
>
> On Mon, Sep 11, 2017 at 7:50 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > re: questions about additional metrics, I think we'll undoubtedly find
> more
> > that people want in practice, but as I mentioned earlier I think it's
> > better to add the ones we know we need and then fill out the rest as we
> > figure it out. So, e.g., batch size metrics sound like they could be
> > useful, but I'd probably wait until we have a clear use case. It seems
> > likely that it could be useful in diagnosing slow connectors (e.g. the
> > implementation just does something inefficient), but I'm not really sure
> > about that yet.
> >
> > -Ewen
> >
> > On Mon, Sep 11, 2017 at 7:11 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Based on Roger and Ewen's feedback, I removed the aggregate metrics as
> > they
> > > would be difficult to make use of without extra work. This simplified
> > > things a great deal, and I took the opportunity to reorganize the
> groups
> > of
> > > metrics. Also, based upon Ewen's concerns regarding measuring
> > > times/durations, I removed all time-related metrics except for the
> offset
> > > commits and rebalances, which are infrequent enough to warrant the
> > capture
> > > of percentiles. Roger asked about capturing batch size metrics for
> source
> > > and sink tasks, and offset lag metrics for sink tasks. Finally, Ewen
> > > pointed out that all count/total metrics are only valid since the most
> > > recent rebalance and are therefore less meaningful, and were removed.
> > >
> > > On Mon, Sep 11, 2017 at 6:50 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > Thanks, Ewen. Comments inline below.
> > > >
> > > > On Mon, Sep 11, 2017 at 5:46 PM, Ewen Cheslack-Postava <
> > > e...@confluent.io>
> > > > wrote:
> > > >
> > > >> Randall,
> > > >>
> > > >> A couple of questions:
> > > >>
> > > >> * Some metrics don't seem to have unique names? e.g.
> > > >> source-record-produce-rate and source-record-produce-total seem like
> > > they
> > > >> are duplicated. Looks like maybe just an oversight that the second
> > ones
> > > >> should be changed from "produce" to "write".
> > > >>
> > > >
> > > > Nice catch. You are correct - should be "write" instead of
> "produce". I
> > > > will correct.
> > > >
> > > >
> > > >> * I think there's a stray extra character in a couple of
> > > >> places: kafka.connect:type=source-task-metrics,name=source-record-
> > > >> produce-total,worker=([-.\w]+)l,connector=([-.\w]+),task=([\d]+)
> > > >> has an extra char after the worker name.
> > > >>
> > > >
> > > > Thanks. Removed in 2 places.
> > > >
> > > >
> > > >> * Are the produce totals actually useful given rebalancing would
> > cancel
> > > >> them out anyway? Doesn't seem like you could do much with them.
> > > >>
> > > >
> > > > Yes, the totals would be since the last rebalance. Maybe that isn't
> > that
> > > > useful. Might be better to capture the offsets and lag as Roger was
> > > > suggesting. Thoughts?
> > > >
> > > >
> > > >> * Why do transformations get their own metric but not converters?
> And
> > > are
> > > >> we concerned at all about the performance impact of getting such
> fine
> > > >> grained info? G

Re: [VOTE] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Randall Hauch
The KIP was modified (most changes due to reorganization of metrics). Feel
free to re-vote if you dislike the changes.

On Mon, Sep 11, 2017 at 8:40 PM, Sriram Subramanian <r...@confluent.io>
wrote:

> +1
>
> On Mon, Sep 11, 2017 at 2:56 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > +1
> >
> > Thanks for this. Can't wait for more complete monitoring for Connect.
> >
> > On Mon, Sep 11, 2017 at 7:40 AM Randall Hauch <rha...@gmail.com> wrote:
> >
> > > I'd like to start the vote on KIP-196 to add metrics to the Kafka
> Connect
> > > framework so the worker processes can be measured. Details are here:
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 196%3A+Add+metrics+to+Kafka+Connect+framework
> > >
> > > Thanks, and best regards.
> > >
> > > Randall
> > >
> >
>


Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Randall Hauch
Okay, I think I've incorporated all feedback except for Gwen and Roger, who
would like to have timing metrics. Given the deadline and Ewen's concern
about degraded performance, I think it's prudent to leave those out of this
KIP and proceed as is.



On Tue, Sep 12, 2017 at 12:48 PM, Randall Hauch <rha...@gmail.com> wrote:

>
>
> On Tue, Sep 12, 2017 at 10:36 AM, Roger Hoover <roger.hoo...@gmail.com>
> wrote:
>
>> Randall/Ewen,
>>
>> I think the timing info is still useful even if it's measured since the
>> last rebalance.  How else do you know where time is being spent?
>>
>
> I think Ewen's concern (correct me if I'm wrong) is that measuring
> time-based metrics might result in excessive performance degradation,
> especially when batch sizes are small.
>
>
>>
>> The use case for seeing the batch size is that you generally have two
>> knobs
>> to configure - max batch size and max wait time.  The batch size metrics
>> would tell you how full your batches are based on your current linger time
>> so you can adjust the config.
>>
>
> It does seem that batch sizes are useful, and the KIP includes these
> ("batch-size-max" and "batch-size-avg").
>
>
>>
>> Cheers,
>>
>> Roger
>>
>> On Mon, Sep 11, 2017 at 7:50 PM, Ewen Cheslack-Postava <e...@confluent.io
>> >
>> wrote:
>>
>> > re: questions about additional metrics, I think we'll undoubtedly find
>> more
>> > that people want in practice, but as I mentioned earlier I think it's
>> > better to add the ones we know we need and then fill out the rest as we
>> > figure it out. So, e.g., batch size metrics sound like they could be
>> > useful, but I'd probably wait until we have a clear use case. It seems
>> > likely that it could be useful in diagnosing slow connectors (e.g. the
>> > implementation just does something inefficient), but I'm not really sure
>> > about that yet.
>> >
>> > -Ewen
>> >
>> > On Mon, Sep 11, 2017 at 7:11 PM, Randall Hauch <rha...@gmail.com>
>> wrote:
>> >
>> > > Based on Roger and Ewen's feedback, I removed the aggregate metrics as
>> > they
>> > > would be difficult to make use of without extra work. This simplified
>> > > things a great deal, and I took the opportunity to reorganize the
>> groups
>> > of
>> > > metrics. Also, based upon Ewen's concerns regarding measuring
>> > > times/durations, I removed all time-related metrics except for the
>> offset
>> > > commits and rebalances, which are infrequent enough to warrant the
>> > capture
>> > > of percentiles. Roger asked about capturing batch size metrics for
>> source
>> > > and sink tasks, and offset lag metrics for sink tasks. Finally, Ewen
>> > > pointed out that all count/total metrics are only valid since the most
>> > > recent rebalance and are therefore less meaningful, and were removed.
>> > >
>> > > On Mon, Sep 11, 2017 at 6:50 PM, Randall Hauch <rha...@gmail.com>
>> wrote:
>> > >
>> > > > Thanks, Ewen. Comments inline below.
>> > > >
>> > > > On Mon, Sep 11, 2017 at 5:46 PM, Ewen Cheslack-Postava <
>> > > e...@confluent.io>
>> > > > wrote:
>> > > >
>> > > >> Randall,
>> > > >>
>> > > >> A couple of questions:
>> > > >>
>> > > >> * Some metrics don't seem to have unique names? e.g.
>> > > >> source-record-produce-rate and source-record-produce-total seem
>> like
>> > > they
>> > > >> are duplicated. Looks like maybe just an oversight that the second
>> > ones
>> > > >> should be changed from "produce" to "write".
>> > > >>
>> > > >
>> > > > Nice catch. You are correct - should be "write" instead of
>> "produce". I
>> > > > will correct.
>> > > >
>> > > >
>> > > >> * I think there's a stray extra character in a couple of
>> > > >> places: kafka.connect:type=source-task-metrics,name=source-record-
>> > > >> produce-total,worker=([-.\w]+)l,connector=([-.\w]+),task=([\d]+)
>> > > >> has an extra char after the worker name.
>> > > >>
>> > > >
>> > > > Thanks. Removed in 2 places.
>> > > >
>

Re: [DISCUSS] KIP-196: Add metrics to Kafka Connect framework

2017-09-12 Thread Randall Hauch
Hi, James. I was mistaken about how the Kafka metrics are converted to
MBeans and attributes. The MBean is constructed from the group and tags,
and the metrics show up as attributes on the MBean. I'll update the KIP to
reflect this.
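
In code terms, the mapping works roughly like this (a sketch against the
common metrics library; the group, tags, and sensor names are illustrative):

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import org.apache.kafka.common.metrics.*;
    import org.apache.kafka.common.metrics.stats.Avg;
    import org.apache.kafka.common.metrics.stats.Max;
    import org.apache.kafka.common.utils.Time;

    Metrics metrics = new Metrics(new MetricConfig(),
            Collections.<MetricsReporter>singletonList(new JmxReporter("kafka.connect")),
            Time.SYSTEM);

    Map<String, String> tags = new LinkedHashMap<>();
    tags.put("connector", "my-connector");
    tags.put("task", "0");

    Sensor batchSize = metrics.sensor("batch-size");
    batchSize.add(metrics.metricName("batch-size-avg", "sink-task-metrics", tags), new Avg());
    batchSize.add(metrics.metricName("batch-size-max", "sink-task-metrics", tags), new Max());

    // JMX then exposes one MBean:
    //   kafka.connect:type=sink-task-metrics,connector=my-connector,task=0
    // with attributes "batch-size-avg" and "batch-size-max"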

On Tue, Sep 12, 2017 at 1:43 AM, James Cheng <wushuja...@gmail.com> wrote:

> Thanks for the KIP, Randall.
>
> The KIP has one MBean per metric name. Can I suggest an alternate grouping?
>
> kafka.connect:type=connector-metrics,connector=([-.\w]+)
> connector-type
> connector-class
> connector-version
> status
>
> kafka.connect:type=task-metrics,connector=([-.\w]+),task=([\d]+)
> status
> pause-ratio
> offset-commit-success-percentage
> offset-commit-failure-percentage
> offset-commit-max-time
> offset-commit-99p-time
> offset-commit-95p-time
> offset-commit-90p-time
> offset-commit-75p-time
> offset-commit-50p-time
> batch-size-max
> batch-size-avg
>
> kafka.connect:type=source-task-metrics,connector=([-.\w]+),task=([\d]+)
> source-record-poll-rate
> source-record-write-rate
>
> kafka.connect:type=sink-task-metrics,connector=([-.\w]+),task=([\d]+)
> sink-record-read-rate
> sink-record-send-rate
> sink-record-lag-max
> partition-count
> offset-commit-95p-time
> offset-commit-90p-time
> offset-commit-75p-time
> offset-commit-50p-time
> batch-size-max
> batch-size-avg
>
> kafka.connect:type=sink-task-metrics,connector=([-.\w]+),
> task=([\d]+),topic=([-.\w]+),partition=([\d]+)
> sink-record-lag
> sink-record-lag-avg
> sink-record-lag-max
>
> kafka.connect:type=connect-coordinator-metrics
> task-count
> connector-count
> leader-name
> state
> rest-request-rate
>
> kafka.connect:type=connect-coordinator-metrics,name=assigned-tasks
> assigned-tasks (existing metric, so can't merge in above without
> breaking compatibility)
> kafka.connect:type=connect-coordinator-metrics,name=assigned-connectors
> assigned-connectors (existing metric, so can't merge in above
> without breaking compatibility)
>
> kafka.connect:type=connect-worker-rebalance-metrics
> rebalance-success-total
> rebalance-success-percentage
> rebalance-failure-total
> rebalance-failure-percentage
> rebalance-max-time
> rebalance-99p-time
> rebalance-95p-time
> rebalance-90p-time
> rebalance-75p-time
> rebalance-50p-time
> time-since-last-rebalance
> task-failure-rate
>
> This lets you use a single MBean selector to select multiple related
> attributes all at once. You can use JMX's wildcards to target which
> connectors or tasks or topics or partitions you care about.
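>
> For example (illustrative ObjectName patterns):
>
>     kafka.connect:type=sink-task-metrics,connector=my-connector,task=*
>     kafka.connect:type=sink-task-metrics,connector=my-connector,*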
>
> Also notice that for the topic and partition level metrics, the attributes
> are named identically ("sink-record-lag-avg" instead of
> "sink-record-{topic}-{partition}.records-lag-avg"), so monitoring systems
> have a consistent string they can use, instead of needing to
> prefix-and-suffix matching against the attribute name. And TBH, it
> integrates better with the work I'm doing in https://issues.apache.org/
> jira/browse/KAFKA-3480
>
> -James
>
> > On Sep 7, 2017, at 4:50 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > Hi everyone.
> >
> > I've created a new KIP to add metrics to the Kafka Connect framework:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 196%3A+Add+metrics+to+Kafka+Connect+framework
> >
> > The KIP approval deadline is looming, so if you're interested in Kafka
> > Connect metrics please review and provide feedback as soon as possible.
> I'm
> > interested not only in whether the metrics are sufficient and
> appropriate,
> > but also in whether the MBean naming conventions are okay.
> >
> > Best regards,
> >
> > Randall
>
>


Re: [VOTE] KIP-199: Add Kafka Connect Offset Tool

2017-09-12 Thread Randall Hauch
I'm actually going to withdraw this vote. Recent discussion has made it
apparent that there are a few issues still to be worked out, and the proposal
is not complete. Apologies.

Best regards,

Randall

On Mon, Sep 11, 2017 at 8:42 PM, Sriram Subramanian <r...@confluent.io>
wrote:

> +1
>
> On Mon, Sep 11, 2017 at 2:56 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > +1
> >
> > On Mon, Sep 11, 2017 at 1:33 PM Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > > +1
> > >
> > > On Mon, Sep 11, 2017 at 7:43 AM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > I'd like to start the vote on KIP-199 to add a command line tool that
> > > will
> > > > allow Connect operators to read, modify, and update source connector
> > > > offsets. Details are here:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 199%3A+Add+Kafka+Connect+offset+tool
> > > >
> > > > Thanks, and best regards.
> > > >
> > > > Randall
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

2017-09-12 Thread Randall Hauch
y easier to use. I considered that at first, but
then I thought that just having the pairs may be cleaner for representing
history. I'm not sure that's true, and the nested approach definitely seems
better. Another reason to pull the vote.
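
To make the contrast concrete, the two shapes under discussion are roughly the
following (hypothetical field names, not the KIP's final format):

    flat pairs, one document per partition:
      {"partition": {"filename": "a"}, "offset": {"position": 20}}

    nested under the connector name:
      {"my-connector": [{"partition": {"filename": "a"}, "offset": {"position": 20}}]}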


> * > and output only those partition-offset pairs for a named connector:
>  I am unclear whether the output file is compatible as a normal connect
> offsets file, is only for this tool, or something else
>

ack. It's not a normal connect output file. I thought making them the same
would couple this tool too much to the current file storage implementation.


> * mentioned in the first item, but since deleting offsets w/ a null value
> is mentioned at the end of the KIP, I'll reiterate that I think it is
> important to call out when this applies -- if done on a live connector,
> only a rebalance would guarantee it applies, and in the meantime something
> could overwrite. only safe way to apply any of this, including offset
> delete, is to stop the connector, apply operation, then continue.
>

ack.


>
> -Ewen
>
>
> On Sun, Sep 10, 2017 at 3:12 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > bq. How about calling it kafka-connect-source-offset-tool.sh
> >
> > This is better.
> > Going over existing .sh files, some have verb in their names while some
> > don't.
> >
> > +1 from me.
> >
> > On Sun, Sep 10, 2017 at 2:53 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Thanks for the comments! Specific comments inline below.
> > >
> > > Regards,
> > >
> > > Randall
> > >
> > > On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > bq. connector restart and the next message
> > > >
> > > > The last part of the sentence seems to be incomplete.
> > > >
> > >
> > > Fixed.
> > >
> > >
> > > > bq. command line tool called kafka-connect-source-offset-reset.sh
> > > >
> > > > From the description, the tool does more than resetting (e.g.
> > deleting).
> > > > How about calling it kafka-connect-source-offset-tool.sh
> > > >
> > >
> > > None of the other tools or scripts in the Kafka distribution have
> "tool"
> > in
> > > the name, so how about "kafka-connect-source-offsets.sh"?
> > >
> > >
> > > >
> > > > bq. bin/kafka-connect-source-offset-reset.sh
> > > > --config=my-worker-config.properties --export
> > > >
> > > > From the table above, export mode is mentioned as required. However,
> > > either
> > > > --export or --import is required.
> > > > Better note this in the KIP.
> > > >
> > >
> > > Addressed in a few places. Hopefully it is more clear now.
> > >
> > >
> > > >
> > > > bq. but will remove the offsets for the partition with file "b"
> > > >
> > > > Please move the sample JSON below the above description.
> > > >
> > >
> > > Done.
> > >
> > >
> > > >
> > > > Cheers
> > > >
> > > > On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rha...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > KIP-199 [1] describes a new tool that will allow Connect operators
> to
> > > > read,
> > > > > update, and remove offsets stored by Connect runtime. This
> capability
> > > has
> > > > > been often asked for by Connect users. The proposal is simple but
> > > > flexible.
> > > > > Please review and add feedback.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 199%3A+Add+Kafka+Connect+offset+reset+tool
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-10-23 Thread Randall Hauch
Here's the link to KIP-212:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74684586

I do think it's worthwhile to define the rules for connector names.
However, I think it would be better to describe the current restrictions
for names outside of them appearing within URLs. For example, if we can
keep connector names relatively free of constraints but instead define how
names should be encoded when used within URLs (e.g., URL encoding), then we
may not have (m)any backward compatibility issues other than fixing some
bugs related to proper encoding/decoding.
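
For example, a client could address a connector whose name contains spaces or
slashes by percent-encoding the path segment (a sketch; note that `URLEncoder`
does form-encoding, so "+" must be mapped back to "%20" for a path):

    String name = "my connector/v2";  // hypothetical name
    String path = "/connectors/"
            + URLEncoder.encode(name, "UTF-8").replace("+", "%20");
    // -> /connectors/my%20connector%2Fv2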

Thoughts?


On Mon, Oct 23, 2017 at 3:44 PM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> All,
>
> I've created a KIP to discuss enforcing of rules on what characters are
> allowed in connector names.
>
> Since this may break api calls that are currently working I figured a KIP
> is the better way to go than to just create a jira.
>
> I'd love to hear your input on this!
>


Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2017-12-14 Thread Randall Hauch
Hi, Michael. Yeah, I liked your PR a lot, and there definitely are a lot of
similarities. But here are the more significant differences from my
perspective (none of which are really that big):

First, your `SubjectConverter` and my `HeaderConverter` are pretty similar
-- mine is just more closely tied to headers. Also, we used slightly
different approaches to dealing with the fact that the `Converter`
interface does not extend `Configurable`, which Connect now uses for
transforms, connectors, etc. And our implementations take very different
approaches (see below).

Second, I tried to follow Kafka client's `Header` and `Headers` interfaces
(at least in concept) so that ConnectRecord has a `Headers` rather than a
list of headers. It's a minor distinction, but I do think it's important
for future-proofing to have an interface for the collection to abstract and
encapsulate logic/behavior as well as leaving room for alternative
implementations. It is also a convenient place to add methods for source
connectors and SMTs to easily add/modify/remove/transform headers.
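
As a rough sketch, the shape I have in mind is the following (illustrative,
not the final API):

    public interface Header {
        String key();
        Schema schema();
        Object value();
    }

    public interface Headers extends Iterable<Header> {
        Headers add(String key, Object value, Schema schema);
        Headers remove(String key);
        Header lastWithName(String key);
    }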

Third, our "header converter" implementations are where most of the
differences lie. Again, this goes back to my assertion that we should make
the serdes and cast/conversion orthogonal. If we allow sink connectors and
SMTs to get header values in the type they want (e.g.,
`Header.valueAsFloat()`), then we can tolerate a bit more variation in how
the header values are serialized and deserialized, since the serdes
mechanism doesn't have to get the type exactly right for the sink connector
and SMT. My `SimpleHeaderConverter` serializes all of the types to strings,
but during deserialization it attempts to infer the schemas (easy for
primitive values, a bit harder for structured types). IIUC, neither your
approach nor mine is really able to maintain Struct schemas, but IMO we can
add that over time with improved/different header converters if people
really need it.

Fourth, we use different defaults for the serdes implementation. I dislike
the StringConverter because it converts everything to strings that are then
difficult to convert back to the original form, especially for the
structured types. This is why I created the `SimpleHeaderConverter`
implementation, which doesn't need explicit configuration or explicit
mapping of header names to types, and thus can be used as the default.

Finally, while I hope that `SimpleHeaderConverter` and its schema inference
will work most of the time with no special configuration, especially since
the `Header` interface makes it easy to cast/convert in sink connectors and
SMTs, I do like how your `PrimativeSubjectConverter` allows the user to
manually control how the values are serialized. I thought of doing
something similar, but I think that can be done at a later time if/when
needed.

I hope that makes sense.

Randall

On Tue, Dec 12, 2017 at 11:35 PM, Michael André Pearce <
michael.andre.pea...@me.com> wrote:

> Hi Randall
>
> What’s the main difference between this and my earlier alternative option
> PR
> https://github.com/apache/kafka/pull/2942/files
>
> If none then +1.
> From what I can tell the only difference I make is the headers you support
> being able to cross convert primitive types eg if value after conversion is
> integer you can still ask for float and it will type concert if possible.
>
> Cheers
> Mike
>
>
> Sent from my iPhone
>
> > On 13 Dec 2017, at 01:36, Randall Hauch <rha...@gmail.com> wrote:
> >
> > Trying to revive this after several months of inactivity
> >
> > I've spent quite a bit of time evaluating the current KIP-145 proposal
> and
> > several of the suggested PRs. The original KIP-145 proposal is relatively
> > minimalist (which is very nice), and it adopts Kafka's approach to
> headers
> > where header keys are strings and header values are byte arrays. IMO,
> this
> > places too much responsibility on the connector developers to know how to
> > serialize and deserialize, which means that it's going to be difficult to
> > assemble into pipelines connectors and stream processors that make
> > different, incompatible assumptions. It also makes Connect headers very
> > different than Connect's keys and values, which are generally structured
> > and describable with Connect schemas. I think we need Connect headers to
> do
> > more.
> >
> > The other proposals attempt to do more, but even my first proposal
> doesn't
> > seem to really provide a solution that works for Connect users and
> > connector developers. After looking at this feature from a variety of
> > perspectives over several months, I now assert that Connect must solve
> two
> > orthogonal problems:
> >
> > 1) Serialization: How different data types are (de)serialized as header
> > valu

Re: [DISCUSS] KIP-238: Expose Kafka cluster ID in Connect REST API

2017-12-14 Thread Randall Hauch
Thanks, Ewen. I think the KIP is clear enough about the intent and the
changed behavior.

On Tue, Dec 12, 2017 at 12:22 AM, Ewen Cheslack-Postava 
wrote:

> And to clarify a bit further: the goal is for both standalone and
> distributed mode to display the same basic information. This hasn't
> *strictly* been required before because standalone had no worker-level
> interaction with the cluster (configs stored in memory, offsets on disk,
> and statuses in memory). However, we've always *expected* that a reasonable
> configuration was available for the worker and that any overrides were just
> that -- customizations on top of the existing config. Although it could
> have been *possible* to leave an invalid config for the worker yet provide
> valid configs for producers and consumers, this was never the intent.
>
> Therefore, the argument here is that we *should* be able to rely on a valid
> config to connect to the Kafka cluster, whether in standalone or
> distributed mode. There should always be a valid "fallback" even if
> overrides are provided. We haven't been explicit about this before, but
> unless someone objects, I don't think it is unreasonable.
>
> Happy to update the KIP w/ these details if someone feels they would be
> valuable.
>
> -Ewen
>
> On Mon, Dec 11, 2017 at 8:21 PM, Ewen Cheslack-Postava 
> wrote:
>
> >
> > On Mon, Dec 11, 2017 at 4:01 PM, Gwen Shapira  wrote:
> >
> >> Thanks, Ewen :)
> >>
> >> One thing that wasn't clear to me from the wiki: Will standalone connect
> >> also have a Kafka cluster ID? While it is true that only tasks have
> >> producers and consumers, I think we assumed that all tasks on one
> >> stand-alone will use one Kafka cluster?
> >>
> >
> > Yeah, maybe not clear enough in the KIP, but this is what I was getting
> at
> > -- while I think it's possible to use different clusters for worker,
> > producer, and consumer, I don't think this is really expected or a use
> case
> > worth bending backwards to support perfectly. In standalone mode,
> > technically a value is not required because a default is included and we
> > only utilize the value currently for the producers/consumers in tasks.
> But
> > I don't think it is unreasonable to require a valid setting at the worker
> > level, even if you override the bootstrap.servers for producer and
> consumer.
> >
> >
> >>
> >> Another suggestion is not to block the REST API on the connection, but
> >> rather not return the cluster ID until we know it (return null instead).
> >> So
> >> clients will need to poll rather than block. Not sure this is better,
> but
> >> you didn't really discuss this, so wanted to raise the option.
> >>
> >
> > It's mentioned briefly in https://cwiki.apache.org/
> > confluence/display/KAFKA/KIP-238%3A+Expose+Kafka+cluster+
> > ID+in+Connect+REST+API#KIP-238:ExposeKafkaclusterIDinConnectR
> > ESTAPI-ProposedChanges I think the tradeoff of blocking the server from
> > being "started" until we can at least make one request to the cluster
> isn't
> > unreasonable since if you can't do that, you're not going to be able to
> do
> > any useful work anyway. Anyone who might otherwise be using this endpoint
> > to monitor health (which it is useful for since it doesn't require any
> > other external services to be running just to give a response) can just
> > interpret connection refused or timeouts as an unhealthy state, as they
> > should anyway.
> >
> > -Ewen
> >
> >
> >>
> >> Gwen
> >>
> >>
> >> On Mon, Dec 11, 2017 at 3:42 PM Ewen Cheslack-Postava <
> e...@confluent.io>
> >> wrote:
> >>
> >> > I'd like to start discussion on a simple KIP to expose Kafka cluster
> ID
> >> > info in the Connect REST API:
> >> >
> >> >
> >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-238%
> >> 3A+Expose+Kafka+cluster+ID+in+Connect+REST+API
> >> >
> >> > Hopefully straightforward, though there are some details on how this
> >> > affects startup behavior that might warrant discussion.
> >> >
> >> > -Ewen
> >> >
> >>
> >
> >
>


Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2017-12-12 Thread Randall Hauch
Trying to revive this after several months of inactivity

I've spent quite a bit of time evaluating the current KIP-145 proposal and
several of the suggested PRs. The original KIP-145 proposal is relatively
minimalist (which is very nice), and it adopts Kafka's approach to headers
where header keys are strings and header values are byte arrays. IMO, this
places too much responsibility on the connector developers to know how to
serialize and deserialize, which means that it's going to be difficult to
assemble into pipelines connectors and stream processors that make
different, incompatible assumptions. It also makes Connect headers very
different than Connect's keys and values, which are generally structured
and describable with Connect schemas. I think we need Connect headers to do
more.

The other proposals attempt to do more, but even my first proposal doesn't
seem to really provide a solution that works for Connect users and
connector developers. After looking at this feature from a variety of
perspectives over several months, I now assert that Connect must solve two
orthogonal problems:

1) Serialization: How different data types are (de)serialized as header
values
2) Conversion: How values of one data type are converted to values of
another data type

For the serialization problem, Ewen suggested quite a while back that we
use something akin to `Converter` for header values. Unfortunately we can't
directly reuse `Converters` since the method signatures don't allow us to
supply the header name and the topic name, but we could define a
`HeaderConverter` that is similar to and compatible with `Converter` such
that a single class could implement both. This would align Connector
headers with how message keys and values are handled. Each connector could
define which converter it wants to use; for backward compatibility purposes
we use a header converter by default that serialize values to strings. If
you want something other than this default, you'd have to specify the
header converter options as part of the connector configuration; this
proposal changes the `StringConverter`, `ByteArrayConverter`, and
`JsonConverter` to all implement `HeaderConverter`, so these are all
options. This approach supposes that a connector will serialize all of its
headers in the same way -- with string-like representations by default. I
think this is a safe assumption for the short term, and if we need more
control to (de)serialize named headers differently for the same connector,
we can always implement a different `HeaderConverter` that gives users more
control.
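
A sketch of that interface (the exact signatures are illustrative, chosen to
stay compatible with `Converter`):

    public interface HeaderConverter extends Configurable, Closeable {
        SchemaAndValue toConnectHeader(String topic, String headerKey, byte[] value);
        byte[] fromConnectHeader(String topic, String headerKey, Schema schema, Object value);
        ConfigDef config();
    }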

So that would solve the serialization problem. How about connectors and
transforms that are implemented to expect a certain type of header value,
such as an integer or boolean or timestamp? We could solve this problem
(for the most part) by adding methods to the `Header` interface to get the
value in the desired type, and to support all of the sensible conversions
between Connect's primitives and logical types. So, a connector or
transform could always call `header.valueAsObject()` to get the raw
representation from the converter, but a connector or transform could also
get the string representation by calling `header.valueAsString()`, or the
INT64 representation by calling `header.valueAsLong()`, etc. We could even
have converting methods for the built-in logical types (e.g.,
`header.valueAsTimestamp()` to return a java.util.Date value that is
described by Connect's Timestamp logical type). We can convert between most
primitive and logical types (e.g., anything to a STRING, INT32 to FLOAT32,
etc.), but there are a few that don't make sense (e.g., ARRAY to FLOAT32,
INT32 to STRUCT, BYTE_ARRAY to anything, etc.), so these can throw a
`DataException`.
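
For example, with the method names used above (a sketch from the prototype,
not a final API):

    Header h = record.headers().lastWithName("since");
    String s = h.valueAsString();               // conversion to STRING always works
    long millis = h.valueAsLong();              // INT64 view, when convertible
    java.util.Date ts = h.valueAsTimestamp();   // Timestamp logical type
    // h.valueAsFloat() on an ARRAY value would throw DataException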

I've refined this approach over the last few months, and have a PR for a
complete prototype that demonstrates these concepts and techniques:
https://github.com/apache/kafka/pull/4319

This PR does *not* update the documentation, though I can add that if we
approve of this approach. And, we probably want to define (at least on the
KIP) some relatively obvious SMTs for copying header values into record
key/value fields, and extracting record key/value fields into header values.

@Michael, would you mind if I edited KIP-145 to reflect this proposal? I
would be happy to keep the existing proposal at the end of the document (or
remove it if you prefer, since it's already in the page history), and we
can revise as we choose a direction.

Comments? Thoughts?

Best regards,

Randall


On Thu, Oct 19, 2017 at 2:10 PM, Michael André Pearce <
michael.andre.pea...@me.com> wrote:

> @rhauch
>
> Here is the previous discussion thread, just reigniting so we can discuss
> against the original kip thread
>
>
> Cheers
>
> Mike
>
> Sent from my iPhone
>
> > On 5 May 2017, at 02:21, Michael Pearce  wrote:
> >
> > Hi Ewen,
> >
> > Did you get a chance to look at the updated sample showing the idea?
> >
> > Did it 

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Randall Hauch
No, we need to keep the KIP since we want to change/correct the existing 
behavior. But we do need to clarify in the KIP these edge cases that will 
change.

Thanks for the continued work on this, Sönke. 

Regards, 

Randall

> On Nov 16, 2017, at 1:56 AM, Sönke Liebau 
> <soenke.lie...@opencore.com.INVALID> wrote:
> 
> Hi Randall,
> 
> zero length definitely works, that's what sent me down this hole in the
> first place. I had a customer accidentally create a connector without a
> name in his environment and then be unable to delete it. No connector name
> doesn't work, as this throws a null pointer exception due to KAFKA-4938 ,
> but once that is fixed would create a connector named "null" I think. Have
> not retested this, but seen it in the past.
> 
> Also, it is possible to create connectors with trailing and leading
> whitespaces, this errors out on the create request (which will be fixed
> when KAFKA-4827 is merged), but correctly creates the connector and you can
> access it if you percent-escape the curl call. This for me is the main
> reason why a KIP might be needed, as we are changing public facing behavior
> here. I agree with you, that this will probably not affect anyone or hardly
> anyone, but in principle it is a change that should need a KIP I think.
> 
> I've retested and documented this for Confluent 3.3.0:
> https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
> 
> I am of course happy to withdraw the KIP if you think it is unnecessary,
> I've also updated the pull request for KAFKA-4930 to reflect the changes
> stated in the KIP and tested the code with Arjuns pull request for
> KAFKA-4827 to ensure they don't interfere with each other.
> 
> Let me know what you think.
> 
> Kind regards,
> Sönke
> 
> 
>> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch <rha...@gmail.com> wrote:
>> 
>> Thanks for updating the KIP to reflect the current process. However, I
>> still question whether it is necessary to have a KIP - it depends on
>> whether it was possible with prior versions to have connectors with
>> zero-length or blank names. Have you tried both of these cases?
>> 
>> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
>> soenke.lie...@opencore.com.invalid> wrote:
>> 
>>> Hi Randall,
>>> 
>>> I have set aside some time to work on this next week. The fix itself is
>>> quite simple, but I've yet to write tests to properly catch this, which
>>> turns out to be a bit more complex, as it needs a running restserver
>> which
>>> is mocked in the tests I've looked at so far.
>>> 
>>> Should I withdraw the KIP or update it to reflect the documentation
>> changes
>>> and enforced rules around trimming and zero length connector names? This
>> is
>>> a change to existing behavior, even if it is quite small and probably
>> won't
>>> even be noticed by many people.
>>> 
>>> best regards,
>>> Sönke
>>> 
>>>> On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch <rha...@gmail.com> wrote:
>>>> 
>>>> Any progress on updating the PR and withdrawing KIP-212?
>>>> 
>>>> On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch <rha...@gmail.com>
>> wrote:
>>>> 
>>>>> Yes, connector names should not be blank or contain just whitespace. In
>>>>> fact, I might recommend that we trim whitespace at the front and rear of
>>>>> new connector names and then disallow any zero-length name. Existing
>>>>> connectors would remain valid, and this would not break backward
>>>>> compatibility. That might require a small KIP simply to update the
>>>>> documentation and specify what names are valid.
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> Randall
>>>>> 
>>>>> On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe <cmcc...@apache.org>
>>>> wrote:
>>>>> 
>>>>>>> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
>>>>>>> I've spent some time looking at this and testing various
>> characters
>>>> and
>>>>>>> it
>>>>>>> would appear that Randall's suspicion was spot on. I think we can
>>>>>> support
>>>>>>> a
>>>>>>> fairly large set of characters with very minor changes.
>>>>>>> 
>>>>>>> I was put off by the exceptions that were thrown when creating
>>>

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Randall Hauch
Nice job updating the KIP. The PR (
https://github.com/apache/kafka/pull/2755/files) for the proposed
implementation does prevent names from being empty, and it trims whitespace
from the name only when creating a new connector. However, the KIP's
"Proposed Change" section should probably be very clear about this, and the
migration section should address how a connector that was created with
leading and/or trailing whitespace characters will still be able to be
updated and deleted. I think that decreases the likelihood of this change
negatively impacting existing users. Basically, going forward, the names of
new connectors will be trimmed.
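
In other words, creation would normalize the name roughly like this (a sketch,
not the PR's exact code; `BadRequestException` stands in for whatever error
the REST layer raises):

    String name = props.get("name");
    if (name == null || name.trim().isEmpty()) {
        throw new BadRequestException("Connector name must not be empty");
    }
    name = name.trim(); // only on create, so existing names keep working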

WDYT?

On Thu, Nov 16, 2017 at 9:32 AM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> I've added some more detail to the KIP [1] around current scenarios that
> might break in the future. I actually came up with a second limitation that
> we'd impose on users and also documented this.
>
> Let me know what you think.
>
> Kind regards,
> Sönke
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 212%3A+Enforce+set+of+legal+characters+for+connector+names
>
>
> On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau <soenke.lie...@opencore.com>
> wrote:
>
> > Hi Randall,
> >
> > I had mentioned this edge case in the KIP, but will add some further
> > detail to further clarify all changing scenarios post pull request.
> >
> > Kind regards,
> > Sönke
> >
> >
> >
> >
> >
> > On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> >> No, we need to keep the KIP since we want to change/correct the existing
> >> behavior. But we do need to clarify in the KIP these edge cases that
> will
> >> change.
> >>
> >> Thanks for the continued work on this, Sönke.
> >>
> >> Regards,
> >>
> >> Randall
> >>
> >> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau <soenke.lie...@opencore.com
> >> .INVALID> wrote:
> >> >
> >> > Hi Randall,
> >> >
> >> > zero length definitely works, that's what sent me down this hole in
> the
> >> > first place. I had a customer accidentally create a connector without
> a
> >> > name in his environment and then be unable to delete it. No connector
> >> name
> >> > doesn't work, as this throws a null pointer exception due to
> KAFKA-4938
> >> ,
> >> > but once that is fixed would create a connector named "null" I think.
> >> Have
> >> > not retested this, but seen it in the past.
> >> >
> >> > Also, it is possible to create connectors with trailing and leading
> >> > whitespaces, this errors out on the create request (which will be
> fixed
> >> > when KAFKA-4827 is merged), but correctly creates the connector and
> you
> >> can
> >> > access it if you percent-escape the curl call. This for me is the main
> >> > reason why a KIP might be needed, as we are changing public facing
> >> behavior
> >> > here. I agree with you, that this will probably not affect anyone or
> >> hardly
> >> > anyone, but in principle it is a change that should need a KIP I
> think.
> >> >
> >> > I've retested and documented this for Confluent 3.3.0:
> >> > https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
> >> >
> >> > I am of course happy to withdraw the KIP if you think it is
> unnecessary,
> >> > I've also updated the pull request for KAFKA-4930 to reflect the
> changes
> >> > stated in the KIP and tested the code with Arjuns pull request for
> >> > KAFKA-4827 to ensure they don't interfere with each other.
> >> >
> >> > Let me know what you think.
> >> >
> >> > Kind regards,
> >> > Sönke
> >> >
> >> >
> >> >> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch <rha...@gmail.com>
> >> wrote:
> >> >>
> >> >> Thanks for updating the KIP to reflect the current process. However,
> I
> >> >> still question whether it is necessary to have a KIP - it depends on
> >> >> whether it was possible with prior versions to have connectors with
> >> >> zero-length or blank names. Have you tried both of these cases?
> >> >>
> >> >> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
> >> >> soenke.lie...@opencore.com.inval

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-14 Thread Randall Hauch
Thanks for updating the KIP to reflect the current process. However, I
still question whether it is necessary to have a KIP - it depends on
whether it was possible with prior versions to have connectors with
zero-length or blank names. Have you tried both of these cases?

On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> Hi Randall,
>
> I have set aside some time to work on this next week. The fix itself is
> quite simple, but I've yet to write tests to properly catch this, which
> turns out to be a bit more complex, as it needs a running restserver which
> is mocked in the tests I've looked at so far.
>
> Should I withdraw the KIP or update it to reflect the documentation changes
> and enforced rules around trimming and zero length connector names? This is
> a change to existing behavior, even if it is quite small and probably won't
> even be noticed by many people.
>
> best regards,
> Sönke
>
> On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Any progress on updating the PR and withdrawing KIP-212?
> >
> > On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > Yes, connector names should not be blank or contain just whitespace. In
> > > fact, I might recommend that we trim whitespace at the front and rear of
> > > new connector names and then disallow any zero-length name. Existing
> > > connectors would remain valid, and this would not break backward
> > > compatibility. That might require a small KIP simply to update the
> > > documentation and specify what names are valid.
> > >
> > > WDYT?
> > >
> > > Randall
> > >
> > > On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe <cmcc...@apache.org>
> > wrote:
> > >
> > >> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
> > >> > I've spent some time looking at this and testing various characters
> > and
> > >> > it
> > >> > would appear that Randall's suspicion was spot on. I think we can
> > >> support
> > >> > a
> > >> > fairly large set of characters with very minor changes.
> > >> >
> > >> > I was put off by the exceptions that were thrown when creating
> > connectors
> > >> > with certain characters and suspected a larger underlying problem
> when
> > >> in
> > >> > fact the only issue is, that the URL in the rest request used to
> > >> retrieve
> > >> > the response for the create connector request needs to be percent
> > >> encoded
> > >> > [1].
> > >> >
> > >> > I've fixed this and done some local testing which worked out quite
> > >> > nicely,
> > >> > apart from two special cases, I've not been able to find characters
> > that
> > >> > created issues, even space and slash work.
> > >> > The mentioned special cases are:
> > >> >   \  - if the name contains a backslash that is not the beginning
> of a
> > >> > valid escape sequence the request fails before we ever get it in
> > >> > ConnectorsResource, so a backslash would need to be escaped: \\
> > >> >   "  - Quotation marks need to be escaped as well to keep the json
> > body
> > >> >   of
> > >> > the request legal: \"
> > >> > In both cases the escape character will be part of the connector
> name
> > >> and
> > >> > need to be specified in the url to retrieve the connector as well,
> > even
> > >> > though we could URL encode it in a legal way without escaping here.
> So
> > >> > they
> > >> > work, not sure if I'd recommend using those characters, but no real
> > >> > reason
> > >> > to prohibit people from using them that I can see either.
> > >>
> > >> Good research, Sönke.
> > >>
> > >> >
> > >> >
> > >> > What I'd do going forward is:
> > >> > - withdraw the KIP, as I don't see a real need for one, since this
> is
> > >> not
> > >> > changing anything, just fixing things.
> > >> > - add a section to the documentation around legal characters,
> specify
> > >> the
> > >> > ones I tested explicitly (url encoded %20 - %7F) and mention that
> most
> > >> > other characters should work as well but no guarantees are g

Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-01 Thread Randall Hauch
+1 (non-binding)

Thanks for pushing this through. Great work!

Randall Hauch

On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <jklu...@simple.com> wrote:

> I haven't heard any additional concerns over the proposal, so I'd like to
> get the voting process started for:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 215%3A+Add+topic+regex+support+for+Connect+sinks
>
> It adds a topics.regex option for Kafka Connect sinks as an alternative to
> the existing required topics option.
>


Re: [VOTE] KIP-215: Add topic regex support for Connect sinks

2017-11-09 Thread Randall Hauch
Jeff,

This KIP does pass with 3 binding +1s and no other binding votes, and with
more than 72 hours for voting. Do you want to update the KIP-215 and KIP
list pages accordingly? We can also merge the PR.

Thanks, and very nice work!

Randall

On Fri, Nov 3, 2017 at 8:12 PM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> +1! The S3 or hdfs connector will now be super powerful !
>
> On 4 Nov. 2017 11:27 am, "Konstantine Karantasis" <
> konstant...@confluent.io>
> wrote:
>
> > Nice addition!
> >
> > +1 (non-binding)
> >
> > Konstantine
> >
> > On Fri, Nov 3, 2017 at 4:52 PM, Jeff Klukas <jklu...@simple.com> wrote:
> >
> > > So sorry for skirting the process there. I wasn't aware of the 72 hour
> > > window and I don't see that mentioned in
> > > https://cwiki.apache.org/confluence/display/KAFKA/Bylaws#Bylaws-Voting
> > >
> > > Should I feel free to update that wiki page with a note about the
> window?
> > >
> > > On Fri, Nov 3, 2017 at 7:49 PM, Ewen Cheslack-Postava <
> e...@confluent.io
> > >
> > > wrote:
> > >
> > > > Jeff,
> > > >
> > > > Just FYI re: process, I think you're pretty much definitely in the
> > > > clear here since this one is a straightforward design I doubt anybody
> > > > would object to, but voting normally stays open 72h to ensure everyone
> > > > has a chance to weigh in.
> > > >
> > > > Again thanks for the KIP and we can move any final discussion over to
> > the
> > > > PR!
> > > >
> > > > -Ewen
> > > >
> > > > On Fri, Nov 3, 2017 at 4:43 PM, Jeff Klukas <jklu...@simple.com>
> > wrote:
> > > >
> > > > > Looks like we've achieved lazy majority, so I'll move the KIP to
> > > > approved.
> > > > >
> > > > > Thanks all for looking this over.
> > > > >
> > > > > On Fri, Nov 3, 2017 at 7:31 PM, Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > +1. Thanks for the KIP!
> > > > > >
> > > > > > On Fri, Nov 3, 2017 at 2:15 PM, Guozhang Wang <
> wangg...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 binding
> > > > > > >
> > > > > > > On Fri, Nov 3, 2017 at 1:25 PM, Ewen Cheslack-Postava <
> > > > > e...@confluent.io
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 binding
> > > > > > > >
> > > > > > > > Thanks Jeff!
> > > > > > > >
> > > > > > > > On Wed, Nov 1, 2017 at 5:21 PM, Randall Hauch <
> > rha...@gmail.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > Thanks for pushing this through. Great work!
> > > > > > > > >
> > > > > > > > > Randall Hauch
> > > > > > > > >
> > > > > > > > > On Wed, Nov 1, 2017 at 9:40 AM, Jeff Klukas <
> > > jklu...@simple.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I haven't heard any additional concerns over the
> proposal,
> > so
> > > > I'd
> > > > > > > like
> > > > > > > > to
> > > > > > > > > > get the voting process started for:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > 215%3A+Add+topic+regex+support+for+Connect+sinks
> > > > > > > > > >
> > > > > > > > > > It adds a topics.regex option for Kafka Connect sinks as
> an
> > > > > > > alternative
> > > > > > > > > to
> > > > > > > > > > the existing required topics option.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > -- Guozhang
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-09 Thread Randall Hauch
Any progress on updating the PR and withdrawing KIP-212?

On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch <rha...@gmail.com> wrote:

> Yes, connector names should not be blank or contain just whitespace. In
> fact, I might recommend that we trim whitespace at the front and rear of
> new connector names and then disallow any zero-length name. Existing
> connectors would remain valid, and this would not break backward
> compatibility. That might require a small KIP simply to update the
> documentation and specify what names are valid.
>
> WDYT?
>
> Randall
>
> On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
>> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
>> > I've spent some time looking at this and testing various characters and
>> > it would appear that Randall's suspicion was spot on. I think we can
>> > support a fairly large set of characters with very minor changes.
>> >
>> > I was put off by the exceptions that were thrown when creating connectors
>> > with certain characters and suspected a larger underlying problem when in
>> > fact the only issue is that the URL in the REST request used to retrieve
>> > the response for the create connector request needs to be percent-encoded
>> > [1].
>> >
>> > I've fixed this and done some local testing which worked out quite
>> > nicely; apart from two special cases, I've not been able to find
>> > characters that created issues, even space and slash work.
>> > The mentioned special cases are:
>> >   \  - if the name contains a backslash that is not the beginning of a
>> > valid escape sequence the request fails before we ever get it in
>> > ConnectorsResource, so a backslash would need to be escaped: \\
>> >   "  - Quotation marks need to be escaped as well to keep the JSON body
>> > of the request legal: \"
>> > In both cases the escape character will be part of the connector name
>> > and needs to be specified in the URL to retrieve the connector as well,
>> > even though we could URL-encode it in a legal way without escaping here.
>> > So they work; not sure if I'd recommend using those characters, but no
>> > real reason to prohibit people from using them that I can see either.
>>
>> Good research, Sönke.
>>
>> >
>> >
>> > What I'd do going forward is:
>> > - withdraw the KIP, as I don't see a real need for one, since this is
>> > not changing anything, just fixing things.
>> > - add a section to the documentation around legal characters, specify
>> > the ones I tested explicitly (URL encoded %20 - %7F) and mention that
>> > most other characters should work as well but no guarantees are given
>> > - update the pull request for KAFKA-4930 to allow all characters but
>> > still prohibit creating a connector with an empty name. I'd propose to
>> > keep the validator though as it'll give us a central location to do any
>> > checking that might turn out to be necessary later on.
>>
>> Are empty names currently allowed?  That's unfortunate.
>>
>> > - add some integration tests to check connectors with special characters
>> > in their names work
>> > - fix the url encoding line in ConnectorsResource
>> >
>> > Does that sound fair to everybody?
>>
>> It sounds good to me, but I will let someone more knowledgeable about
>> connect chime in.
>>
>> best,
>> Colin
>>
>> >
>> > Kind regards,
>> > Sönke
>> >
>> > [1]
>> > https://github.com/apache/kafka/blob/trunk/connect/runtime/
>> src/main/java/org/apache/kafka/connect/runtime/rest/
>> resources/ConnectorsResource.java#L102
>> >
>> > On Tue, Oct 24, 2017 at 8:40 PM, Colin McCabe <cmcc...@apache.org>
>> wrote:
>> >
>> > > On Tue, Oct 24, 2017, at 11:28, Sönke Liebau wrote:
>> > > > Hi,
>> > > >
>> > > > after reading your messages I'll grant that I might have picked a
>> > > > somewhat
>> > > > draconic option to solve these issues.
>> > > >
>> > > > In general I believe that properly encoding the URLs after having
>> > > > created the connectors should solve a lot of the issues already. For
>> > > > some characters the rest api returns
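
For illustration, the percent-encoding fix discussed in this thread amounts
to something like the following sketch; the helper class is hypothetical and
is not the actual ConnectorsResource code:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    public class ConnectorNameUrls {

        // Percent-encode a connector name for use in a REST URL path.
        // URLEncoder produces form encoding, where a space becomes '+',
        // so substitute "%20" to get a path-safe encoding.
        public static String encode(String connectorName) {
            try {
                return URLEncoder.encode(connectorName, "UTF-8").replace("+", "%20");
            } catch (UnsupportedEncodingException e) {
                throw new AssertionError("UTF-8 is always supported", e);
            }
        }

        public static void main(String[] args) {
            // A name with a space and a slash still yields a usable path segment.
            System.out.println(encode("my connector/v2")); // my%20connector%2Fv2
        }
    }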

Re: [DISCUSS] KIP-285: Connect Rest Extension Plugin

2018-04-27 Thread Randall Hauch
Great work, Magesh. I like the overall approach a lot, so I left some
pretty nuanced comments about specific details.

Best regards,

Randall

On Wed, Apr 25, 2018 at 3:03 PM, Magesh Nandakumar <mage...@confluent.io>
wrote:

> Thanks Randall for your thoughts. I have created a replica of the required
> entities in the draft implementation. If you can take a look at the PR and
> let me know your thoughts, I will update the KIP to reflect the same
>
> https://github.com/apache/kafka/pull/4931
>
> On Tue, Apr 24, 2018 at 11:44 AM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Magesh, I think our last emails crossed in mid-stream.
> >
> > We definitely want to put the new public interfaces/classes in the API
> > module, and implementation in the runtime module. Yes, this will affect
> > the design, since for example we don't want to expose runtime types to
> > the API, and we want to prevent breaking changes. We don't really want
> > to move the REST entities if we don't have to, since that may break
> > projects that are extending the runtime module -- even though the
> > runtime module is not a public API we still want to _try_ not to change
> > things.
> >
> > Do you want to try to create a prototype to see what kind of impact and
> > choices we'll have to make?
> >
> > Best regards,
> >
> > Randall
> >
> > On Tue, Apr 24, 2018 at 12:48 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> >
> > > Thanks for updating the KIP, Magesh. You've resolved all of my
> > > concerns, though I have one more: we should specify the package names
> > > for all new interfaces/classes.
> > >
> > > I'm looking forward to more feedback from others.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> > > On Fri, Apr 20, 2018 at 12:17 AM, Magesh Nandakumar <
> > mage...@confluent.io>
> > > wrote:
> > >
> > >> Hi All,
> > >>
> > >> I have updated the KIP with following changes
> > >>
> > >>1. Expanded the Motivation section
> > >>2. Included details about the interface in the public interface
> > >>   section
> > >>3. Modified the config name to rest.extension.classes
> > >>4. Modified the ConnectRestExtension to include Configurable instead
> > >>   of ResourceConfig
> > >>5. Modified the "Rest Extension Integration with Connect" in
> > >>   "Proposed Approach" to include a new Custom implementation for
> > >>   Configurable
> > >>6. Provided examples for the Java Service provider mechanism
> > >>7. Included a reference implementation in scope
> > >>
> > >> Kindly let me know your thoughts on the updates.
> > >>
> > >> Thanks
> > >> Magesh
> > >>
> > >> On Thu, Apr 19, 2018 at 10:39 AM, Magesh Nandakumar <
> > mage...@confluent.io
> > >> >
> > >> wrote:
> > >>
> > >> > Hi Randall,
> > >> >
> > >> > Thanks for your feedback. I also would like to go with
> > >> > `rest.extension.classes`.
> > >> >
> > >> > For exposing Configurable, my original intention was just to expose
> > >> > that to the extension because that's all one needs to register JAX-RS
> > >> > resources. The fact that we use Jersey shouldn't even be exposed in
> > >> > the interface. Hence it doesn't affect the public API by any means.
> > >> >
> > >> > I will update the KIP and let everyone know.
> > >> >
> > >> > Thanks
> > >> > Magesh
> > >> >
> > >> > On Thu, Apr 19, 2018 at 9:51 AM, Randall Hauch <rha...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> On Thu, Apr 12, 2018 at 10:59 AM, Magesh Nandakumar <
> > >> mage...@confluent.io
> > >> >> >
> > >> >> wrote:
> > >> >>
> > >> >> > Hi Randall,
> > >> >> >
> > >> >> > Thanks a lot for your feedback.
> > >> >> >
> > >> >> > I will update the KIP to reflect your comments in (1), (2), (7)
> and
> > >> (8).
> > >> >> >
> > >> >>
> > >> >> Looking forward to these.
> > >> >>
> > >> 

Re: [DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-19 Thread Randall Hauch
Considering this KIP is straightforward, what do you think about kicking off a 
vote? Or does it need more discussion time?

Regards,
Randall

> On May 18, 2018, at 4:30 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> 
> Yeah, the usefulness of short seems questionable, but int is probably a
> large enough range for some identifiers (e.g. we use an int in schema
> registry). But yeah, I don't really have a problem with having Converters
> for each of the existing serdes just for symmetry and since presumably
> somebody finds them useful for something if they exist.
> 
> -Ewen
> 
>> On Fri, May 18, 2018 at 11:55 AM Randall Hauch <rha...@gmail.com> wrote:
>> 
>> Thanks, Ewen.
>> 
>> You make several good points, and I've updated the KIP to hopefully address
>> your comments. I think the symmetry with the Kafka serdes is useful, so
>> I've kept all 5 converters in the KIP.
>> 
>> Interestingly, perhaps the short and int converters (with the reduced
>> ranges) are not necessarily that useful for keys either.
>> 
>> Regards,
>> 
>> Randall
>> 
>> On Thu, May 17, 2018 at 10:08 PM, Ewen Cheslack-Postava <e...@confluent.io
>>> 
>> wrote:
>> 
>>> Just a couple of minor points that don't really affect the
>>> implementation:
>>> 
>>> * For nulls, let's just mention the underlying serializers already
>>> support this. I'm actually not sure why they should/need to, but given
>>> they do, let's just defer to that implementation.
>>> * I'm not sure where Float and Double converters are actually useful.
>>> The use cases I know for integer serdes are for keys, but floats seem
>>> like a bad choice for keys. These aren't a lot of overhead to build and
>>> maintain, but if we don't know use cases for the specific types, it
>>> might be silly to spend time and effort building and maintaining them.
>>> 
>>> Otherwise, this seems simple and straightforward. Generally +1 on the
>>> proposal.
>>> 
>>> -Ewen
>>> 
>>> On Thu, May 17, 2018 at 6:04 PM Magesh Nandakumar <mage...@confluent.io>
>>> wrote:
>>> 
>>>> Thanks Randall for the KIP. I think it will be super useful and looks
>>>> pretty straightforward to me.
>>>> 
>>>> Thanks
>>>> Magesh
>>>> 
>>>> On Thu, May 17, 2018 at 4:15 PM, Randall Hauch <rha...@gmail.com>
>> wrote:
>>>> 
>>>>> I'd like to start discussion of a very straightforward proposal for
>>>> Connect
>>>>> to add converters for the basic primitive number types: integer,
>> short,
>>>>> long, double, and float. Here is the KIP:
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>> 305%3A+Add+Connect+primitive+number+converters
>>>>> 
>>>>> As mentioned in the KIP, I've created a pull request (
>>>>> https://github.com/apache/kafka/pull/5034) for those looking for
>>>>> implementation details.
>>>>> 
>>>>> Any feedback is appreciated.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Randall
>>>>> 
>>>> 
>>> 
>> 


Re: [DISCUSS] KIP-285: Connect Rest Extension Plugin

2018-05-16 Thread Randall Hauch
A few very minor suggestions:


   1. There are a few formatting issues with paragraphs that use a
   monospace font. Minor, but it would be nice to fix.
   2. Would be nice to link to the PR
   3. Do we need the org.apache.kafka.connect.rest.extension.entities
   package? Could we just move the two classes into the parent
   org.apache.kafka.connect.rest.extension package?
   4. This sentence "The above approach helps alleviate any issues that
   could arise if Extension accidentally reregister the" is cut off.
   5. The "ConnectRestExtensionContext.configure(...)" method's JavaDoc
   should describe the behaviors that are mentioned in the "Rest Extension
   Integration with Connect" section; e.g., behavior when an extension adds a
   resource that is already registered, whether unregistering works, etc.
   Also, ideally the "close()" method would have JavaDoc that explained when
   it is called (e.g., no other methods will be called on the extension after
   this, etc.).
   6. Packaging requirements are different for this component vs
   connectors, transformations, and converters, since this now mandates the
   Service Loader manifest file. This should be called out more explicitly.
   7. It'd be nice if the example included how extension-specific config
   properties are to be defined in the worker configuration file.

As I said, these are all minor suggestions that only affect the KIP
document. Once these are fixed, I think this is ready to move to voting.

Best regards,

Randall
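
To make points 6 and 7 concrete, a rough sketch of the packaging and
configuration involved follows; the interface and package names are taken
from the KIP draft and may still change:

    # Inside the extension JAR, the ServiceLoader manifest file
    #   META-INF/services/org.apache.kafka.connect.rest.extension.ConnectRestExtension
    # must contain the implementation class name:
    #   com.example.MyRestExtension

    # Worker configuration:
    rest.extension.classes=com.example.MyRestExtension
    # Extension-specific properties (hypothetical) sit alongside it and are
    # passed to the extension via its configure(...) method:
    my.extension.audit.topic=connect-audit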

On Tue, May 15, 2018 at 11:30 AM, Magesh Nandakumar <mage...@confluent.io>
wrote:

> Randall- I think I have addressed all the comments. Let me know if we can
> take this to Vote.
>
> Thanks
> Magesh
>
> On Tue, May 8, 2018 at 10:12 PM, Magesh Nandakumar <mage...@confluent.io>
> wrote:
>
> > Hi All,
> >
> > I have updated the KIP to reflect changes based on the PR
> > https://github.com/apache/kafka/pull/4931. It mostly has minor changes
> > to the interfaces and includes details on packages for the interfaces and
> > the classes. Let me know your thoughts.
> >
> > Thanks
> > Magesh
> >
> > On Fri, Apr 27, 2018 at 12:03 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> >
> >> Great work, Magesh. I like the overall approach a lot, so I left some
> >> pretty nuanced comments about specific details.
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Wed, Apr 25, 2018 at 3:03 PM, Magesh Nandakumar <
> mage...@confluent.io>
> >> wrote:
> >>
> >> > Thanks Randall for your thoughts. I have created a replica of the
> >> required
> >> > entities in the draft implementation. If you can take a look at the PR
> >> and
> >> > let me know your thoughts, I will update the KIP to reflect the same
> >> >
> >> > https://github.com/apache/kafka/pull/4931
> >> >
> >> > On Tue, Apr 24, 2018 at 11:44 AM, Randall Hauch <rha...@gmail.com>
> >> wrote:
> >> >
> >> > > Magesh, I think our last emails crossed in mid-stream.
> >> > >
> >> > > We definitely want to put the new public interfaces/classes in the
> >> > > API module, and implementation in the runtime module. Yes, this will
> >> > > affect the design, since for example we don't want to expose runtime
> >> > > types to the API, and we want to prevent breaking changes. We don't
> >> > > really want to move the REST entities if we don't have to, since that
> >> > > may break projects that are extending the runtime module -- even
> >> > > though the runtime module is not a public API we still want to _try_
> >> > > not to change things.
> >> > >
> >> > > Do you want to try to create a prototype to see what kind of impact
> >> and
> >> > > choices we'll have to make?
> >> > >
> >> > > Best regards,
> >> > >
> >> > > Randall
> >> > >
> >> > > On Tue, Apr 24, 2018 at 12:48 PM, Randall Hauch <rha...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Thanks for updating the KIP, Magesh. You've resolved all of my
> >> > > > concerns, though I have one more: we should specify the package
> >> > > > names for all new interfaces/classes.
> >> > > >
> >> > > > I'm looking forward to more feedback from others.

Re: [VOTE] KIP-298: Error Handling in Connect

2018-05-21 Thread Randall Hauch
Thanks, Arjun. +1 (non-binding)

Regards,
Randall

On Mon, May 21, 2018 at 11:14 AM, Guozhang Wang  wrote:

> Thanks for the KIP. +1 (binding)
>
>
> Guozhang
>
> On Fri, May 18, 2018 at 3:36 PM, Gwen Shapira  wrote:
>
> > +1
> >
> > Thank you! Error handling in Connect will be a huge improvement.
> >
> > On Thu, May 17, 2018, 1:58 AM Arjun Satish 
> wrote:
> >
> > > All,
> > >
> > > Many thanks for all the feedback on KIP-298. Highly appreciate the time
> > and
> > > effort you all put into it.
> > >
> > > I've updated the KIP accordingly, and would like to start to start a
> vote
> > > on it.
> > >
> > > KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 298%3A+Error+Handling+in+Connect
> > > JIRA: https://issues.apache.org/jira/browse/KAFKA-6738
> > > Discussion Thread: https://www.mail-archive.com/
> > > dev@kafka.apache.org/msg87660.html
> > >
> > > Thanks very much!
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-05-21 Thread Randall Hauch
See also
https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations,
which just passed.

On Mon, Mar 19, 2018 at 11:16 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> SSL authentication was added in KIP-208, which will be included in Kafka
> 1.1.0:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 208%3A+Add+SSL+support+to+Kafka+Connect+REST+interface
>
> Connect isn't much different from the core Kafka/client configs currently
> where in some security setups you need to pass in passwords directly, and
> since there's various dynamic broker config improvements in the works, the
> fact that Connect exposes these in a REST API doesn't make it any
> different. I think the real long term solution to this is to add pluggable
> password support where you could, e.g., get these values out of a separate
> secrets management system instead of specifying them directly. Masking
> passwords as described in this solution feels like it's more of a temporary
> workaround and in order to be able to edit and update these connector
> configs by working with the REST API, we'd have to address these issues
> anyway.
>
> -Ewen
>
> On Mon, Mar 19, 2018 at 2:33 PM, Matt Farmer <m...@frmr.me> wrote:
>
> > What’s the status of this? This is a pretty hard blocker for us to meet
> > requirements internally to deploy connect in a distributed fashion.
> >
> > @Ewen - Regarding the concern of accessing information securely - has
> > there been any consideration of adding authentication to the connect api?
> >
> > > On Jan 17, 2018, at 3:55 PM, Randall Hauch <rha...@gmail.com> wrote:
> > >
> > > Vincent,
> > >
> > > Can the KIP more explicitly say that this is opt-in, and that by
> > > default nothing will change?
> > >
> > > Randall
> > >
> > > On Tue, Jan 16, 2018 at 11:18 PM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > >> Vincent,
> > >>
> > >> I think with the addition of a configuration to control this for
> > >> compatibility, people would generally be ok with it. If you want to
> > >> start a VOTE thread, the KIP deadline is coming up and the PR looks
> > >> pretty small. I will take a pass at reviewing the PR so we'll be ready
> > >> to merge if we can get the KIP voted through.
> > >>
> > >> Thanks,
> > >> Ewen
> > >>
> > >> On Fri, Jan 12, 2018 at 10:18 AM, Vincent Meng <vm...@zefr.com>
> wrote:
> > >>
> > >>> @Ted: The issue is kinda hard to reproduce. It's just something we
> > >>> observe over time.
> > >>>
> > >>> @Ewen: I agree. Opt-in seems to be a good solution to me. To your
> > >>> question, if there is no ConfigDef that defines which fields are
> > >>> Passwords we can just return the config as is.
> > >>>
> > >>> There is a PR for this KIP already. Comments/Discussions are welcome.
> > >>> https://github.com/apache/kafka/pull/4269
> > >>>
> > >>> On Tue, Jan 2, 2018 at 8:52 PM, Ewen Cheslack-Postava <
> > e...@confluent.io
> > >>>
> > >>> wrote:
> > >>>
> > >>>> Vincent,
> > >>>>
> > >>>> Thanks for the KIP. This is definitely an issue we know is a
> > >>>> problem for some users.
> > >>>>
> > >>>> I think the major problem with the KIP as-is is that it makes it
> > >>>> impossible to get the original value back out of the API. This KIP
> > >>>> probably ties in significantly with ideas for securing the REST API
> > >>>> (SSL) and adding ACLs to it. Both are things we know people want,
> > >>>> but haven't happened yet. However, it also interacts with other
> > >>>> approaches to adding those features, e.g. layering proxies on top
> > >>>> of the existing API (e.g. nginx, apache, etc). Just doing a blanket
> > >>>> replacement of password values with a constant would likely break
> > >>>> things for people who secure things via a proxy (and may just not
> > >>>> allow reads of configs unl

[VOTE] KIP-305: Add Connect primitive number converters

2018-05-22 Thread Randall Hauch
I'd like to start a vote of a very straightforward proposal for Connect to
add converters for the basic primitive number types: integer, short, long,
double, and float that reuse Kafka's corresponding serdes. Here is the KIP:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-305%3A+Add+Connect+primitive+number+converters


Best regards,

Randall


Re: [VOTE] KIP-305: Add Connect primitive number converters

2018-05-22 Thread Randall Hauch
+1 (non-binding)

On Tue, May 22, 2018 at 4:05 PM, Matthias J. Sax <matth...@confluent.io>
wrote:

> +1 (binding)
>
> On 5/22/18 1:49 PM, Gwen Shapira wrote:
> > +1 (I can't believe we didn't have it until now...)
> >
> > On Tue, May 22, 2018 at 1:26 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> >> +1 (binding)
> >>
> >> -Ewen
> >>
> >> On Tue, May 22, 2018 at 9:29 AM Ted Yu <yuzhih...@gmail.com> wrote:
> >>
> >>> +1
> >>>
> >>> On Tue, May 22, 2018 at 9:19 AM, Randall Hauch <rha...@gmail.com>
> wrote:
> >>>
> >>>> I'd like to start a vote of a very straightforward proposal for
> Connect
> >>> to
> >>>> add converters for the basic primitive number types: integer, short,
> >>> long,
> >>>> double, and float that reuse Kafka's corresponding serdes. Here is the
> >>> KIP:
> >>>>
> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>> 305%3A+Add+Connect+primitive+number+converters
> >>>>
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Randall
> >>>>
> >>>
> >>
> >
> >
> >
>
>


Re: [VOTE] KIP-305: Add Connect primitive number converters

2018-05-22 Thread Randall Hauch
(bump so a few new subscribers see this thread.)

On Tue, May 22, 2018 at 4:39 PM, Randall Hauch <rha...@gmail.com> wrote:

> +1 (non-binding)
>
> On Tue, May 22, 2018 at 4:05 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
>> +1 (binding)
>>
>> On 5/22/18 1:49 PM, Gwen Shapira wrote:
>> > +1 (I can't believe we didn't have it until now...)
>> >
>> > On Tue, May 22, 2018 at 1:26 PM, Ewen Cheslack-Postava <
>> e...@confluent.io>
>> > wrote:
>> >
>> >> +1 (binding)
>> >>
>> >> -Ewen
>> >>
>> >> On Tue, May 22, 2018 at 9:29 AM Ted Yu <yuzhih...@gmail.com> wrote:
>> >>
>> >>> +1
>> >>>
>> >>> On Tue, May 22, 2018 at 9:19 AM, Randall Hauch <rha...@gmail.com>
>> wrote:
>> >>>
>> >>>> I'd like to start a vote of a very straightforward proposal for
>> Connect
>> >>> to
>> >>>> add converters for the basic primitive number types: integer, short,
>> >>> long,
>> >>>> double, and float that reuse Kafka's corresponding serdes. Here is
>> the
>> >>> KIP:
>> >>>>
>> >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >>>> 305%3A+Add+Connect+primitive+number+converters
>> >>>>
>> >>>>
>> >>>> Best regards,
>> >>>>
>> >>>> Randall
>> >>>>
>> >>>
>> >>
>> >
>> >
>> >
>>
>>
>


Re: [VOTE] KIP-305: Add Connect primitive number converters

2018-05-25 Thread Randall Hauch
Thanks, everyone!

This vote passes with 3 binding +1 votes, 5 non-binding +1 votes, and no -1
votes. All binding votes were in before the KIP deadline for 2.0, so this
has been added to the 2.0 release plan:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=80448820

Best regards,
Randall

On Wed, May 23, 2018 at 4:49 AM, Rahul Singh <rahul.xavier.si...@gmail.com>
wrote:

> +1 non binding
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
>
> On May 22, 2018, 9:31 PM -0400, Yeva Byzek <y...@confluent.io>, wrote:
> > +1
> >
> > Thanks,
> > Yeva
> >
> >
> > On Tue, May 22, 2018 at 7:48 PM, Magesh Nandakumar <mage...@confluent.io
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks
> > > Magesh
> > >
> > > On Tue, May 22, 2018 at 4:23 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > > > (bump so a few new subscribers see this thread.)
> > > >
> > > > On Tue, May 22, 2018 at 4:39 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Tue, May 22, 2018 at 4:05 PM, Matthias J. Sax <
> > > matth...@confluent.io
> > > > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > On 5/22/18 1:49 PM, Gwen Shapira wrote:
> > > > > > > +1 (I can't believe we didn't have it until now...)
> > > > > > >
> > > > > > > On Tue, May 22, 2018 at 1:26 PM, Ewen Cheslack-Postava <
> > > > > > e...@confluent.io
> > > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > -Ewen
> > > > > > > >
> > > > > > > > On Tue, May 22, 2018 at 9:29 AM Ted Yu <yuzhih...@gmail.com
> > > wrote:
> > > > > > > >
> > > > > > > > > +1
> > > > > > > > >
> > > > > > > > > On Tue, May 22, 2018 at 9:19 AM, Randall Hauch <
> rha...@gmail.com
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > I'd like to start a vote of a very straightforward
> proposal for
> > > > > > Connect
> > > > > > > > > to
> > > > > > > > > > add converters for the basic primitive number types:
> integer,
> > > > short,
> > > > > > > > > long,
> > > > > > > > > > double, and float that reuse Kafka's corresponding
> serdes. Here
> > > is
> > > > > > the
> > > > > > > > > KIP:
> > > > > > > > > >
> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > > > > > 305%3A+Add+Connect+primitive+number+converters
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Randall
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
>


Re: [VOTE] KIP-285: Connect Rest Extension Plugin

2018-05-18 Thread Randall Hauch
+1 (non-binding)

Regards,
Randall

On Fri, May 18, 2018 at 11:32 AM, Konstantine Karantasis <
konstant...@confluent.io> wrote:

> +1 (non-binding)
>
> - Konstantine
>
> On Thu, May 17, 2018 at 10:05 PM, Ewen Cheslack-Postava  >
> wrote:
>
> > +1 (binding)
> >
> > Thanks,
> > Ewen
> >
> > On Thu, May 17, 2018 at 12:16 PM Ted Yu  wrote:
> >
> > > +1
> > >  Original message From: Gwen Shapira <
> g...@confluent.io>
> > > Date: 5/17/18  12:02 PM  (GMT-08:00) To: dev 
> > > Subject: Re: [VOTE] KIP-285: Connect Rest Extension Plugin
> > > LGTM. +1.
> > >
> > > On Wed, May 16, 2018 at 8:19 PM, Magesh Nandakumar <
> mage...@confluent.io
> > >
> > > wrote:
> > >
> > > > Hello everyone,
> > > >
> > > > After a good round of discussions with excellent feedback and no
> major
> > > > objections, I would like to start a vote on KIP-285: Connect Rest
> > > Extension
> > > > Plugin.
> > > >
> > > > KIP: <
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 285%3A+Connect+Rest+Extension+Plugin
> > > > >
> > > >
> > > >
> > > > JIRA: <*https://issues.apache.org/jira/browse/KAFKA-6776
> > > > *>
> > > >
> > > > Discussion thread: <
> > > > https://www.mail-archive.com/dev@kafka.apache.org/msg86910.html>
> > > >
> > > > Thanks,
> > > > Magesh
> > > >
> > >
> > >
> > >
> > > --
> > > *Gwen Shapira*
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter  | blog
> > > 
> > >
> >
>


Re: [VOTE] KIP-297: Externalizing Secrets for Connect Configurations

2018-05-18 Thread Randall Hauch
Looks great.

+1 (non-binding)

Regards,
Randall

On Fri, May 18, 2018 at 10:23 AM, Rajini Sivaram 
wrote:

> Thanks, Robert! Sounds good. And thanks for the KIP.
>
> +1 (binding)
>
> Regards,
>
> Rajini
>
> On Fri, May 18, 2018 at 4:17 PM, Robert Yokota  wrote:
>
> > HI Rajini,
> >
> > Good questions.
> >
> > First, if no ConfigProviders are configured, then config values of the
> form
> > ${vault:mypassword} will remain as is.
> >
> > Second, I mention in the KIP that if a provider does not have a value
> > for a given key, the variable will remain unresolved and the final value
> > will be of the form ${vault:mypassword} still.
> >
> > If one wants to use a config value ${vault:mypassword}, as well as the
> > VaultConfigProvider, one can choose to use a different prefix besides
> > "vault" when referring to the VaultConfigProvider since the prefixes are
> > arbitrary and specified in a config file.
> >
> > Finally, if one wants to use a config value ${vault:mypassword}, as well
> > as the VaultConfigProvider, and one wants to use the prefix "vault" and
> > not something else, then yes, one could use a LiteralConfigProvider as
> > you described, or even put the ${vault:mypassword} in a different file
> > and use the FileConfigProvider to pull in the value (since there is only
> > one level of indirection).
> >
> > Thanks,
> > Robert
> >
> >
> >
> > On Fri, May 18, 2018 at 3:42 AM, Rajini Sivaram  >
> > wrote:
> >
> > > Hi Robert,
> > >
> > > A couple of questions:
> > >
> > >
> > >    1. Since we always expand config values, don't we also need a way to
> > >    include values that never get expanded? I may want to use
> > >    "${vault:mypassword}" as my literal password without a lookup. Since
> > >    we allow only one level of indirection, perhaps all we need is a
> > >    ConfigProvider that uses the string inside, for example:
> > >    ${literal:${vault:mypassword}} ?
> > >    It would avoid having restrictions on what passwords can look like.
> > >    2. What is the behaviour if I specify a password that is
> > >    "${notavault:something}" that matches the config provider syntax,
> > >    but for which there is no config provider?
> > >
> > >
> > >
> > > On Fri, May 18, 2018 at 5:41 AM, Ewen Cheslack-Postava <
> > e...@confluent.io>
> > > wrote:
> > >
> > > > Thanks for addressing this Robert, it's a pretty common user need.
> > > >
> > > > First, +1 (binding) generally.
> > > >
> > > > Two very minor comments that I think could be clarified but wouldn't
> > > affect
> > > > votes:
> > > >
> > > > * Let's list in the KIP what package the ConfigProvider,
> > > > ConfigChangeCallback, ConfigData and ConfigTransformer interfaces are
> > > > defined in. Very, very minor, but given the aim to possibly reuse
> > > > elsewhere and the fact that it'll likely end up in the common packages
> > > > might mean devs focused more on the common/core packages will have
> > > > strong opinions where they should be. I think it'd definitely be good
> > > > to get input from folks focusing on the broker on where they think it
> > > > should go since I think it would be very natural to extend this to
> > > > security settings there. (Also, I think ConfigData is left out of the
> > > > list of new interfaces by accident, but I think it's clear what's
> > > > being added anyway.)
> > > > * I may have glanced past it, but we're not shipping any
> > > > ConfigProviders out of the box? This mentions file and vault, but
> > > > just as examples. Just want to make sure everyone knows up front that
> > > > this is a pluggable API, but you need to add more jars to take
> > > > advantage of it. I think this is fine as I don't think there are
> > > > truly common secrets provider formats/apis/protocols, just want to
> > > > make sure it is clear.
> > > >
> > > > Thanks,
> > > > Ewen
> > > >
> > > > On Thu, May 17, 2018 at 6:19 PM Ted Yu  wrote:
> > > >
> > > > > +1
> > > > >  Original message From: Magesh Nandakumar <
> > > > > mage...@confluent.io> Date: 5/17/18  6:05 PM  (GMT-08:00) To:
> > > > > dev@kafka.apache.org Subject: Re: [VOTE] KIP-297: Externalizing
> > > Secrets
> > > > > for Connect Configurations
> > > > > Thanks Robert, this looks great
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Thu, May 17, 2018 at 5:35 PM, Colin McCabe 
> > > > wrote:
> > > > >
> > > > > > Thanks, Robert!
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Colin
> > > > > >
> > > > > >
> > > > > > On Thu, May 17, 2018, at 14:15, Robert Yokota wrote:
> > > > > > > Hi Colin,
> > > > > > >
> > > > > > > I've changed the KIP to have a composite object returned from
> > > > > > > get(). It's probably the most straightforward option.  Please
> > > > > > > let me know if you have
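
For illustration, the one level of file indirection described in this thread
might look like the following in a worker configuration; the paths and key
names are made up, and the provider class shown is the file-based example
from the KIP:

    # Register a file-based ConfigProvider under the prefix "file"
    config.providers=file
    config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

    # Elsewhere, reference a key stored in an external properties file; this
    # resolves to the value of db.password in /etc/kafka/secrets.properties
    database.password=${file:/etc/kafka/secrets.properties:db.password}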

Re: [DISCUSS] KIP-301 Schema Inferencing in JsonConverters

2018-05-18 Thread Randall Hauch
There are lots of potential details/gotchas here, with no obvious defaults.
Even if we did have optional configs for the type(s) to use for empty
arrays and null values, how likely is it that those would apply for all
occurrences? Two fields might have empty arrays, but the best schema for
each might be different.
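
For example, in a record like {"tags": [], "thresholds": []} (field names
made up), "tags" might naturally be an array of strings while "thresholds"
should be an array of doubles, yet nothing in the record itself lets a
converter tell the two apart.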

Allowing the user to specify a schema makes sense as a powerful fallback,
though it wouldn't be the most usable for cases where you might just want
to correct one thing.

I think we should pick one approach that works. If we can't find a simple
way for a few configuration options to work in nearly all cases, then
perhaps the explicit schema override is the least complex approach.

I'd like to hear what other people think.

Randall

On Tue, May 15, 2018 at 4:37 PM, Allen Tang  wrote:

> I've gone through several iterations of back-and-forth with @rhauch on the
> PR and on Confluent's Slack Community. The current thinking is that
> assuming an empty array is a String array is not necessarily the best
> option, nor is assuming that all null values in a JSON node are Strings.
>
> We might be able to account for these potentially false
> assumptions/inferences by introducing new task properties (with a
> value.converter prefix) that explicitly define overrides for either
> specific JSON field keys, or give the option for Kafka Connect users to
> provide a full immutable schema they know is true for the topics impacted
> by the Sink Connector.
>
> What do you think?
>
> - Allen
>
>
> On Mon, May 14, 2018 at 2:58 PM, Allen Tang  wrote:
>
> > Hi,
> >
> > I just opened a KIP to add Schema Inferencing in JsonConverters for
> Kafka Connect.
> >
> > The details of the proposal can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-301%3A+Schema+Inferencing+for+JsonConverter
> >
> > Also, I have created a -
> >
> > 1.) JIRA ticket: https://issues.apache.org/jira/browse/KAFKA-6895
> >
> > 2.) Provisional PR with initial discussion:
> > https://github.com/apache/kafka/pull/5001
> >
> > Looking forward to the community's feedback! Cheers!
> >
> > -Allen
> >
> >
>


[DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-17 Thread Randall Hauch
I'd like to start discussion of a very straightforward proposal for Connect
to add converters for the basic primitive number types: integer, short,
long, double, and float. Here is the KIP:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-305%3A+Add+Connect+primitive+number+converters

As mentioned in the KIP, I've created a pull request (
https://github.com/apache/kafka/pull/5034) for those looking for
implementation details.

Any feedback is appreciated.

Best regards,

Randall
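
For illustration, once merged these would be configured like any other
converter, e.g. in a worker configuration; the class names below assume the
new converters sit next to the existing ByteArrayConverter:

    key.converter=org.apache.kafka.connect.converters.IntegerConverter
    value.converter=org.apache.kafka.connect.converters.DoubleConverter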


Re: [DISCUSS] KIP-305: Add Connect primitive number converters

2018-05-18 Thread Randall Hauch
Thanks, Ewen.

You make several good points, and I've updated the KIP to hopefully address
your comments. I think the symmetry with the Kafka serdes is useful, so
I've kept all 5 converters in the KIP.

Interestingly, perhaps the short and int converters (with the reduced
ranges) are not necessarily that useful for keys either.

Regards,

Randall

On Thu, May 17, 2018 at 10:08 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Just a couple of minor points that don't really affect the implementation:
>
> * For nulls, let's just mention the underlying serializers already support
> this. I'm actually not sure why they should/need to, but given they do,
> let's just defer to that implementation.
> * I'm not sure where Float and Double converters are actually useful. The
> use cases I know for integer serdes are for keys, but floats seem like a bad
> choice for keys. These aren't a lot of overhead to build and maintain, but
> if we don't know use cases for the specific types, it might be silly to
> spend time and effort building and maintaining them.
>
> Otherwise, this seems simple and straightforward. Generally +1 on the
> proposal.
>
> -Ewen
>
> On Thu, May 17, 2018 at 6:04 PM Magesh Nandakumar <mage...@confluent.io>
> wrote:
>
> > Thanks Randall for the KIP. I think it will be super useful and looks
> > pretty straightforward to me.
> >
> > Thanks
> > Magesh
> >
> > On Thu, May 17, 2018 at 4:15 PM, Randall Hauch <rha...@gmail.com> wrote:
> >
> > > I'd like to start discussion of a very straightforward proposal for
> > Connect
> > > to add converters for the basic primitive number types: integer, short,
> > > long, double, and float. Here is the KIP:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 305%3A+Add+Connect+primitive+number+converters
> > >
> > > As mentioned in the KIP, I've created a pull request (
> > > https://github.com/apache/kafka/pull/5034) for those looking for
> > > implementation details.
> > >
> > > Any feedback is appreciated.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> >
>


Re: [DISCUSS] KIP-285: Connect Rest Extension Plugin

2018-05-16 Thread Randall Hauch
Looks good to me. Thanks for quickly making the changes! Great work!
 
Best regards,

Randall

> On May 16, 2018, at 8:07 PM, Magesh Nandakumar <mage...@confluent.io> wrote:
> 
> Randall,
> 
> I have adjusted the package names per Ewen's suggestions and also made some
> minor edits per your suggestions. Since there are no major outstanding
> issues, i'm moving this to vote.
> 
> Thanks
> Magesh
> 
>> On Wed, May 16, 2018 at 4:38 PM, Randall Hauch <rha...@gmail.com> wrote:
>> 
>> A few very minor suggestions:
>> 
>> 
>>   1. There are a few formatting issues with paragraphs that use a
>>   monospace font. Minor, but it would be nice to fix.
>>   2. Would be nice to link to the PR
>>   3. Do we need the org.apache.kafka.connect.rest.extension.entities
>>   package? Could we just move the two classes into the parent
>>   org.apache.kafka.connect.rest.extension package?
>>   4. This sentence "The above approach helps alleviate any issues that
>>   could arise if Extension accidentally reregister the" is cut off.
>>   5. The "ConnectRestExtensionContext.configure(...)" method's JavaDoc
>>   should describe the behaviors that are mentioned in the "Rest Extension
>>   Integration with Connect" section; e.g., behavior when an extension
>>   adds a resource that is already registered, whether unregistering
>>   works, etc. Also, ideally the "close()" method would have JavaDoc that
>>   explained when it is called (e.g., no other methods will be called on
>>   the extension after this, etc.).
>>   6. Packaging requirements are different for this component vs
>>   connectors, transformations, and converters, since this now mandates the
>>   Service Loader manifest file. This should be called out more explicitly.
>>   7. It'd be nice if the example included how extension-specific config
>>   properties are to be defined in the worker configuration file.
>> 
>> As I said, these are all minor suggestions that only affect the KIP
>> document. Once these are fixed, I think this is ready to move to voting.
>> 
>> Best regards,
>> 
>> Randall
>> 
>> On Tue, May 15, 2018 at 11:30 AM, Magesh Nandakumar <mage...@confluent.io>
>> wrote:
>> 
>>> Randall- I think I have addressed all the comments. Let me know if we can
>>> take this to Vote.
>>> 
>>> Thanks
>>> Magesh
>>> 
>>> On Tue, May 8, 2018 at 10:12 PM, Magesh Nandakumar <mage...@confluent.io
>>> 
>>> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> I have updated the KIP to reflect changes based on the PR
>>>> https://github.com/apache/kafka/pull/4931. It mostly has minor changes
>>>> to the interfaces and includes details on packages for the interfaces
>>>> and the classes. Let me know your thoughts.
>>>> 
>>>> Thanks
>>>> Magesh
>>>> 
>>>> On Fri, Apr 27, 2018 at 12:03 PM, Randall Hauch <rha...@gmail.com>
>>> wrote:
>>>> 
>>>>> Great work, Magesh. I like the overall approach a lot, so I left some
>>>>> pretty nuanced comments about specific details.
>>>>> 
>>>>> Best regards,
>>>>> 
>>>>> Randall
>>>>> 
>>>>> On Wed, Apr 25, 2018 at 3:03 PM, Magesh Nandakumar <
>>> mage...@confluent.io>
>>>>> wrote:
>>>>> 
>>>>>> Thanks Randall for your thoughts. I have created a replica of the
>>>>> required
>>>>>> entities in the draft implementation. If you can take a look at the
>> PR
>>>>> and
>>>>>> let me know your thoughts, I will update the KIP to reflect the same
>>>>>> 
>>>>>> https://github.com/apache/kafka/pull/4931
>>>>>> 
>>>>>> On Tue, Apr 24, 2018 at 11:44 AM, Randall Hauch <rha...@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>>> Magesh, I think our last emails crossed in mid-stream.
>>>>>>> 
>>>>>>> We definitely want to put the new public interfaces/classes in the
>>>>>>> API module, and implementation in the runtime module. Yes, this will
>>>>>>> affect the design, since for example we don't want to expose runtime
>>>>>>> types to the

Re: Doubt about developing a kafka connect with avro

2017-10-19 Thread Randall Hauch
Sounds like you're using Kafka Connect and trying to write a source
connector. The Connect framework separates responsibilities:

1. Your source connector is responsible for generating the SourceRecord
objects and returning them to the Connect framework via your task's
"poll()" method.
2. The Connect framework passes each SourceRecord through the chain of
Single Message Transforms (SMTs), if any are configured for that connector.
Each SMT accepts one SourceRecord and outputs a (presumably) modified
SourceRecord.
3. The SourceRecord output from the last SMT (or the source connector, if
there are no SMTs) is then *serialized* using the configured Converter
4. The Connect framework writes the serialized record to the specified
Kafka topic.

The whole purpose of this separation of responsibilities is to make it
easier to write a source connector, since the connector can focus on
reading the external system and generating the SourceRecord objects. It
also means that somebody can use a source connector they didn't write, and
modify the generated SourceRecord objects using SMTs to, for example,
reroute them to different topics or partitions, or change the structure of
the SourceRecord's keys and values. Finally, this separation also means
that somebody can use a connector (and 0 or more SMTs) they didn't write
and control how the records are serialized to Kafka by changing the
converter.

So, your Twitter source connector should create the SourceRecord objects
representing the tweets (or whatever you're ingesting) and return them to
the Connect framework via your task's "poll()" method. This is the same no
matter how you want to serialize the records into a binary form written to
Kafka topics.

You have a number of converters to choose from. First, Connect ships with
the JSON converter, which writes out to a JSON representation. To serialize
to Avro, use Confluent's Avro Converter, or implement and use your own
converter if you so choose. There are other converters out there.
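
For example, a worker configured to use Confluent's Avro converter might
include the following (the registry URL is made up):

    key.converter=io.confluent.connect.avro.AvroConverter
    key.converter.schema.registry.url=http://localhost:8081
    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://localhost:8081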

By the way, Connect defines its own Schema framework that doesn't tie
itself to Avro, but still gives connectors nearly all of the power and
flexibility of using Avro. What it doesn't allow you to do, however, is
create Java classes for your schemas; instead, you use the Struct and
Schema objects and populate your Structs similar to how you might do it
with Avro's GenericRecords.
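
For example, a source task might build a tweet value roughly like this; the
field names, topic, and offset scheme are made up for the sketch:

    import java.util.Collections;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;
    import org.apache.kafka.connect.data.Struct;
    import org.apache.kafka.connect.source.SourceRecord;

    // Define the value schema once, analogous to an Avro record schema.
    Schema tweetSchema = SchemaBuilder.struct().name("com.example.Tweet")
            .field("id", Schema.INT64_SCHEMA)
            .field("user", Schema.STRING_SCHEMA)
            .field("text", Schema.STRING_SCHEMA)
            .build();

    // Populate a Struct, much as you would an Avro GenericRecord.
    Struct tweet = new Struct(tweetSchema)
            .put("id", 123456789L)
            .put("user", "someone")
            .put("text", "hello, world");

    // Wrap it in a SourceRecord; the task's poll() returns a List of these,
    // and the configured converter handles serialization from there.
    SourceRecord record = new SourceRecord(
            Collections.singletonMap("source", "twitter"),   // source partition
            Collections.singletonMap("sinceId", 123456789L), // source offset
            "tweets",      // topic
            tweetSchema,   // value schema
            tweet);        // value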

Now, you may decide that you want your code to do a lot more and directly
write out to Kafka. In that case, your code will have to set up and use a
producer, know how to read the Twitter firehose, know how to serialize the
records (e.g., hard-code using the Avro serializer), know how to write
those records to Kafka via the producer, and track the progress of where
you are in the firehose. If you want to share this and let other people use
it, you'd need to make it all configurable so that people can adapt the
structure of the records to their own needs, and change how they're
serializing (e.g., using Protobuf rather than Avro). If you take this
approach, you'll be replicating a lot of what Connect is *already* doing
for you.

I hope this helps explain why Connect is the way it is.

Randall




On Thu, Oct 19, 2017 at 2:24 AM, Juan Garcia Losada <
juan.garcialos...@plexus.es> wrote:

> Hello to everyone,
>
> First I would like to give thanks in advance to everyone willing to help
> me with this issue.
>
> I have been coding a custom Kafka connector that reads from the Twitter
> API and writes to a Kafka topic, and I have been trying to use Avro
> encoding to write those messages, which I found very difficult. The
> technologies used were the Confluent Docker Kafka images 3.1.2 and Kafka
> client 0.10.1.0. For generating the Avro object I used the Maven plugin
> avro-maven-plugin, which generates the object with the schema correctly.
> But I have found that, for creating the source record, I have to
> replicate the same schema in Java code by creating a custom
> "org.apache.kafka.connect.data.Schema" and then populating an
> org.apache.kafka.connect.data.Struct object with all the information. As
> a value converter class I use the AvroConverter, and reviewing the code,
> the only way to send a custom object is to send it as a Struct object;
> the converter then creates the Avro format and uses it to encode the
> object. I don't understand why it cannot digest my custom class created
> by the avro-maven-plugin to serialize the object. I don't know if this is
> the correct way to create a connector with an Avro serializer.
>
> Could I get some feedback about this solution? Thanks to everyone. Xoan.
>


Re: Before creating KIP : Kafka Connect / Add a configuration provider class

2017-10-23 Thread Randall Hauch
Very interesting. Would the proposed configuration provider be set at the
connector level or the worker level? The latter would obviously be required
to handle all/multiple connector configurations. Either way, the provider
class(es) would need to be installed on the worker (really, every worker),
correct?

Would all provider implementations be custom implementations, or are there
some provider implementations that are general enough for Connect to
include them?

Best regards,

Randall

On Fri, Oct 20, 2017 at 5:08 AM, Florian Hussonnois 
wrote:

> Hi Team
>
> Before submitting a new KIP I would like to open the discussion regarding
> an enhancement of Kafka Connect.
>
> Currently, the only way to configure a connector (in distributed mode) is
> through REST endpoints while creating or updating a connector.
>
> It would be nice to have the possibility to specify a configs provider
> class (as we specify the connector class) in the JSON payload sent over the
> REST API.
> This class would be called during the connector creation to complete the
> configs submitted via REST.
>
> The motivation for such functionality is, for example, to enforce a
> configuration for all deployed connectors, to provide default configs, or
> to provide sensitive configs like user/password.
>
> I've met these requirements on different projects.
>
> Do you think this feature merits a new KIP?
>
> Thanks,
>
> --
> Florian HUSSONNOIS
>


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-10-27 Thread Randall Hauch
the end user would have to encode the characters himself.
> > > > And due to entity encoding ending every character with a ; which
> > > > causes the embedded jetty server to cut the connector name at that
> > > > character we'd probably need to encode that character in URL encoding
> > > > again for that to work out - which might get a bit too complex tbh.
> > >
> > > Sorry, I meant to write percent-encoding, not entity refs.
> > > https://en.wikipedia.org/wiki/Percent-encoding
> > >
> > > best,
> > > Colin
> > >
> > >
> > > > I will further investigate which characters the URL decoding that
> > > > jetty brings to the table will let us use and if all of these are
> > > > correctly handled during connector creation and report back with a
> > > > new list of characters that I think we can support fairly easily.
> > > >
> > > > Kind regards,
> > > > Sönke
> > > >
> > > >
> > > > On Tue, Oct 24, 2017 at 6:42 PM, Colin McCabe <cmcc...@apache.org>
> > > wrote:
> > > >
> > > > > It should be possible to use entity references to encode these
> > > > > characters in URLs.  See https://dev.w3.org/html5/html-author/charref
> > > > > Maybe I'm misunderstanding the problem, but can we simply encode the
> > > > > URLs, rather than restricting the names?
> > > > >
> > > > > best,
> > > > > Colin
> > > > >
> > > > >
> > > > > On Mon, Oct 23, 2017, at 14:12, Randall Hauch wrote:
> > > > > > Here's the link to KIP-212:
> > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74684586
> > > > > >
> > > > > > I do think it's worthwhile to define the rules for connector names.
> > > > > > However, I think it would be better to describe the current
> > > > > > restrictions for names outside of them appearing within URLs. For
> > > > > > example, if we can keep connector names relatively free of
> > > > > > constraints but instead define how names should be encoded when
> > > > > > used within URLs (e.g., URL encoding), then we may not have (m)any
> > > > > > backward compatibility issues other than fixing some bugs related
> > > > > > to proper encoding/decoding.
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > >
> > > > > > On Mon, Oct 23, 2017 at 3:44 PM, Sönke Liebau <
> > > > > > soenke.lie...@opencore.com.invalid> wrote:
> > > > > >
> > > > > > > All,
> > > > > > >
> > > > > > > I've created a KIP to discuss enforcing of rules on what
> > > > > > > characters are allowed in connector names.
> > > > > > >
> > > > > > > Since this may break API calls that are currently working I
> > > > > > > figured a KIP is the better way to go than to just create a JIRA.
> > > > > > >
> > > > > > > I'd love to hear your input on this!
> > > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Sönke Liebau
> > > > Partner
> > > > Tel. +49 179 7940878
> > > > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
> > >
> >
> >
> >
> > --
> > Sönke Liebau
> > Partner
> > Tel. +49 179 7940878
> > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
>


Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2017-12-21 Thread Randall Hauch
All,

I've updated KIP-145 to reflect my proposal. The proposal addresses SMTs
and a different HeaderConverter default, but I'll be updating my PR (
https://github.com/apache/kafka/pull/4319) soon. Feedback is very welcome!

Best regards,

Randall
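
For a feel of the API, adding and reading a header in an SMT or sink might
look roughly like the following; the method names are taken from the
proposal as it currently stands and may still change:

    // Sketch only, against the proposed Headers/Header interfaces.
    record.headers().addString("region", "us-east-1"); // add or replace a header

    for (Header header : record.headers()) {           // Headers is iterable
        if ("retries".equals(header.key())) {
            // Casting/conversion is decoupled from how the header was serialized.
            float retries = header.valueAsFloat();
        }
    }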

On Thu, Dec 14, 2017 at 10:20 AM, Randall Hauch <rha...@gmail.com> wrote:

> Hi, Michael. Yeah, I liked your PR a lot, and there definitely are a lot
> of similarities. But here are the more significant differences from my
> perspective (none of which are really that big):
>
> First, your `SubjectConverter` and my `HeaderConverter` are pretty similar
> -- mine is just more closely tied to headers. Also, we used slightly
> different approaches to dealing with the fact that the `Converter`
> interface does not extend `Configurable`, which Connect now uses for
> transforms, connectors, etc. And our implementations take very different
> approaches (see below).
>
> Second, I tried to follow Kafka client's `Header` and `Headers` interfaces
> (at least in concept) so that ConnectRecord has a `Headers` rather than a
> list of headers. It's a minor distinction, but I do think it's important
> for future-proofing to have an interface for the collection to abstract and
> encapsulate logic/behavior as well as leaving room for alternative
> implementations. It also a convenient place to add methods for source
> connectors and SMTs to easily add/modify/remove/transform headers.
>
> Third, our "header converter" implementations are where most of the
> differences lie. Again, this goes back to my assertion that we should make
> the serdes and cast/conversion orthogonal. If we allow sink connectors and
> SMTs to get header values in the type they want (e.g.,
> `Header.valueAsFloat()`), then we can tolerate a bit more variation in how
> the header values are serialized and deserialized, since the serdes
> mechanism doesn't have to get the type exactly right for the sink connector
> and SMT. My `SimpleHeaderConverter` serializes all of the types to strings,
> but during deserialization it attempts to infer the schemas (easy for
> primitive values, a bit harder for structured types). IIUC, neither your
> approach or mine is really able to maintain Struct schemas, but IMO we can
> add that over time with improved/different header converters if people
> really need it.
>
> Fourth, we use different defaults for the serdes implementation. I dislike
> the StringConverter because it converts everything to strings that are then
> difficult to convert back to the original form, especially for the
> structured types. This is why I created the `SimpleHeaderConverter`
> implementation, which doesn't need explicit configuration or explicit
> mapping of header names to types, and thus can be used as the default.
>
> Finally, while I hope that `SimpleHeaderConverter` and its schema
> inference will work most of the time with no special configuration,
> especially since the `Header` interface makes it easy to cast/convert in
> sink connectors and SMTs, I do like how your `PrimativeSubjectConverter`
> allows the user to manually control how the values are serialized. I
> thought of doing something similar, but I think that can be done at a later
> time if/when needed.
>
> I hope that makes sense.
>
> Randall
>
> On Tue, Dec 12, 2017 at 11:35 PM, Michael André Pearce <
> michael.andre.pea...@me.com> wrote:
>
>> Hi Randall
>>
>> What’s the main difference between this and my earlier alternative option
>> PR
>> https://github.com/apache/kafka/pull/2942/files
>>
>> If none then +1.
>> From what I can tell the only difference I make is the headers you
>> support being able to cross convert primitive types eg if value after
>> conversion is integer you can still ask for float and it will type convert
>> if possible.
>>
>> Cheers
>> Mike
>>
>>
>> Sent from my iPhone
>>
>> > On 13 Dec 2017, at 01:36, Randall Hauch <rha...@gmail.com> wrote:
>> >
>> > Trying to revive this after several months of inactivity
>> >
>> > I've spent quite a bit of time evaluating the current KIP-145 proposal
>> and
>> > several of the suggested PRs. The original KIP-145 proposal is
>> relatively
>> > minimalist (which is very nice), and it adopts Kafka's approach to
>> headers
>> > where header keys are strings and header values are byte arrays. IMO,
>> this
>> > places too much responsibility on the connector developers to know how
>> to
>> > serialize and deserialize, which means that it's going to be difficult
>> to
>> > assemble into pipelines connectors and stream processors that make

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-05 Thread Randall Hauch
Sönke, I'm happy with the current proposal.

Ewen, the proposal allows any characters in the name as long as they are
properly escaped/encoded. That seems to adhere to the robustness principle.
The only exception is that the proposal trims leading and trailing
whitespace characters in an attempt to reduce user errors. Can you please
clarify that you're okay with this behavior? I agree that technically we
can (and currently do) support whitespace-only names, but users have
reported this as problematic, and it also would be confusing for most user
interfaces.
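
As a concrete illustration of the escaping requirement (a sketch against the
standard Connect REST paths; the connector name "my connector" is purely
hypothetical):

    # A name containing a space must be percent-encoded in the URL path:
    curl -s http://localhost:8083/connectors/my%20connector/status

    # When creating the connector, the JSON body carries the raw name:
    curl -s -X POST -H "Content-Type: application/json" \
         --data '{"name": "my connector", "config": {"connector.class": "FileStreamSource", "file": "/tmp/in.txt", "topic": "demo"}}' \
         http://localhost:8083/connectors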

Best regards,

Randall

On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Very late to the game here, but a few thoughts:
>
> 1. Regarding whether KIP is necessary, I don't mind doing it for
> documentation sake, but I would classify any mishandling of connector names
> here as a bug. Which doesn't require a KIP to fix.
>
> 2. For support of characters, Kafka has some history of just being
> restrictive (e.g., see topic name restrictions), but I personally disagree
> with this approach. I think it is better to be liberal in what we accept
> and just document limitations. I think our default should be to accept any
> user input and document why we can't handle certain inputs and how the user
> should adapt if we can't. In general I try to work under the robustness
> principle: *Be conservative in what you do, be liberal in what you accept
> from others*
>
> 3. Related to 2, there were some cases like whitespace-only connector
> names. This seems extremely weird and not critical, so I'm fine not
> supporting it officially, but technically I don't see any reason it
> shouldn't be supported with any appropriate escaping (i.e. what would it
> break for us?).
>
> But in general, I think just being more explicit about expectations is
> great and it'd be great to set baseline expectations.
>
> -Ewen
>
>
>
> On Mon, Nov 20, 2017 at 12:33 AM, Sönke Liebau <
> soenke.lie...@opencore.com.invalid> wrote:
>
> > @Randall: are you happy with the KIP as it stands so I can call for a
> vote,
> > or are there any outstanding items still to discuss?
> >
> > Same question to anybody else who'd like to participate of course :)
> >
> > On Thu, Nov 16, 2017 at 5:35 PM, Sönke Liebau <
> soenke.lie...@opencore.com>
> > wrote:
> >
> > > Sounds good. I've added a few sentences to this effect to the KIP.
> > >
> > > On Thu, Nov 16, 2017 at 5:02 PM, Randall Hauch <rha...@gmail.com>
> wrote:
> > >
> > >> Nice job updating the KIP. The PR (
> > >> https://github.com/apache/kafka/pull/2755/files) for the proposed
> > >> implementation does prevent names from being empty, and it trims
> > >> whitespace
> > >> from the name only when creating a new connector. However, the KIP's
> > >> "Proposed Change" section should probably be very clear about this,
> and
> > >> the
> > >> migration section should address how a connector that was created with
> > >> leading and/or trailing whitespace characters will still be able to be
> > >> updated and deleted. I think that decreases the likelihood of this
> > change
> > >> negatively impacting existing users. Basically, going forward, the
> names
> > >> of
> > >> new connectors will be trimmed.
> > >>
> > >> WDYT?
> > >>
> > >> On Thu, Nov 16, 2017 at 9:32 AM, Sönke Liebau <
> > >> soenke.lie...@opencore.com.invalid> wrote:
> > >>
> > >> > I've added some more detail to the KIP [1] around current scenarios
> > that
> > >> > might break in the future. I actually came up with a second
> limitation
> > >> that
> > >> > we'd impose on users and also documented this.
> > >> >
> > >> > Let me know what you think.
> > >> >
> > >> > Kind regards,
> > >> > Sönke
> > >> >
> > >> > [1]
> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >> > 212%3A+Enforce+set+of+legal+characters+for+connector+names
> > >> >
> > >> >
> > >> > On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau <
> > >> soenke.lie...@opencore.com>
> > >> > wrote:
> > >> >
> > >> > > Hi Randall,
> > >> > >
> > >> > > I had mentioned this edge case in the KIP, but will add some
> further
> > >> > > detail to further clarify all cha

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-08 Thread Randall Hauch
So are we ready to start a vote on this KIP?

On Sat, Jan 6, 2018 at 6:00 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> re: whitespace characters, I'm fine with the restriction since I don't see
> it becoming an issue in practice. I just don't see any reason to restrict
> it so it seems like we're going out of our way and doing extra work to be
> restrictive, but without clear motivation.
>
> In general my default approach (without context of a specific system) would
> be to accept anything that we can encode in UTF-8 and only apply
> restrictions where it becomes necessary (e.g. we need to define a delimiter
> for some reason). The constraints of URLs introduce some complexity (you
> need escaping), but probably generally still allow this. If I can use an
> emoji when naming things, then I'm probably happy :) Whitespace characters
> definitely have some other issues (e.g. you can have non-visible whitespace
> which obscures which connector you're actually working with), but despite
> the JIRA linked, I wasn't really convinced they need special handling. It
> seems like a really weird issue to encounter in the first place.
>
> -Ewen
>
> On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch <rha...@gmail.com> wrote:
>
> > Sönke, I'm happy with the current proposal.
> >
> > Ewen, the proposal allows any characters in the name as long as they are
> > properly escaped/encoded. That seems to adhere to the robustness
> principle.
> > The only exception is that the proposal trims leading and trailing
> > whitespace characters in an attempt to reduce user errors. Can you please
> > clarify that you're okay with this behavior? I agree that technically we
> > can (and currently do) support whitespace-only names, but users have
> > reported this as problematic, and it also would be confusing for most
> user
> > interfaces.
> >
> > Best regards,
> >
> > Randall
> >
> > On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > Very late to the game here, but a few thoughts:
> > >
> > > 1. Regarding whether KIP is necessary, I don't mind doing it for
> > > documentation sake, but I would classify any mishandling of connector
> > names
> > > here as a bug. Which doesn't require a KIP to fix.
> > >
> > > 2. For support of characters, Kafka has some history of just being
> > > restrictive (e.g., see topic name restrictions), but I personally
> > disagree
> > > with this approach. I think it is better to be liberal in what we
> accept
> > > and just document limitations. I think our default should be to accept
> > any
> > > user input and document why we can't handle certain inputs and how the
> > user
> > > should adapt if we can't. In general I try to work under the robustness
> > > principle: *Be conservative in what you do, be liberal in what you
> accept
> > > from others*
> > >
> > > 3. Related to 2, there were some cases like whitespace-only connector
> > > names. This seems extremely weird and not critical, so I'm fine not
> > > supporting it officially, but technically I don't see any reason it
> > > shouldn't be supported with any appropriate escaping (i.e. what would
> it
> > > break for us?).
> > >
> > > But in general, I think just being more explicit about expectations is
> > > great and it'd be great to set baseline expectations.
> > >
> > > -Ewen
> > >
> > >
> > >
> > > On Mon, Nov 20, 2017 at 12:33 AM, Sönke Liebau <
> > > soenke.lie...@opencore.com.invalid> wrote:
> > >
> > > > @Randall: are you happy with the KIP as it stands so I can call for a
> > > vote,
> > > > or are there any outstanding items still to discuss?
> > > >
> > > > Same question to anybody else who'd like to participate of course :)
> > > >
> > > > On Thu, Nov 16, 2017 at 5:35 PM, Sönke Liebau <
> > > soenke.lie...@opencore.com>
> > > > wrote:
> > > >
> > > > > Sounds good. I've added a few sentences to this effect to the KIP.
> > > > >
> > > > > On Thu, Nov 16, 2017 at 5:02 PM, Randall Hauch <rha...@gmail.com>
> > > wrote:
> > > > >
> > > > >> Nice job updating the KIP. The PR (
> > > > >> https://github.com/apache/kafka/pull/2755/files) for the proposed
> > > > >> implementation does prevent names from being empty, and it trims

Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-08 Thread Randall Hauch
Nice feedback, Ewen. Thanks!

On Thu, Jan 4, 2018 at 5:11 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Hey Jakub,
>
> Sorry for not getting to this sooner. Overall the proposal looks good to
> me, I just had a couple of questions.
>
> 1. For the configs/overrides, does this happen on a per-setting basis or if
> one override is included do we not use any of the original settings? I
> suspect that if you need to override one setting, it probably means you're
> using an entirely different config and so the latter behavior seems better
> to me. We've talked a bit about doing something similar for the
> producer/consumer security settings as well so you don't have to specify
> security configs in 3 places in your worker config.
>

Not sure if you were referring to
https://issues.apache.org/jira/browse/KAFKA-6387, but I just withdrew that
proposal (and the corresponding KIP-246) because behavior with existing
configurations was not backward compatible, so existing configs might have
very different behavior after the "inheritance" was implemented.

But regardless, I do think that in this case if you have to override one of
the settings you probably need to override multiple. So I'd be in favor of
requiring all configs to be specified in the overridden `listeners.*`
properties.
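
For illustration, a worker config under that rule might look like the
following (a sketch only — property names follow the KIP's `listeners.*`
prefix convention, and the paths and passwords are placeholders):

    listeners=https://myhost:8443
    listeners.https.ssl.keystore.location=/var/private/ssl/connect.keystore.jks
    listeners.https.ssl.keystore.password=changeit
    listeners.https.ssl.key.password=changeit
    listeners.https.ssl.truststore.location=/var/private/ssl/connect.truststore.jks
    listeners.https.ssl.truststore.password=changeit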


>
> 2. For using default values from the worker config, I am wondering how
> convinced we are that it will be common for them to be the same? I really
> don't have enough experience w/ these setups to know, so just a question
> here. I think the other thing to take into account here is that even though
> we're not dealing with authorization in this KIP, we will eventually want
> it for these APIs. Would we expect to be using the same principal for Kafka
> and the Connect REST API? In a case where a company has a Connect cluster
> that, e.g., an ops team manages and they are the only ones that are
> supposed to make changes, that would make sense to me. But for a setup
> where some dev team is allowed to use the REST API to create new connectors
> but the cluster is managed by an ops team, I would think the Kafka
> credentials would be different. I'm not sure how frequent each case would
> be, so I'm a bit unsure about the default of using the worker security
> configs by default. Thoughts?
>
> 3. We should probably specify the default in the table for
> rest.advertised.security.protocol because in ConfigDef if you don't
> specify
> a default value it becomes a required config. The HTTP default will
> probably need to be in there anyway.
>
> 4. Do we want to list the existing settings as deprecated and just move to
> using listeners for consistency? We don't need to remove them anytime soon,
> but given that the broker is doing the same, maybe we should just do that
> in this KIP?
>

Marking them as deprecated in this KIP sounds good to me.

>
> I think these are mostly small details, overall it looks like a good plan!
>

+1

Randall


>
> Thanks,
> Ewen
>
> On Tue, Oct 24, 2017 at 5:19 AM, Jakub Scholz <ja...@scholz.cz> wrote:
>
> > There has been no discussion since my last update week ago. Unless
> someone
> > has some further comments in the next 48 hours, I will start the voting
> for
> > this KIP.
> >
> > Thanks & Regards
> > Jakub
> >
> > On Tue, Oct 17, 2017 at 5:54 PM, Jakub Scholz <ja...@scholz.cz> wrote:
> >
> > > Ok, so I updated the KIP according to what we discussed. Please have a
> > > look at the updates. Two points I'm not 100% sure about:
> > >
> > > 1) Should we mark the rest.host.name and rest.port options as
> > deprecated?
> > >
> > > 2) I needed to also address the advertised hostname / port. With
> multiple
> > > listeners it is not clear anymore which one should be used. I saw as
> one
> > > option to add advertised.listeners option and some modified version of
> > > inter.broker.listener.name option to follow what is done in Kafka
> > > brokers. But for the Connect REST interface, we do not advertise the
> > > address to the clients like in Kafka broker. So we only need to tell
> > other
> > > workers how to connect - and for that we need only one advertised
> > address.
> > > So I decided to reuse the existing rest.advertised.host.name and
> > > rest.advertised.port options and add additional option
> > > rest.advertised.security.protocol to specify whether HTTP or HTTPS
> > should
> > > be used. Does this make sense to you? DO you think this is the right
> > > approach?
> > >
> > > Thanks & Regards
> > > Jakub
> > >
> > > On Mon, 

Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2018-01-17 Thread Randall Hauch
remove the field. I prefer DropHeader since it's simpler to use.
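
For example, a DropHeader transform as discussed here might be configured
along these lines (a sketch only — the transform class and its `name`
parameter follow this thread's proposal, not a released API):

    transforms=dropSecret
    transforms.dropSecret.type=org.apache.kafka.connect.transforms.DropHeader
    transforms.dropSecret.name=secret-header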


> In general I think this is the right direction for making headers work both
> flexibly but also easily in the default case.
>
> -Ewen
>
>
> On Tue, Jan 2, 2018 at 8:42 AM, Gwen Shapira <g...@confluent.io> wrote:
>
> > I got the impression that use of Strings in headers is really common, so
> > the SimpleHeaderConverter makes a lot of sense to me. Agree that this
> > introduces overhead, but perhaps simply documenting an easy
> "optimization"
> > will be enough to help those who are concerned about it? Since the
> > connector-devs decide whether they'll use the header data or not, they
> can
> > override the converter as needed.
> >
> > Gwen
> >
> >
> > On Tue, Jan 2, 2018 at 3:52 PM Randall Hauch <rha...@gmail.com> wrote:
> >
> > > There's been a bit of discussion on the PR about the choice of the
> > default
> > > header converter. The proposal currently uses the new
> > > `SimpleHeaderConverter` so that by default connector devs and users get
> > > meaningful header values by default without much work. An alternative
> is
> > to
> > > default to `ByteArrayConverter` so that by default the framework
> doesn't
> > > have to expend much effort if headers aren't used/needed.
> > >
> > > Thoughts?
> > >
> > > On Tue, Dec 26, 2017 at 11:47 AM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > >
> > > > Does anyone have any thoughts about this proposal for Connect header
> > > > support?
> > > >
> > > > On Thu, Dec 21, 2017 at 4:14 PM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >
> > > >> All,
> > > >>
> > > >> I've updated KIP-145 to reflect my proposal. The proposal addresses
> > SMTs
> > > >> and a different HeaderConverter default, but I'll be updating my PR
> (
> > > >> https://github.com/apache/kafka/pull/4319) soon. Feedback is very
> > > >> welcome!
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Randall
> > > >>
> > > >> On Thu, Dec 14, 2017 at 10:20 AM, Randall Hauch <rha...@gmail.com>
> > > wrote:
> > > >>
> > > >>> Hi, Michael. Yeah, I liked your PR a lot, and there definitely are
> a
> > > lot
> > > >>> of similarities. But here are the more significant differences from
> > my
> > > >>> perspective (none of which are really that big):
> > > >>>
> > > >>> First, your `SubjectConverter` and my `HeaderConverter` are pretty
> > > >>> similar -- mine is just more closely tied to headers. Also, we used
> > > >>> slightly different approaches to dealing with the fact that the
> > > `Converter`
> > > >>> interface does not extend `Configurable`, which Connect now uses
> for
> > > >>> transforms, connectors, etc. And our implementations take very
> > > different
> > > >>> approaches (see below).
> > > >>>
> > > >>> Second, I tried to follow Kafka client's `Header` and `Headers`
> > > >>> interfaces (at least in concept) so that ConnectRecord has a
> > `Headers`
> > > >>> rather than a list of headers. It's a minor distinction, but I do
> > think
> > > >>> it's important for future-proofing to have an interface for the
> > > collection
> > > >>> to abstract and encapsulate logic/behavior as well as leaving room
> > for
> > > >>> alternative implementations. It is also a convenient place to add
> > methods
> > > for
> > > >>> source connectors and SMTs to easily add/modify/remove/transform
> > > headers.
> > > >>>
> > > >>> Third, our "header converter" implementations are where most of the
> > > >>> differences lie. Again, this goes back to my assertion that we
> should
> > > make
> > > >>> the serdes and cast/conversion orthogonal. If we allow sink
> > connectors
> > > and
> > > >>> SMTs to get header values in the type they want (e.g.,
> > > >>> `Header.valueAsFloat()`), then we can tolerate a bit more variation
> > in
> > > how
> > > >>> the header values are serialized and deserialized, since the serdes

Re: [DISCUSS] KIP-208: Add SSL support to Kafka Connect REST interface

2018-01-18 Thread Randall Hauch
Jakub, have you had a chance to update the KIP with the latest changes?
Would be great to start the vote today so that it's open long enough to
adopt before the deadline on Tuesday. Let me know if I can help.

On Wed, Jan 17, 2018 at 1:25 AM, Jakub Scholz  wrote:

> I have been thinking about this a bit more yesterday while updating the
> code. I think you are right, we should use only the prefixed values if at
> least one of them exists. Even I got quite easily confused what setup is
> actually used when the fields are mixed :-). Randall was also in favour of
> this approach. So I think we should go this way. I will update the KIP
> accordingly.
>
>
> > I'm fine with consistency, but maybe the thing to do here then is to
> ensure
> > that we definitely log the "effective" or "derived" config before using
> it
> > so there is at least some useful trace of what we actually used that can
> be
> > helpful in debugging.
>


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2018-01-19 Thread Randall Hauch
> > >>
> > > >> There are a bunch of CVEs about this, too.  Because of the (in my
> > opinion,
> > > >> mistaken) decision to allow control characters in UNIX filenames,
> even
> > > >> echoing a file name to your terminal is a security vulnerability.
> > > >>
> > > >> best,
> > > >> Colin
> > > >>
> > > >>
> > > >> >
> > > >> > In general my default approach (without context of a specific
> > system)
> > > >> would
> > > >> > be to accept anything that we can encode in UTF-8 and only apply
> > > >> > restrictions where it becomes necessary (e.g. we need to define a
> > > >> delimiter
> > > >> > for some reason). The constraints of URLs introduce some
> complexity
> > (you
> > > >> > need escaping), but probably generally still allow this. If I can
> > use an
> > > >> > emoji when naming things, then I'm probably happy :) Whitespace
> > > >> characters
> > > >> > definitely have some other issues (e.g. you can have non-visible
> > > >> whitespace
> > > >> > which obscures which connector you're actually working with), but
> > despite
> > > >> > the JIRA linked, I wasn't really convinced they need special
> > handling. It
> > > >> > seems like a really weird issue to encounter in the first place.
> > > >> >
> > > >> > -Ewen
> > > >> >
> > > >> > On Fri, Jan 5, 2018 at 8:10 AM, Randall Hauch <rha...@gmail.com>
> > wrote:
> > > >> >
> > > >> > > Sönke, I'm happy with the current proposal.
> > > >> > >
> > > >> > > Ewen, the proposal allows any characters in the name as long as
> > they
> > > >> are
> > > >> > > properly escaped/encoded. That seems to adhere to the robustness
> > > >> principle.
> > > >> > > The only exception is that the proposal trims leading and
> trailing
> > > >> > > whitespace characters in an attempt to reduce user errors. Can
> you
> > > >> please
> > > >> > > clarify that you're okay with this behavior? I agree that
> > technically
> > > >> we
> > > >> > > can (and currently do) support whitespace-only names, but users
> > have
> > > >> > > reported this as problematic, and it also would be confusing for
> > most
> > > >> user
> > > >> > > interfaces.
> > > >> > >
> > > >> > > Best regards,
> > > >> > >
> > > >> > > Randall
> > > >> > >
> > > >> > > On Thu, Jan 4, 2018 at 10:31 PM, Ewen Cheslack-Postava <
> > > >> e...@confluent.io>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > Very late to the game here, but a few thoughts:
> > > >> > > >
> > > >> > > > 1. Regarding whether KIP is necessary, I don't mind doing it
> for
> > > >> > > > documentation sake, but I would classify any mishandling of
> > connector
> > > >> > > names
> > > >> > > > here as a bug. Which doesn't require a KIP to fix.
> > > >> > > >
> > > >> > > > 2. For support of characters, Kafka has some history of just
> > being
> > > >> > > > restrictive (e.g., see topic name restrictions), but I
> > personally
> > > >> > > disagree
> > > >> > > > with this approach. I think it is better to be liberal in what
> > we
> > > >> accept
> > > >> > > > and just document limitations. I think our default should be
> to
> > > >> accept
> > > >> > > any
> > > >> > > > user input and document why we can't handle certain inputs and
> > how
> > > >> the
> > > >> > > user
> > > >> > > > should adapt if we can't. In general I try to work under the
> > > >> robustness
> > > >> > > > principle: *Be conservative in what you do, be liberal in what
> > you
> > > >> accept

[VOTE] KIP-145: Expose Record Headers in Kafka Connect

2018-01-19 Thread Randall Hauch
Hi everyone,

I'd like to start the voting on this KIP to add support for headers in
Connect.:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect

This does add a fair number of interfaces to our public API, and defines
some behavioral changes as well.

Thanks! Your feedback is highly appreciated.

Randall


Re: [DISCUSS] KIP-242: Mask password fields in Kafka Connect REST response

2018-01-17 Thread Randall Hauch
Vincent,

Can the KIP more explicitly say that this is opt-in, and that by default
nothing will change?

Randall
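
In other words, something like a single worker-level switch that is disabled
by default (the property name below is purely hypothetical — choosing and
documenting it is exactly what the KIP should spell out):

    # Hypothetical opt-in flag; masking stays off (current behavior) unless enabled:
    rest.api.mask.sensitive.configs=true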

On Tue, Jan 16, 2018 at 11:18 PM, Ewen Cheslack-Postava 
wrote:

> Vincent,
>
> I think with the addition of a configuration to control this for
> compatibility, people would generally be ok with it. If you want to start a
> VOTE thread, the KIP deadline is coming up and the PR looks pretty small. I
> will take a pass at reviewing the PR so we'll be ready to merge if we can
> get the KIP voted through.
>
> Thanks,
> Ewen
>
> On Fri, Jan 12, 2018 at 10:18 AM, Vincent Meng  wrote:
>
> > @Ted: The issue is kinda hard to reproduce. It's just something we
> observe
> > over time.
> >
> > @Ewen: I agree. Opt-in seems to be a good solution to me. To your
> question,
> > if there is no ConfDef that defines which fields are Passwords we can
> just
> > return the config as is.
> >
> > There is a PR for this KIP already. Comments/Discussions are welcome.
> > https://github.com/apache/kafka/pull/4269
> >
> > On Tue, Jan 2, 2018 at 8:52 PM, Ewen Cheslack-Postava  >
> > wrote:
> >
> > > Vincent,
> > >
> > > Thanks for the KIP. This is definitely an issue we know is a problem
> for
> > > some users.
> > >
> > > I think the major problem with the KIP as-is is that it makes it
> > impossible
> > > to get the original value back out of the API. This KIP probably ties
> in
> > > significantly with ideas for securing the REST API (SSL) and adding
> ACLs
> > to
> > > it. Both are things we know people want, but haven't happened yet.
> > However,
> > > it also interacts with other approaches to adding those features, e.g.
> > > layering proxies on top of the existing API (e.g. nginx, apache, etc).
> > Just
> > > doing a blanket replacement of password values with a constant would
> > likely
> > > break things for people who secure things via a proxy (and may just not
> > > allow reads of configs unless the user is authorized for the particular
> > > connector). These are the types of concerns we like to think through in
> > the
> > > compatibility section. One option to get the masking functionality in
> > > without depending on a bunch of other security improvements might be to
> > > make this configurable so users that need this (and can forgo seeing a
> > > valid config via the API) can opt-in.
> > >
> > > Regarding your individual points:
> > >
> > > * I don't think the particular value for the masked content matters
> much.
> > > Any constant indicating a password field is good. Your value seems fine
> > to
> > > me.
> > > * I don't think ConnectorInfo has enough info on its own to do proper
> > > masking. In fact, I think you need to parse the config enough to get
> the
> > > Connector-specific ConfigDef out in order to determine which fields are
> > > Password fields. I would probably try to push this to be as central as
> > > possible, maybe adding a method to AbstractHerder that can get configs
> > with
> > > a boolean indicating whether they need to have sensitive fields
> removed.
> > > That method could deal with parsing the config to get the right
> > connector,
> > > getting the connector config, and then sanitizing any configs that are
> > > sensitive. We could have this in one location, then have the relevant
> > REST
> > > APIs just use the right flag to determine if they get sanitized or
> > > unsanitized data.
> > >
> > > That second point raises another interesting point -- what happens if
> the
> > > connector configuration references a connector which the worker serving
> > the
> > > REST request *does not know about*? In that case, there will be no
> > > corresponding ConfigDef that defines which fields are Passwords and
> need
> > to
> > > be sensitized. Does it return an error? Or just return the config as
> is?
> > >
> > > -Ewen
> > >
> > > On Thu, Dec 28, 2017 at 3:34 AM, Ted Yu  wrote:
> > >
> > > > For the last point you raised, can you come up with a unit test that
> > > shows
> > > > what you observed ?
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Dec 18, 2017 at 11:14 AM, Vincent Meng 
> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I've created KIP-242, a proposal to secure credentials in kafka
> > connect
> > > > > rest endpoint.
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 242%3A+Mask+password+in+Kafka+Connect+Rest+API+response
> > > > >
> > > > > Here are something I'd like to discuss:
> > > > >
> > > > >- The "masked" value is set to "*" (9 stars) currently.
> > It's
> > > > an
> > > > >arbitrary value I picked. Are there any better options?
> > > > >- The proposal change is in the
> > > > >*org.apache.kafka.connect.runtime.rest.resources.
> > > ConnectorsResource*
> > > > >class, where before the response is returned we go through
> config
> > > and
> > > > > mask
> > > > >the password. This has been proven 

Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2018-01-17 Thread Randall Hauch
Okay, I made a few more updates to the KIP:

- Not all headers have a raw value, so exposing a method to get the raw
value is not as useful as I thought. Removed the `Header.rawValue()` method.
- The `Value.convertToX(...)` methods have to take both the value and the
original schema. This makes them harder to use, and quite a bit harder than
the Header conversion methods mentioned earlier. These methods are still
reusable outside of headers, though.
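
To make the shape of those calls concrete, a minimal sketch using the
proposal's Values utility (method names per this KIP; the exact signatures
may still shift before merge):

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.Values;

    public class ValuesSketch {
        public static void main(String[] args) {
            // Each conversion takes the original schema plus the value:
            Integer port = Values.convertToInteger(Schema.STRING_SCHEMA, "8083");
            String text = Values.convertToString(Schema.INT32_SCHEMA, 8083);
            System.out.println(port + " / " + text);
        }
    }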

I've pushed my latest changes to the PR, so that it reflects the current
KIP proposal (minus the proposed header-related SMTs):
https://github.com/apache/kafka/pull/4319

Still looking for feedback on my previous email and these updates.

Randall

On Wed, Jan 17, 2018 at 10:57 AM, Randall Hauch <rha...@gmail.com> wrote:

> Ewen and Michael,
>
> Thanks for the feedback. I've updated the KIP based upon your feedback.
> Detailed responses inline below.
>
> Randall
>
> On Tue, Jan 2, 2018 at 3:44 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
>> A few thoughts, mostly just details:
>>
>> * Is the SchemaAndValue return type from Headers necessary? We needed to
>> use that in Converters, but otherwise I believe no public API has to use
>> that type. If possible I think it is better to avoid making Connector
>> developers aware of that type.
>>
>
> I was not aware the SchemaAndValue was not really part of the public API.
> I've removed it from the Header interface in the proposal.
>
>
>> * For SimpleHeaderConverter, if we encounter a bare byte array (i.e. not
>> within some other structure), should it just get written directly similar
>> to how strings are handled? I guess the problem with this is that you then
>> either don't know how to decode or might get a different type (e.g. if the
>> bytes were utf-8 compatible, they'd parse as a string). But I'm not sure
>> many people will expect the current format.
>> * Also, did you mean utf-8 there or something like base64? utf-8 might not
>> handle all byte arrays.
>>
>
> Right, I originally used UTF-8 rather than base 64 so that it can be
> converted to/from a string using the normal String methods. I agree base64
> would be more universal, but it'd be harder to parse.
>
> I added a Header.rawValue() method that returns the raw byte[] form of the
> header value. If we stick with UTF-8 encoded strings for BYTES, this may
> provide an out if people want the raw version.
>
>
>> * Header.with and Header.rename don't seem like they're going to be
>> particularly useful or common, what's the expected use case for these?
>> We're getting a lot of new API surface area here, so I think it'd be good
>> to try to keep it to the necessities and most valuable extras.
>> * Header.valueAsType seems like it doesn't need to be exposed publicly
>>
>
> Agreed. Removed.
>
>
>> * How much of the conversion stuff should be in the Header class vs as
>> generic utilities available in another class. Having them in the Header
>> API
>> makes it obvious they are available and headers seem like they may be the
>> most common use case. But none of that functionality is really specific to
>> headers and seems like it could be useful in writing connectors that
>> robustly handle different formats (e.g. keys might be a good example of
>> something you want to preserve, but a connector could easily encounter
>> int,
>> long, and string keys under very common circumstances).
>>
>
> Per your suggestion and Michael's, I moved these to static methods in a
> new Values class. They do exactly the same thing, but they're reusable
> outside of headers.
>
>
>> * In the Headers class, why deviate from the naming used in the core
>> Headers class? Specifically, at least allWithName and lastWithName are
>> different.
>>
>
> Because `lastHeader(String)` seemed less readable, whereas
> `lastWithName(String)` results in code that is IMO easier to read. Also, I
> didn't see the need to mirror the core's interface exactly since we're
> already deviating from it to some degree. I don't feel strongly about it.
>
>
>> * Headers.apply - this seems like a departure from other APIs that don't
>> have map-like functionality. Is the reason for this to avoid allocating a
>> new Headers object or do you get a new one out? I think we need to take
>> care when doing this since if we mutate the existing Headers object, then
>> connectors that may allocate a single Headers object and use it repeatedly
>> would see it changing out from under them (and something like
>> prefixing/suffixing a header as part of a transform could result in it
>> being done repeatedly).

Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-01-26 Thread Randall Hauch
The KIP deadline for 1.1 has already passed, but I'd like to restart this
discussion so that we make the next release. I've not yet addressed the
previous comment about *existing* topics, but I'll try to do that over the
next few weeks. Any other comments/suggestions/questions?

Best regards,

Randall

On Thu, Oct 5, 2017 at 12:13 AM, Randall Hauch <rha...@gmail.com> wrote:

> Oops. Yes, I meant “replication factor”.
>
> > On Oct 4, 2017, at 7:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> >
> > Randall:
> > bq. AdminClient currently allows changing the replication factory.
> >
> > By 'replication factory' did you mean 'replication factor' ?
> >
> > Cheers
> >
> >> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch <rha...@gmail.com> wrote:
> >>
> >> Currently the KIP's scope is only topics that don't yet exist, and we
> have
> >> to be cognizant of race conditions between tasks with the same connector. I
> >> think it is worthwhile to consider whether the KIP's scope should
> expand to
> >> also address *existing* partitions, though it may not be appropriate to
> >> have as much control when changing the topic settings for an existing
> >> topic. For example, changing the number of partitions (which the KIP
> >> considers a "topic-specific setting" even though technically it is not)
> >> shouldn't be done blindly due to the partitioning impacts, and IIRC you
> >> can't reduce them (which we could verify before applying). Also, I don't
> >> think the AdminClient currently allows changing the replication
> factory. I
> >> think changing the topic configs is less problematic both from what
> makes
> >> sense for connectors to verify/change and from what the AdminClient
> >> supports.
> >>
> >> Even if we decide that it's not appropriate to change the settings on an
> >> existing topic, I do think it's advantageous to at least notify the
> >> connector (or task) prior to the first record sent to a given topic so
> that
> >> the connector can fail or issue a warning if it doesn't meet its
> >> requirements.
> >>
> >> Best regards,
> >>
> >> Randall
> >>
> >> On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
> >> steph...@simplemachines.com.au> wrote:
> >>
> >>> Hi Randall,
> >>>
> >>> Thanks for the KIP. I like it
> >>> What happens when the target topic is already created but the configs
> do
> >>> not match?
> >>> i.e. wrong RF, num partitions, or missing / additional configs? Will
> you
> >>> attempt to apply the necessary changes or throw an error?
> >>>
> >>> Thanks!
> >>> Stephane
> >>>
> >>>
> >>> On 24/5/17, 5:59 am, "Mathieu Fenniak" <mathieu.fenn...@replicon.com>
> >>> wrote:
> >>>
> >>>Ah, yes, I see you highlighted a part that should've made this clear
> >>>to me on the first read. :-)  Much clearer now!
> >>>
> >>>By the way, enjoyed your Debezium talk in NYC.
> >>>
> >>>Looking forward to this Kafka Connect change; it will allow me to
> >>>remove a post-deployment tool that I hacked together for the purpose
> >>>of ensuring auto-created topics have the right config.
> >>>
> >>>Mathieu
> >>>
> >>>
> >>>On Tue, May 23, 2017 at 11:38 AM, Randall Hauch <rha...@gmail.com>
> >>> wrote:
> >>>> Thanks for the quick feedback, Mathieu. Yes, the first
> >> configuration
> >>> rule
> >>>> whose regex matches will be applied, and no other rules will be
> >>> used. I've
> >>>> updated the KIP to try to make this more clear, but let me know if
> >>> it's
> >>>> still not clear.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Randall
> >>>>
> >>>> On Tue, May 23, 2017 at 10:07 AM, Mathieu Fenniak <
> >>>> mathieu.fenn...@replicon.com> wrote:
> >>>>
> >>>>> Hi Randall,
> >>>>>
> >>>>> Awesome, very much looking forward to this.
> >>>>>
> >>>>> It isn't 100% clear from the KIP how multiple config-based rules
> >>> would
> >>>>> be applied; it looks like the first configuration rule whose regex
> >>>>> matches

Re: [VOTE] KIP-174 Deprecate and remove internal converter configs in WorkerConfig

2018-01-12 Thread Randall Hauch
+1 (non-binding)

On Mon, Jan 8, 2018 at 7:09 PM, Gwen Shapira  wrote:

> +1 binding
>
> On Mon, Jan 8, 2018 at 4:59 PM Ewen Cheslack-Postava 
> wrote:
>
> > +1 binding. Thanks for the KIP!
> >
> > -Ewen
> >
> > On Mon, Jan 8, 2018 at 8:34 AM, Ted Yu  wrote:
> >
> > > +1
> > >
> > > On Mon, Jan 8, 2018 at 4:27 AM, UMESH CHAUDHARY 
> > > wrote:
> > >
> > > > Hello All,
> > > > Since there are no outstanding comments on this, I'd like to
> start a
> > > > vote.
> > > >
> > > > Please find the KIP here
> > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 174+-+Deprecate+and+remove+internal+converter+configs+in+WorkerConfig>
> WorkerConfig>
> > > > and
> > > > the related JIRA here <
> > https://issues.apache.org/jira/browse/KAFKA-5540
> > > >.
> > > >
> > > > The KIP suggests to deprecate and remove the configs:
> > > > internal.key.converter, internal.value.converter
> > > >
> > > > Appreciate your comments.
> > > >
> > > > Regards,
> > > > Umesh
> > > >
> > >
> >
>


Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2017-12-26 Thread Randall Hauch
Does anyone have any thoughts about this proposal for Connect header
support?

On Thu, Dec 21, 2017 at 4:14 PM, Randall Hauch <rha...@gmail.com> wrote:

> All,
>
> I've updated KIP-145 to reflect my proposal. The proposal addresses SMTs
> and a different HeaderConverter default, but I'll be updating my PR (
> https://github.com/apache/kafka/pull/4319) soon. Feedback is very welcome!
>
> Best regards,
>
> Randall
>
> On Thu, Dec 14, 2017 at 10:20 AM, Randall Hauch <rha...@gmail.com> wrote:
>
>> Hi, Michael. Yeah, I liked your PR a lot, and there definitely are a lot
>> of similarities. But here are the more significant differences from my
>> perspective (none of which are really that big):
>>
>> First, your `SubjectConverter` and my `HeaderConverter` are pretty
>> similar -- mine is just more closely tied to headers. Also, we used
>> slightly different approaches to dealing with the fact that the `Converter`
>> interface does not extend `Configurable`, which Connect now uses for
>> transforms, connectors, etc. And our implementations take very different
>> approaches (see below).
>>
>> Second, I tried to follow Kafka client's `Header` and `Headers`
>> interfaces (at least in concept) so that ConnectRecord has a `Headers`
>> rather than a list of headers. It's a minor distinction, but I do think
>> it's important for future-proofing to have an interface for the collection
>> to abstract and encapsulate logic/behavior as well as leaving room for
>> alternative implementations. It is also a convenient place to add methods for
>> source connectors and SMTs to easily add/modify/remove/transform headers.
>>
>> Third, our "header converter" implementations are where most of the
>> differences lie. Again, this goes back to my assertion that we should make
>> the serdes and cast/conversion orthogonal. If we allow sink connectors and
>> SMTs to get header values in the type they want (e.g.,
>> `Header.valueAsFloat()`), then we can tolerate a bit more variation in how
>> the header values are serialized and deserialized, since the serdes
>> mechanism doesn't have to get the type exactly right for the sink connector
>> and SMT. My `SimpleHeaderConverter` serializes all of the types to strings,
>> but during deserialization it attempts to infer the schemas (easy for
>> primitive values, a bit harder for structured types). IIUC, neither your
>> approach or mine is really able to maintain Struct schemas, but IMO we can
>> add that over time with improved/different header converters if people
>> really need it.
>>
>> Fourth, we use different defaults for the serdes implementation. I
>> dislike the StringConverter because it converts everything to strings that
>> are then difficult to convert back to the original form, especially for the
>> structured types. This is why I created the `SimpleHeaderConverter`
>> implementation, which doesn't need explicit configuration or explicit
>> mapping of header names to types, and thus can be used as the default.
>>
>> Finally, while I hope that `SimpleHeaderConverter` and its schema
>> inference will work most of the time with no special configuration,
>> especially since the `Header` interface makes it easy to cast/convert in
>> sink connectors and SMTs, I do like how your `PrimativeSubjectConverter`
>> allows the user to manually control how the values are serialized. I
>> thought of doing something similar, but I think that can be done at a later
>> time if/when needed.
>>
>> I hope that makes sense.
>>
>> Randall
>>
>> On Tue, Dec 12, 2017 at 11:35 PM, Michael André Pearce <
>> michael.andre.pea...@me.com> wrote:
>>
>>> Hi Randall
>>>
>>> What’s the main difference between this and my earlier alternative
>>> option PR
>>> https://github.com/apache/kafka/pull/2942/files
>>>
>>> If none then +1.
>>> From what I can tell the only difference I make is the headers you
>>> support being able to cross convert primitive types eg if value after
>>> conversion is integer you can still ask for float and it will type convert
>>> if possible.
>>>
>>> Cheers
>>> Mike
>>>
>>>
>>> Sent from my iPhone
>>>
>>> > On 13 Dec 2017, at 01:36, Randall Hauch <rha...@gmail.com> wrote:
>>> >
>>> > Trying to revive this after several months of inactivity
>>> >
>>> > I've spent quite a bit of time evaluating the current KIP-145 proposal
>>> and
>>> > severa

Re: [DISCUSS] KIP 145 - Expose Record Headers in Kafka Connect

2018-01-02 Thread Randall Hauch
There's been a bit of discussion on the PR about the choice of the default
header converter. The proposal currently uses the new
`SimpleHeaderConverter` so that by default connector devs and users get
meaningful header values by default without much work. An alternative is to
default to `ByteArrayConverter` so that by default the framework doesn't
have to expend much effort if headers aren't used/needed.

Thoughts?
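
For reference, either choice leaves users a one-line override in the worker
config (class names per the proposal; a sketch, not a recommendation):

    # Keep header handling cheap by passing raw bytes through:
    header.converter=org.apache.kafka.connect.converters.ByteArrayConverter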

On Tue, Dec 26, 2017 at 11:47 AM, Randall Hauch <rha...@gmail.com> wrote:

> Does anyone have any thoughts about this proposal for Connect header
> support?
>
> On Thu, Dec 21, 2017 at 4:14 PM, Randall Hauch <rha...@gmail.com> wrote:
>
>> All,
>>
>> I've updated KIP-145 to reflect my proposal. The proposal addresses SMTs
>> and a different HeaderConverter default, but I'll be updating my PR (
>> https://github.com/apache/kafka/pull/4319) soon. Feedback is very
>> welcome!
>>
>> Best regards,
>>
>> Randall
>>
>> On Thu, Dec 14, 2017 at 10:20 AM, Randall Hauch <rha...@gmail.com> wrote:
>>
>>> Hi, Michael. Yeah, I liked your PR a lot, and there definitely are a lot
>>> of similarities. But here are the more significant differences from my
>>> perspective (none of which are really that big):
>>>
>>> First, your `SubjectConverter` and my `HeaderConverter` are pretty
>>> similar -- mine is just more closely tied to headers. Also, we used
>>> slightly different approaches to dealing with the fact that the `Converter`
>>> interface does not extend `Configurable`, which Connect now uses for
>>> transforms, connectors, etc. And our implementations take very different
>>> approaches (see below).
>>>
>>> Second, I tried to follow Kafka client's `Header` and `Headers`
>>> interfaces (at least in concept) so that ConnectRecord has a `Headers`
>>> rather than a list of headers. It's a minor distinction, but I do think
>>> it's important for future-proofing to have an interface for the collection
>>> to abstract and encapsulate logic/behavior as well as leaving room for
>>> alternative implementations. It is also a convenient place to add methods for
>>> source connectors and SMTs to easily add/modify/remove/transform headers.
>>>
>>> Third, our "header converter" implementations are where most of the
>>> differences lie. Again, this goes back to my assertion that we should make
>>> the serdes and cast/conversion orthogonal. If we allow sink connectors and
>>> SMTs to get header values in the type they want (e.g.,
>>> `Header.valueAsFloat()`), then we can tolerate a bit more variation in how
>>> the header values are serialized and deserialized, since the serdes
>>> mechanism doesn't have to get the type exactly right for the sink connector
>>> and SMT. My `SimpleHeaderConverter` serializes all of the types to strings,
>>> but during deserialization it attempts to infer the schemas (easy for
>>> primitive values, a bit harder for structured types). IIUC, neither your
>>> approach or mine is really able to maintain Struct schemas, but IMO we can
>>> add that over time with improved/different header converters if people
>>> really need it.
>>>
>>> Fourth, we use different defaults for the serdes implementation. I
>>> dislike the StringConverter because it converts everything to strings that
>>> are then difficult to convert back to the original form, especially for the
>>> structured types. This is why I created the `SimpleHeaderConverter`
>>> implementation, which doesn't need explicit configuration or explicit
>>> mapping of header names to types, and thus can be used as the default.
>>>
>>> Finally, while I hope that `SimpleHeaderConverter` and its schema
>>> inference will work most of the time with no special configuration,
>>> especially since the `Header` interface makes it easy to cast/convert in
>>> sink connectors and SMTs, I do like how your `PrimativeSubjectConverter`
>>> allows the user to manually control how the values are serialized. I
>>> thought of doing something similar, but I think that can be done at a later
>>> time if/when needed.
>>>
>>> I hope that makes sense.
>>>
>>> Randall
>>>
>>> On Tue, Dec 12, 2017 at 11:35 PM, Michael André Pearce <
>>> michael.andre.pea...@me.com> wrote:
>>>
>>>> Hi Randall
>>>>
>>>> What’s the main difference between this and my earlier alternative
>>>> option PR
>>>> https://githu

Re: [DISCUSS] KIP-318: Make Kafka Connect Source idempotent

2018-06-20 Thread Randall Hauch
Thanks for starting this conversation, Stephane. I have a few questions.

The worker already accepts nearly all producer properties already, and all
`producer.*` properties override any hard-coded properties defined in
`Worker.java`. So isn't it currently possible for a user to define these
properties in their worker configuration if they want?
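
For instance, something like the following in the worker config should
already opt a source pipeline into the idempotent producer today (a sketch,
assuming brokers new enough to support idempotence):

    producer.enable.idempotence=true
    producer.acks=all
    producer.max.in.flight.requests.per.connection=5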

Second, wouldn't this change the default behavior for existing worker
configurations that have not overridden these properties? IOW, we would
need to address the migration path to ensure backward compatibility.

Third, the KIP mentions but does not really address the problem of running
workers against pre-1.0 Kafka clusters. That definitely is something that
happens frequently, so what is the planned approach for addressing this
compatibility concern?

All of these factors are likely why this has not yet been addressed to
date: it's already possible for users to enable this feature, but doing it
by default has compatibility concerns.

Thoughts?

Best regards,

Randall


On Wed, Jun 20, 2018 at 1:17 AM, Stephane Maarek <
steph...@simplemachines.com.au> wrote:

> KIP link:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 318%3A+Make+Kafka+Connect+Source+idempotent
>
>
> By looking at the code, it seems Worker.java is where the magic happens,
> but do we need to also operate changes to KafkaBasedLog.java (~line 241) ?
>
> Love to hear your thoughts!
>


Re: [VOTE] KIP-316: Command-line overrides for ConnectDistributed worker properties

2018-06-20 Thread Randall Hauch
IMO we should not request a vote without additional time for discussion.

Best regards,

Randall

On Wed, Jun 20, 2018 at 7:29 AM, Jakub Scholz  wrote:

> +1 (non-binding)
>
> On Mon, Jun 18, 2018 at 8:42 PM Kevin Lafferty 
> wrote:
>
> > Hi all,
> >
> > I got a couple notes of interest on the discussion thread and no
> > objections, so I'd like to kick off a vote. This is a very small change.
> >
> > KIP:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
> >
> > Jira: https://issues.apache.org/jira/browse/KAFKA-7060
> >
> > GitHub PR: https://github.com/apache/kafka/pull/5234
> >
> > -Kevin
> >
>


Re: [DISCUSS] KIP-316: Command-line overrides for ConnectDistributed worker properties

2018-06-20 Thread Randall Hauch
My only concern with this proposal is that it adds yet another way to
specify configuration properties, making it a bit more difficult to track
down how/whether a configuration property has been set. Configuring Kafka
Connect is already too challenging, so we need to be very clear that this
additional complexity is worth the price. IMO the KIP should explicitly
address this.

Also, any reason why this KIP singles out the Connect distributed worker
and doesn't address the standalone worker?

And finally, the KIP does not clearly explain the command line behavior. It
simply says:

An additional command-line argument, --override key=value, will be
accepted for ConnectDistributed.

which makes it seem like only a single key-value pair can be specified.
Clearly this is not the intention, but is `--override` used once and
followed by multiple key-value pairs, or is `--override` required for every
key-value pair? Does it need to follow the property file reference, or can
the overrides precede or be interspersed around the property file
reference? Does this happen to exactly match the broker command line
behavior? The KIP should be very clear about the behavior and should fully
specify all of these details.

Best regards,

Randall


On Fri, Jun 15, 2018 at 11:14 AM, Jakub Scholz  wrote:

> I think this makes perfect sense. Thanks for opening this KIP.
>
> Thanks & Regards
> Jakub
>
> On Fri, Jun 15, 2018 at 2:10 AM Kevin Lafferty 
> wrote:
>
> > Hi all,
> >
> > I created KIP-316, and I would like to initiate discussion.
> >
> > The KIP is here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 316%3A+Command-line+overrides+for+ConnectDistributed+worker+properties
> >
> > Thanks,
> > Kevin
> >
>


Re: [DISCUSS] KIP-158: Kafka Connect should allow source connectors to set topic-specific settings for new topics

2018-08-21 Thread Randall Hauch
Okay, after much delay let's try this again for AK 2.1. Has anyone found
any concerns? Stephane suggested that we allow updating topic
configurations (everything but partition count). I'm unconvinced that it's
worth the additional complexity in the implementation and the documentation
to explain the behavior. Changing several of the topic-specific
configurations has a significant impact on broker behavior / functionality,
so IMO we need to proceed more cautiously.
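
For context, the kind of connector-level, creation-time settings under
discussion would look roughly like this (a sketch only — the exact property
names were still under discussion at this point and may change):

    topic.creation.default.replication.factor=3
    topic.creation.default.partitions=1
    topic.creation.default.cleanup.policy=compact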

Stephane, do you have a particular use case in mind for updating topic
configurations on an existing topic?

Randall


On Fri, Jan 26, 2018 at 4:20 PM Randall Hauch  wrote:

> The KIP deadline for 1.1 has already passed, but I'd like to restart this
> discussion so that we make the next release. I've not yet addressed the
> previous comment about *existing* topics, but I'll try to do that over the
> next few weeks. Any other comments/suggestions/questions?
>
> Best regards,
>
> Randall
>
> On Thu, Oct 5, 2017 at 12:13 AM, Randall Hauch  wrote:
>
>> Oops. Yes, I meant “replication factor”.
>>
>> > On Oct 4, 2017, at 7:18 PM, Ted Yu  wrote:
>> >
>> > Randall:
>> > bq. AdminClient currently allows changing the replication factory.
>> >
>> > By 'replication factory' did you mean 'replication factor' ?
>> >
>> > Cheers
>> >
>> >> On Wed, Oct 4, 2017 at 9:58 AM, Randall Hauch 
>> wrote:
>> >>
>> >> Currently the KIP's scope is only topics that don't yet exist, and we
>> have
>> >> to be cognizant of race conditions between tasks with the same connector.
>> I
>> >> think it is worthwhile to consider whether the KIP's scope should
>> expand to
>> >> also address *existing* partitions, though it may not be appropriate to
>> >> have as much control when changing the topic settings for an existing
>> >> topic. For example, changing the number of partitions (which the KIP
>> >> considers a "topic-specific setting" even though technically it is not)
>> >> shouldn't be done blindly due to the partitioning impacts, and IIRC you
>> >> can't reduce them (which we could verify before applying). Also, I
>> don't
>> >> think the AdminClient currently allows changing the replication
>> factory. I
>> >> think changing the topic configs is less problematic both from what
>> makes
>> >> sense for connectors to verify/change and from what the AdminClient
>> >> supports.
>> >>
>> >> Even if we decide that it's not appropriate to change the settings on
>> an
>> >> existing topic, I do think it's advantageous to at least notify the
>> >> connector (or task) prior to the first record sent to a given topic so
>> that
>> >> the connector can fail or issue a warning if it doesn't meet its
>> >> requirements.
>> >>
>> >> Best regards,
>> >>
>> >> Randall
>> >>
>> >> On Wed, Oct 4, 2017 at 12:52 AM, Stephane Maarek <
>> >> steph...@simplemachines.com.au> wrote:
>> >>
>> >>> Hi Randall,
>> >>>
>> >>> Thanks for the KIP. I like it
>> >>> What happens when the target topic is already created but the configs
>> do
>> >>> not match?
>> >>> i.e. wrong RF, num partitions, or missing / additional configs? Will
>> you
>> >>> attempt to apply the necessary changes or throw an error?
>> >>>
>> >>> Thanks!
>> >>> Stephane
>> >>>
>> >>>
>> >>> On 24/5/17, 5:59 am, "Mathieu Fenniak" > >
>> >>> wrote:
>> >>>
>> >>>Ah, yes, I see you highlighted a part that should've made this
>> clear
>> >>>to me on the first read. :-)  Much clearer now!
>> >>>
>> >>>By the way, enjoyed your Debezium talk in NYC.
>> >>>
>> >>>Looking forward to this Kafka Connect change; it will allow me to
>> >>>remove a post-deployment tool that I hacked together for the
>> purpose
>> >>>of ensuring auto-created topics have the right config.
>> >>>
>> >>>Mathieu
>> >>>
>> >>>
>> >>>On Tue, May 23, 2017 at 11:38 AM, Randall Hauch 
>> >>> wrote:
>> >>>> Thanks for the quick feedback, Mathieu. Yes, the first
>> >> configuration
>> >>> rule
