[GitHub] kafka pull request #1537: KAFKA-3846: include timestamp in Connect record ty...

2016-06-21 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1537

KAFKA-3846: include timestamp in Connect record types

KIP to come

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-3846

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1537.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1537


commit 2dc7e91c1ec06fcf987dda07a63acf45e1b2d13e
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-06-22T00:42:48Z

KAFKA-3846: include timestamp in Connect record types; add Builder for 
`SourceRecord`

`SinkRecord` gets `timestampType` and `timestamp`
`SourceRecord` gets `timestamp`
`SourceRecord.Builder` is the new preferred way to construct `SourceRecord`s
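For illustration, here is a rough sketch of what these record-level additions look like from a connector's point of view. It uses a plain `SourceRecord` constructor rather than the proposed `SourceRecord.Builder`, and the exact constructor arity is an assumption based on the PR description, not a guarantee of the final API:

```java
import java.util.Collections;
import org.apache.kafka.common.record.TimestampType;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.source.SourceRecord;

public class Kip65Sketch {

    // Source side: a connector attaches an (optional) timestamp when emitting a record.
    static SourceRecord emit() {
        return new SourceRecord(
                Collections.singletonMap("file", "events.log"),   // source partition
                Collections.singletonMap("position", 42L),        // source offset
                "events",                  // topic
                null,                      // Kafka partition: let the framework decide
                null, null,                // key schema, key
                Schema.STRING_SCHEMA, "hello",
                1466553768000L);           // timestamp in epoch millis
    }

    // Sink side: the record exposes the Kafka message's timestamp and its type.
    static void inspect(SinkRecord record) {
        Long ts = record.timestamp();                 // may be null if no timestamp is available
        TimestampType type = record.timestampType();  // CREATE_TIME or LOG_APPEND_TIME
        System.out.println(type + " @ " + ts);
    }
}
```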




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-2935: Remove vestigial WorkerConfig.CLUS...

2016-05-18 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1404

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-2935

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1404


commit 49d5c7694b40c3aaed27f2ad2fb5d0b072be1f27
Author: shikhar <shik...@schmizz.net>
Date:   2016-05-19T02:07:29Z

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG






[GitHub] kafka pull request #1729: Assign ConfigDef.NO_DEFAULT_VALUE as a literal, us...

2016-08-12 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1729

Assign ConfigDef.NO_DEFAULT_VALUE as a literal, use equals() for comparison rather than ==
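The title describes a sentinel-comparison change: an identity check (`==`) on a default-value sentinel breaks if the value is ever copied (for example, round-tripped through serialization or rebuilt from another config definition), while `equals()` against a well-known literal keeps working. A minimal illustration, with an assumed literal that is not necessarily what the PR assigns to `ConfigDef.NO_DEFAULT_VALUE`:

```java
public class SentinelSketch {
    // Illustrative stand-in for ConfigDef.NO_DEFAULT_VALUE.
    static final Object NO_DEFAULT_VALUE = new String("__no_default__");

    public static void main(String[] args) {
        // A distinct copy of the sentinel, as you would get after serialization
        // or after the config definition has been rebuilt elsewhere.
        Object copied = new String("__no_default__");

        System.out.println(copied == NO_DEFAULT_VALUE);       // false: identity is lost on copies
        System.out.println(NO_DEFAULT_VALUE.equals(copied));  // true: value comparison still works
    }
}
```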

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka config-nodefval

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1729.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1729


commit 4ff39e6ef6cc1dc94578dda0bd160d3b8376cb61
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-12T22:28:59Z

Assign ConfigDef.NO_DEFAULT_VALUE as a literal, use equals() for comparison 
rather than ==






[GitHub] kafka pull request #1729: Assign ConfigDef.NO_DEFAULT_VALUE as a literal, us...

2016-08-12 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1729




[GitHub] kafka pull request #1567: Minor: fix Bash shebang on vagrant/ scripts

2016-06-28 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1567

Minor: fix Bash shebang on vagrant/ scripts



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka vagrant-scripts-shebang

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1567.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1567


commit 4519806fe0de669097236f0976a3085ada611e3f
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-06-28T18:06:14Z

Minor: fix Bash shebang on vagrant/ scripts






[GitHub] kafka pull request #1745: WIP: KAFKA-4042: prevent DistributedHerder thread ...

2016-08-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1745

WIP: KAFKA-4042: prevent DistributedHerder thread from dying from 
connector/task lifecycle exceptions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 715e7535d4aaae6174b3d6c1607617b382f0a8b8
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-16T19:06:56Z

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions






[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-19 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1745




[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-19 Thread shikhar
GitHub user shikhar reopened a pull request:

https://github.com/apache/kafka/pull/1745

KAFKA-4042: prevent DistributedHerder thread from dying from connector/task 
lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`, or any other exception arising 
from the constructor of the connector or task class when we `newInstance()`), 
so add catch blocks around those calls from the `DistributedHerder` and handle 
by invoking `onFailure()` which updates the `StatusBackingStore`.
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`
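The containment pattern described in the points above can be sketched as follows. The `Worker` and listener interfaces below are simplified stand-ins rather than the real Connect runtime signatures; only the control flow mirrors the PR description:

```java
public class HerderStartStopSketch {

    interface Worker {
        void startConnector(String name);   // may throw, e.g. a ConnectException wrapping ClassNotFoundException
        void stopConnector(String name);    // throws ConnectException if the connector never registered
        boolean ownsConnector(String name);
    }

    interface StatusListener {
        void onFailure(String name, Throwable cause);   // records FAILED state in the StatusBackingStore
    }

    private final Worker worker;
    private final StatusListener statusListener;

    HerderStartStopSketch(Worker worker, StatusListener statusListener) {
        this.worker = worker;
        this.statusListener = statusListener;
    }

    void startConnectorSafely(String name) {
        try {
            worker.startConnector(name);
        } catch (Throwable t) {
            // Mark the connector FAILED instead of letting the exception kill the herder thread.
            statusListener.onFailure(name, t);
        }
    }

    void stopConnectorSafely(String name) {
        // Guard: if start failed, the connector was never registered with the worker.
        if (worker.ownsConnector(name)) {
            worker.stopConnector(name);
        }
    }
}
```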

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1745.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1745


commit 4bb02e610b01d7b425f5c39b435d4d7484b89ee9
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-17T23:29:30Z

KAFKA-4042: prevent `DistributedHerder` thread from dying from 
connector/task lifecycle exceptions

- `worker.startConnector()` and `worker.startTask()` can throw (e.g. 
`ClassNotFoundException`, `ConnectException`), so add catch blocks around those 
calls from the `DistributedHerder` and handle by invoking `onFailure()` which 
updates the `StatusBackingStore`.
- `worker.stopConnector()` throws `ConnectException` if start failed 
causing the connector to not be registered with the worker, so guard with 
`worker.ownsConnector()`
- `worker.stopTasks()` and `worker.awaitStopTasks()` throw 
`ConnectException` if any of them failed to start and are hence not registered 
with the worker, so guard those calls by filtering the task IDs with 
`worker.ownsTask()`






[GitHub] kafka pull request #1815: KAFKA-4115: grow default heap size for connect-dis...

2016-09-01 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1815

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka connect-heap-opts

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1815.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1815


commit 1ceb80ed040ed0a5b988857273af5a2606f1df0b
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-09-02T04:01:27Z

KAFKA-4115: grow default heap size for connect-distributed.sh to 1G






[GitHub] kafka pull request #1800: KAFKA-4100: ensure 'fields' and 'fieldsByName' are...

2016-08-29 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1800

KAFKA-4100: ensure 'fields' and 'fieldsByName' are not null for Struct 
schemas



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4100

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1800.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1800


commit 1918046f3247d135cbeeddfbadafbe333bde2d55
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-29T21:55:33Z

KAFKA-4100: ensure 'fields' and 'fieldsByName' are not null for Struct 
schemas






[GitHub] kafka pull request #1969: MINOR: missing fullstop in doc for `max.partition....

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1969

MINOR: missing fullstop in doc for `max.partition.fetch.bytes`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1969.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1969


commit 3a9568911ca226979f4058129a3f238d1f0187c1
Author: Shikhar Bhushan <shik...@schmizz.net>
Date:   2016-10-04T22:43:21Z

MINOR: missing fullstop in doc for `max.partition.fetch.bytes`






[GitHub] kafka pull request #1964: KAFKA-4010: add ConfigDef toEnrichedRst() for addi...

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1964

KAFKA-4010: add ConfigDef toEnrichedRst() for additional fields in output

followup on https://github.com/apache/kafka/pull/1696

cc @rekhajoshm 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1964.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1964


commit 630fd4c5220ca5c934492318037f8a493a305b01
Author: Joshi <rekhajo...@gmail.com>
Date:   2016-08-03T04:08:41Z

KAFKA-4010; ConfigDef.toEnrichedRst() to have grouped sections with 
dependents info

commit b7d4e80f32714de351b3af3a26e34817258be0cc
Author: Joshi <rekhajo...@gmail.com>
Date:   2016-09-08T22:24:25Z

KAFKA-4010; updated for review comments

commit 70cb9ff98f075376c1537feb8abc8fe41bea1b83
Author: Joshi <rekhajo...@gmail.com>
Date:   2016-09-09T00:59:18Z

KAFKA-4010; updated for review comments

commit 4fabee350b0a7279c891d1a291fa04c346258703
Author: Joshi <rekhajo...@gmail.com>
Date:   2016-09-15T19:22:55Z

KAFKA-4010; updated for review comments

commit fff244e6e523d6b506656fbacf252b6730d5ed98
Author: Joshi <rekhajo...@gmail.com>
Date:   2016-09-16T05:32:52Z

KAFKA-4010; updated for review comments

commit ffc35f0f5a7fd8727465cbb5e481dfabe8c6b438
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-10-04T21:18:25Z

Merge branch 'KAFKA-4010' of https://github.com/rekhajoshm/kafka into 
kafka-4010

commit d8f1d8122a2b24442bf586e6be130970cdfda016
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-10-04T21:25:02Z

tweaks and tests






[GitHub] kafka pull request #1968: MINOR: missing whitespace in doc for `ssl.cipher.s...

2016-10-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1968

MINOR: missing whitespace in doc for `ssl.cipher.suites`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1968.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1968


commit 44001af72e226a3d683ab8169f8816e7cdf67a49
Author: Shikhar Bhushan <shik...@schmizz.net>
Date:   2016-10-04T22:42:01Z

MINOR: missing whitespace in doc for `ssl.cipher.suites`






[GitHub] kafka pull request #1872: KAFKA-4183: centralize checking for optional and d...

2016-09-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1872

KAFKA-4183: centralize checking for optional and default values to avoid 
bugs

It is cleaner to check for optional and default values just once, in the 
`convertToConnect()` function.

This also helps address an issue with conversions for logical type schemas 
that have default values and null as the included value. That case is 
_probably_ not an issue in practice, since the `JsonConverter` serializes a 
missing field that has a default value as the default value. But when handling 
JSON data streaming in from a topic, being [generous on input, strict on 
output](http://tedwise.com/2009/05/27/generous-on-input-strict-on-output) seems 
best.
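A condensed sketch of the centralized check being described, simplified from what a converter's `convertToConnect()` actually has to do (the real method also dispatches on schema and logical types):

```java
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.errors.DataException;

final class ConvertSketch {
    static Object convertToConnect(Schema schema, Object jsonValue) {
        if (jsonValue == null) {
            if (schema == null) {
                return null;                   // schemaless data: null stays null
            }
            if (schema.defaultValue() != null) {
                return schema.defaultValue();  // fill in the declared default
            }
            if (schema.isOptional()) {
                return null;                   // optional field with no default
            }
            throw new DataException("Invalid null value for required " + schema.type() + " field");
        }
        // ... type-specific conversion of non-null values happens here ...
        return jsonValue;
    }
}
```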

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4183

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1872


commit 1e09c6431f11361e7f3a5af4c09a8174c3547669
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-09-16T23:17:40Z

KAFKA-4183: centralize checking for optional and default values to avoid 
bugs






[GitHub] kafka pull request #1865: KAFKA-4173: SchemaProjector should successfully pr...

2016-09-16 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1865

KAFKA-4173: SchemaProjector should successfully project missing Struct 
field when target field is optional
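A sketch of the projection rule the title describes, not the actual `SchemaProjector` implementation: a field missing from the source is acceptable if the target field has a default or is optional, and an error otherwise:

```java
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;
import org.apache.kafka.connect.errors.DataException;

final class ProjectionSketch {
    static void projectStruct(Struct source, Struct target) {
        for (Field targetField : target.schema().fields()) {
            Field sourceField = source.schema().field(targetField.name());
            if (sourceField != null) {
                target.put(targetField, source.get(sourceField));
            } else if (targetField.schema().defaultValue() != null) {
                target.put(targetField, targetField.schema().defaultValue());
            } else if (!targetField.schema().isOptional()) {
                throw new DataException("Source is missing required target field: " + targetField.name());
            }
            // Optional target field with no default and no source value: leave it unset.
        }
    }
}
```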



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4173

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1865


commit 085e30ee5ab49981dbb9b1f353b3ea02fdadec7e
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-09-16T18:16:18Z

KAFKA-4173: SchemaProjector should successfully project missing Struct 
field when target field is optional






[GitHub] kafka pull request #1778: KAFKA-4042: Contain connector & task start/stop fa...

2016-08-24 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1778

KAFKA-4042: Contain connector & task start/stop failures within the Worker

Invoke the statusListener.onFailure() callback on start failures so that 
the statusBackingStore is updated. This required a fix to the putSafe() 
functionality, which previously prevented any update not preceded by a (non-safe) 
put() from completing, as is the case here when a connector or task transitions 
directly to FAILED.

Worker start methods can still throw if the same connector name or task ID 
is already registered with the worker, since that condition should never happen.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka distherder-stayup-take4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1778.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1778


commit 050b80331f63ec71f16a644e7fa8006823c94ecc
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-23T23:00:10Z

KAFKA-4042: Contain connector & task start/stop failures within the Worker

Invoke the statusListener.onFailure() callback on start failures so that 
the statusBackingStore is updated. This required a fix to the putSafe() 
functionality, which previously prevented any update not preceded by a (non-safe) 
put() from completing, as is the case here when a connector or task transitions 
directly to FAILED.

Worker start methods can still throw if the same connector name or task ID 
is already registered with the worker, since that condition should never happen.






[GitHub] kafka pull request #1745: KAFKA-4042: prevent DistributedHerder thread from ...

2016-08-24 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1745




[GitHub] kafka pull request #1790: KAFKA-4070: implement Connect Struct.toString()

2016-08-25 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1790

KAFKA-4070: implement Connect Struct.toString()
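The PR title leaves the output format implicit; something along these lines (written here as a standalone helper rather than the actual `Struct.toString()` override, and with an assumed output format):

```java
import org.apache.kafka.connect.data.Field;
import org.apache.kafka.connect.data.Struct;

final class StructFormatSketch {
    // Renders e.g. Struct{name=kafka,version=0.10}, skipping unset fields.
    static String format(Struct struct) {
        StringBuilder sb = new StringBuilder("Struct{");
        boolean first = true;
        for (Field field : struct.schema().fields()) {
            Object value = struct.getWithoutDefault(field.name());
            if (value == null) {
                continue;
            }
            if (!first) {
                sb.append(',');
            }
            sb.append(field.name()).append('=').append(value);
            first = false;
        }
        return sb.append('}').toString();
    }
}
```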



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka add-struct-tostring

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1790.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1790


commit 00a47cca8f18f9de8f69718fc41c02a2162e07c6
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-08-25T23:38:27Z

KAFKA-4070: implement Connect Struct.toString()






[GitHub] kafka pull request #1968: MINOR: missing whitespace in doc for `ssl.cipher.s...

2016-11-04 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/1968




[GitHub] kafka pull request #2040: KAFKA-4161: prototype for exploring API change

2016-10-18 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2040

KAFKA-4161: prototype for exploring API change



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4161

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2040


commit ed75ad7b5618aff9fc85573748c23a5229144bc3
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-10-18T19:50:28Z

KAFKA-4161: prototype for exploring API change






[GitHub] kafka pull request #2131: Remove failing ConnectDistributedTest.test_bad_con...

2016-11-14 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2131

Remove failing ConnectDistributedTest.test_bad_connector_class

Since #1911 was merged it is hard to externally test a connector 
transitioning to FAILED state due to an initialization failure, which is what 
this test was attempting to verify.

The unit test added in #1778 already exercises exception-handling around 
Connector instantiation.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka test_bad_connector_class

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2131


commit 32a2b8e7a09f5b002e5e058d70bdcec90cb12944
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-11-15T00:50:31Z

Remove failing ConnectDistributedTest.test_bad_connector_class

Since #1911 was merged it is hard to externally test a connector 
transitioning to FAILED state due to an initialization failure, which is what 
this test was attempting to verify.

The unit test added in #1778 already exercises exception-handling around 
Connector instantiation.






[GitHub] kafka pull request #2182: ConfigDef experimentation - support List and Ma...

2016-11-28 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2182

ConfigDef experimentation - support List and Map<String, T>



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka configdef-experimentation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2182


commit 39a10054606c2ae9d26d7b0625c7c59210129b09
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-11-28T19:27:21Z

ConfigDef experimentation - support List and Map<String, T>






[GitHub] kafka pull request #2139: KAFKA-4161: KIP-89: Allow sink connectors to decou...

2016-11-15 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2139

KAFKA-4161: KIP-89: Allow sink connectors to decouple flush and offset 
commit



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4161-deux

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2139


commit 706bcc860fed939a00171ebf61fdab8639d99b06
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-11-15T00:29:43Z

KAFKA-4161: KIP-89: Allow sink connectors to decouple flush and offset 
commit






[GitHub] kafka pull request #2040: KAFKA-4161: prototype for exploring API change

2016-10-26 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/2040




[GitHub] kafka pull request #2232: HOTFIX: Fix HerderRequest.compareTo()

2016-12-08 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2232

HOTFIX: Fix HerderRequest.compareTo()

With KAFKA-3008 (#1788), the implementation does not respect the contract 
that 'sgn(x.compareTo(y)) == -sgn(y.compareTo(x))'

This fix addresses the hang with JDK8 in DistributedHerderTest.compareTo()
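A minimal example of a `compareTo()` that does respect the contract: compare the primary key first and break ties with a second field, using `Long.compare()` rather than subtraction. Field names here are illustrative, not the actual `HerderRequest` members:

```java
final class RequestSketch implements Comparable<RequestSketch> {
    final long at;    // when the request should run
    final long seq;   // tie-breaker assigned at creation time

    RequestSketch(long at, long seq) {
        this.at = at;
        this.seq = seq;
    }

    @Override
    public int compareTo(RequestSketch other) {
        int cmp = Long.compare(at, other.at);             // sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) holds
        return cmp != 0 ? cmp : Long.compare(seq, other.seq);
    }
}
```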

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka herderreq-compareto

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2232


commit 4c5a8102a340478509a1c9331bf53ceccac394fb
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-12-08T23:18:59Z

HOTFIX: Fix HerderRequest.compareTo()

With KAFKA-3008 (#1788), the implementation does not respect the contract 
that 'sgn(x.compareTo(y)) == -sgn(y.compareTo(x))'

This fix addresses the hang with JDK8 in DistributedHerderTest.compareTo()






[GitHub] kafka pull request #2374: KAFKA-3209: KIP-66: more single message transforms

2017-01-13 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2374

KAFKA-3209: KIP-66: more single message transforms

WIP, in this PR I'd also like to add doc generation for transformations.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka more-smt

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2374.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2374


commit f34cc71c9931ea7ec5dd045512c623196928a2a3
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-13T20:00:31Z

SetSchemaMetadata SMT

commit 09af66b3f1861bf2a0718aaf79ca1a3bd22adcec
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-13T21:44:57Z

Support schemaless and rename HoistToStruct->Hoist; add the inverse Extract 
transform

commit 022f4920c5f09d068bbf49e47091a1333dc48be2
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-13T21:51:43Z

InsertField transform -- fix bad name for interface containing config name 
constants

commit c5260a718e2f0ade66c4607a4b9c21abda61b90c
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-13T22:01:25Z

ValueToKey SMT






[GitHub] kafka pull request #2182: ConfigDef experimentation - support List and Ma...

2017-01-11 Thread shikhar
Github user shikhar closed the pull request at:

https://github.com/apache/kafka/pull/2182




[GitHub] kafka pull request #2365: MINOR: avoid closing over both pre & post-transfor...

2017-01-12 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2365

MINOR: avoid closing over both pre & post-transform record in 
WorkerSourceTask

Followup to #2299 for KAFKA-3209

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka 2299-followup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2365.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2365


commit b0abf743d0329bbbc9620b1b0ef6acd4b3b035b3
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-13T00:32:11Z

MINOR: avoid closing over both pre & post-transform record in 
WorkerSourceTask

Followup to #2299 for KAFKA-3209






[GitHub] kafka pull request #2196: KAFKA-3910: prototype of another approach to cycli...

2016-11-30 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2196

KAFKA-3910: prototype of another approach to cyclic schemas



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka KAFKA-3910

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2196.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2196


commit 03a85d774e9d6a2feebba1ae1f50967619857040
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-12-01T01:39:22Z

KAFKA-3910: prototype of another approach to cyclic schemas






[GitHub] kafka pull request #2313: KAFKA-4575: ensure topic created before starting s...

2017-01-04 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2313

KAFKA-4575: ensure topic created before starting sink for 
ConnectDistributedTest.test_pause_resume_sink

Otherwise in this test the sink task goes through the pause/resume cycle 
with 0 assigned partitions, since the default metadata refresh interval is 
quite long

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4575

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2313.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2313


commit fdc2bb353de995ba09398d0851b934d3aee4570c
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-05T00:28:14Z

KAFKA-4575: ensure topic created before starting sink for 
ConnectDistributedTest.test_pause_resume_sink

Otherwise in this test the sink task goes through the pause/resume cycle 
with 0 assigned partitions, since the default metadata refresh interval is 
quite long






[GitHub] kafka pull request #2299: KAFKA-3209: KIP-66: single message transforms

2017-01-03 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2299

KAFKA-3209: KIP-66: single message transforms

*WIP* particularly around testing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka smt-2017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2299


commit 1178670f36d8fbdf5cbdbb2728ace7bf4f0e7300
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2017-01-03T19:21:17Z

KAFKA-3209: KIP-66: single message transforms






[GitHub] kafka pull request #2277: KAFKA-4527: task status was being updated before a...

2016-12-19 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/2277

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-4527

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2277


commit 9911f25e2b06a02c56deadf7c586b5c263a08027
Author: Shikhar Bhushan <shik...@confluent.io>
Date:   2016-12-19T21:32:26Z

KAFKA-4527: task status was being updated before actual pause/resume

h/t @ewencp for pointing out the issue






[DISCUSS] KIP-65 Expose timestamps to Connect

2016-06-23 Thread Shikhar Bhushan
Kafkarati,

Here is a pretty straightforward proposal, for exposing timestamps that
were added in Kafka 0.10 to the connect framework so connectors can make
use of them:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect

Appreciate your thoughts!

Shikhar


[VOTE] KIP-65 Expose timestamps to Connect

2016-06-24 Thread Shikhar Bhushan
Since there isn't much to discuss with this KIP, bringing it to a vote

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect

Pull request: https://github.com/apache/kafka/pull/1537

Thanks,

Shikhar


Re: [DISCUSS] KIP-65 Expose timestamps to Connect

2016-06-26 Thread Shikhar Bhushan
I updated the KIP and PR with this change

On Fri, Jun 24, 2016 at 4:58 PM Ismael Juma <ism...@juma.me.uk> wrote:

> Yes, I agree that it would be better to be consistent. I suggest `Long` and
> `null` everywhere if feasible as it's less opaque than the magic -1L value.
> The KIP page should be updated with what you decide.
>
> Ismael
>
> On Sat, Jun 25, 2016 at 1:29 AM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > Hi Ismael,
> >
> > Good point. This is down to an implementation detail, the getter was
> added
> > to the base class for `SourceRecord` and `SinkRecord`,
> > `ConnectRecord`. `SourceRecord`
> > is treating missing timestamps as null while `SinkRecord` is treating it
> as
> > the default value `Record.NO_TIMESTAMP` (-1L).
> >
> > It probably makes sense to be consistent and use either Long everywhere
> or
> > the primitive long and default values.
> >
> > Feel free to add the comment on the PR
> > <https://github.com/apache/kafka/pull/1537/files> as well and I can
> follow
> > up there :-)
> >
> > Thanks,
> >
> > Shikhar
> >
> > On Fri, Jun 24, 2016 at 3:52 PM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Shikhar,
> > >
> > > Thanks for the KIP. One question:
> > >
> > > SinkRecord takes a `long` timestamp, but then exposes it via a method
> > that
> > > returns `Long`. Is this correct? And if so, can you please explain the
> > > reasoning?
> > >
> > > Ismael
> > >
> > > On Thu, Jun 23, 2016 at 8:06 PM, Shikhar Bhushan <shik...@confluent.io
> >
> > > wrote:
> > >
> > > > Kafkarati,
> > > >
> > > > Here is a pretty straightforward proposal, for exposing timestamps
> that
> > > > were added in Kafka 0.10 to the connect framework so connectors can
> > make
> > > > use of them:
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect
> > > >
> > > > Appreciate your thoughts!
> > > >
> > > > Shikhar
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-65 Expose timestamps to Connect

2016-06-24 Thread Shikhar Bhushan
Hi Ismael,

Good point. This is down to an implementation detail, the getter was added
to the base class for `SourceRecord` and `SinkRecord`,
`ConnectRecord`. `SourceRecord`
is treating missing timestamps as null while `SinkRecord` is treating it as
the default value `Record.NO_TIMESTAMP` (-1L).

It probably makes sense to be consistent and use either Long everywhere or
the primitive long and default values.
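For concreteness, a sketch of the `Long`-with-`null` option on the shared base class (simplified; not the full `ConnectRecord`):

```java
public abstract class TimestampedRecordSketch {
    private final Long timestamp;   // epoch millis, or null when the record carries no timestamp

    protected TimestampedRecordSketch(Long timestamp) {
        this.timestamp = timestamp;
    }

    public Long timestamp() {
        return timestamp;           // callers check for null rather than a magic -1L
    }
}
```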

Feel free to add the comment on the PR
<https://github.com/apache/kafka/pull/1537/files> as well and I can follow
up there :-)

Thanks,

Shikhar

On Fri, Jun 24, 2016 at 3:52 PM Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Shikhar,
>
> Thanks for the KIP. One question:
>
> SinkRecord takes a `long` timestamp, but then exposes it via a method that
> returns `Long`. Is this correct? And if so, can you please explain the
> reasoning?
>
> Ismael
>
> On Thu, Jun 23, 2016 at 8:06 PM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > Kafkarati,
> >
> > Here is a pretty straightforward proposal, for exposing timestamps that
> > were added in Kafka 0.10 to the connect framework so connectors can make
> > use of them:
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect
> >
> > Appreciate your thoughts!
> >
> > Shikhar
> >
>


Re: Changing hash algorithm to LogCleaner offset map

2016-07-22 Thread Shikhar Bhushan
Not sure I understand the motivation to use a FIPS-compliant hash function
for log compaction -- what are the security ramifications?

On Fri, Jul 22, 2016 at 2:56 PM Luciano Afranllie 
wrote:

> A little bit of background first.
>
> We are trying to make a deployment of Kafka that is FIPS 140-2 (
> https://en.wikipedia.org/wiki/FIPS_140-2) compliant and one of the
> requirements is not to use MD5.
>
> As far as we could see, Kafka is using MD5 only to hash message keys in an
> offset map (SkimpyOffsetMap) used by the log cleaner. So, we are planning
> to change the hash algorithm to something allowed by FIPS.
>
> With this in mind we are thinking that it would be great if we can add a
> config property LogCleanerHashAlgorithmProp = "log.cleaner.hash.algorithm"
> with a default value equal to "MD5" and use it in the constructor
> of CleanerConfig. In that case in future versions of Kafka we can just
> change the value of this property.
>
> Please let me know if you are Ok with this change.
> Is it enough to create a pull request for this? Should I create a Jira
> first?
>
> Regards
> Luciano
>
> On Fri, Jul 22, 2016 at 5:58 PM, Luciano Afranllie <
> listas.luaf...@gmail.com
> > wrote:
>
> > Hi
> >
> > We are evaluating to change the hash algorithm used by the
> SkimpyOffsetMap
> > used by the LogCleaner from MD5 to SHA-1.
> >
> > Besides the impact in performance (more memory, more cpu usage) is there
> > anything that may be impacted?
> >
> > Regards
> > Luciano
> >
>
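For reference, a minimal Java sketch of the configurability proposed above: pick the digest algorithm from configuration (defaulting to MD5) instead of hard-coding it. The property name comes from the message above; the plumbing is illustrative and not Kafka's actual `CleanerConfig`/`SkimpyOffsetMap` code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class KeyHasherSketch {
    private final MessageDigest digest;

    KeyHasherSketch(String algorithm) throws NoSuchAlgorithmException {
        this.digest = MessageDigest.getInstance(algorithm);   // e.g. "MD5" or "SHA-1"
    }

    byte[] hash(byte[] key) {
        digest.reset();
        return digest.digest(key);
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-in for reading "log.cleaner.hash.algorithm" from the broker config.
        String algorithm = System.getProperty("log.cleaner.hash.algorithm", "MD5");
        KeyHasherSketch hasher = new KeyHasherSketch(algorithm);
        byte[] h = hasher.hash("some-key".getBytes(StandardCharsets.UTF_8));
        System.out.println(algorithm + " digest is " + h.length + " bytes");
    }
}
```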


Re: Changing hash algorithm to LogCleaner offset map

2016-07-24 Thread Shikhar Bhushan
Got it, makes sense to make the hash function customizable if there are
environments in which md5 usage is prevented. The approach you are
proposing sounds good to me.
On Sat, Jul 23, 2016 at 14:56 Luciano Afranllie <listas.luaf...@gmail.com>
wrote:

> Nothing wrong about using MD5 for that from FIPS point of view, but we want
> to deploy with FIPS 140-2 mode enabled using only RSA security providers.
> With this settings it is not possible to use MD5.
>
> On Fri, Jul 22, 2016 at 8:49 PM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > Not sure I understand the motivation to use a FIPS-compliant hash
> function
> > for log compaction -- what are the security ramifications?
> >
> > On Fri, Jul 22, 2016 at 2:56 PM Luciano Afranllie <
> > listas.luaf...@gmail.com>
> > wrote:
> >
> > > A little bit of background first.
> > >
> > > We are trying to make a deployment of Kafka that is FIPS 140-2 (
> > > https://en.wikipedia.org/wiki/FIPS_140-2) compliant and one of the
> > > requirements is not to use MD5.
> > >
> > > As far as we could see, Kafka is using MD5 only to hash message keys
> in an
> > > offset map (SkimpyOffsetMap) used by the log cleaner. So, we are
> planning
> > > to change the hash algorithm to something allowed by FIPS.
> > >
> > > With this in mind we are thinking that it would be great if we can add
> a
> > > config property LogCleanerHashAlgorithmProp =
> > "log.cleaner.hash.algorithm"
> > > with a default value equal to "MD5" and use it in the constructor
> > > of CleanerConfig. In that case in future versions of Kafka we can just
> > > change the value of this property.
> > >
> > > Please let me know if you are Ok with this change.
> > > Is it enough to create a pull request for this? Should I create a Jira
> > > first?
> > >
> > > Regards
> > > Luciano
> > >
> > > On Fri, Jul 22, 2016 at 5:58 PM, Luciano Afranllie <
> > > listas.luaf...@gmail.com
> > > > wrote:
> > >
> > > > Hi
> > > >
> > > > We are evaluating to change the hash algorithm used by the
> > > SkimpyOffsetMap
> > > > used by the LogCleaner from MD5 to SHA-1.
> > > >
> > > > Besides the impact in performance (more memory, more cpu usage) is
> > there
> > > > anything that may be impacted?
> > > >
> > > > Regards
> > > > Luciano
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-22 Thread Shikhar Bhushan
flatMap() / supporting 1->n feels nice and general since filtering is just
the case of going from 1->0

I'm not sure why we'd need to do any more granular offset tracking (like
sub-offsets) for source connectors: after transformation of a given record
to n records, all of those n should map to same offset of the source
partition. The only thing to take care of here would be that we don't
commit a source offset while there are still records with that offset that
haven't been flushed to Kafka, but this is in the control of the connect
runtime.

I see your point for sink connectors, though. Implementors can currently
assume 1:1ness of a record to its Kafka coordinates (topic, partition,
offset).

On Thu, Jul 21, 2016 at 10:57 PM Ewen Cheslack-Postava 
wrote:

> Jun, The problem with it not being 1-1 is that Connect relies heavily on
> offsets, so we'd need to be able to track offsets at this finer
> granularity. Filtering is ok, but flatMap isn't. If you convert one message
> to many, what are the offsets for the new messages? One possibility would
> be to assume that transformations are deterministic and then "enhance" the
> offsets with an extra integer field that indicates its position in the
> subset. For sources this seems attractive since you can then reset to
> whatever the connector-provided offset is and then filter out any of the
> "sub"-messages that are earlier than the recorded "sub"-offset. But this
> might not actually work for sources since a) the offsets will include extra
> fields that the connector doesn't expect (might be ok since we handle that
> data as schemaless anyway) and b) if we allow multiple transformations
> (which seems likely given that people might want to do things like
> rearrange fields + filter messages) then offsets start getting quite
> complex as we add sub-sub-offsets and sub-sub-sub-offsets. It's doable, but
> seems messy.
>
> Things aren't as easy on the sink side. Since we track offsets using Kafka
> offsets we either need to use the extra metadata space to store the
> sub-offsets or we need to ensure that we only ever need to commit offsets
> on Kafka message boundaries. We might be able to get away with just
> delivering the entire set of generated messages in a single put() call,
> which the connector is expected to either fully accept or fully reject (via
> exception). However, this may end up interacting poorly with assumptions
> connectors might make if we expose things like max.poll.records, where they
> might expect one record at a time.
>
> I'm not really convinced of the benefit of support this -- at some point it
> seems better to use Streams to do transformations if you need flatMap. I
> can't think of many generic transformations that would use 1-to-many, and
> single message transforms really should be quite general -- that's the
> reason for providing a separate interface isolated from Connectors or
> Converters.
>
> Gwen, re: using null and sending to dead letter queue, it would be useful
> to think about how this might interact with other uses of a dead letter
> queue. Similar ideas have been raised for messages that either can't be
> parsed or which the connector chokes on repeatedly. If we use a dead letter
> queue for those, do we want these messages (which are explicitly filtered
> by a transform setup by the user) to end up in the same location?
>
> -Ewen
>
> On Sun, Jul 17, 2016 at 9:53 PM, Jun Rao  wrote:
>
> > Does the transformation need to be 1-to-1? For example, some users model
> > each Kafka message as schema + a batch of binary records. When using a
> sink
> > connector to push the Kafka data to a sink, if would be useful if the
> > transformer can convert each Kafka message to multiple records.
> >
> > Thanks,
> >
> > Jun
> >
> > On Sat, Jul 16, 2016 at 1:25 PM, Nisarg Shah  wrote:
> >
> > > Gwen,
> > >
> > > Yup, that sounds great! Instead of keeping it up to the transformers to
> > > handle null, we can instead have the topic as null. Sounds good. To get
> > rid
> > > of a message, set the topic to a special one (could be as simple as
> > null).
> > >
> > > Like I said before, the more interesting part would be ‘adding’ a new
> > > message to the existing list, based on say the current message in the
> > > transformer. Does that feature warrant to be included?
> > >
> > > > On Jul 14, 2016, at 22:25, Gwen Shapira  wrote:
> > > >
> > > > I used to work on Apache Flume, where we used to allow users to
> filter
> > > > messages completely in the transformation and then we got rid of it,
> > > > because we spent too much time trying to help users who had "message
> > > > loss", where the loss was actually a bug in the filter...
> > > >
> > > > What we couldn't do in Flume, but perhaps can do in the simple
> > > > transform for Connect is the ability to route messages to different
> > > > topics, with "null" as one of the possible targets. This will allow
> > > 

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-28 Thread Shikhar Bhushan
Some thoughts on the KIP and single-message transforms in general.

* When does transformation take place? In the KIP, it seems like the
connector-implemented task is responsible for calling into the
transformation logic. I'd propose that,
  - for source connectors, the transformer chain operates on the
SourceRecord's generated by a call to SourceTask.poll(), and are called by
the connect framework after getting them from the task before writing to
Kafka
  - for sink connectors, the transformer chain operates on SinkRecord's and
are called by the connect framework after generating the records from Kafka
and before they are passed on to SinkTask.put()

I think this addresses the type-related questions raised in the KIP.
Transformers would operate on ConnectRecord's (the base class for
SourceRecord & SinkRecord) and not have any generic type parameters.

* There is loss of information wrt what is visible to sink connectors as
records go through the transformation chain, unless we hang on to the
previous states in some way, e.g. by having a field for the pre-transform
'parent' record. This way it is possible to get at the original record even
if there have been a number of transformations by tracing along the
parents. Keeping this state around does have memory usage implications.

* There is often a requirement for being able to flexibly map from source
entity like db table, filename etc. to the Kafka topic name in source
connectors, and Kafka topic name to the db table, hdfs dir, etc. in sink
connectors. We currently have various different kinds of configuration
styles being used by connectors for this purpose, e.g. a prefix property.
It'd be good to add some semantics or clear expectations around
transformers being able to serve this use-case, with the framework
providing a standard transformer for this purpose that can be e.g.
configured with regex substitutions.

Perhaps transformers could just rely on overriding the topic on the record,
which seems ugly but is semantically compatible with all the use-cases I
imagine:
- for source connectors the destination is indeed the topic so this is just
fine
- for sink connectors they're basing the destination on topic in some way,
and will now just use the transformed topic-name as a basis

* The above destination mapping takes care of dead-letter queues as far as
transformers may want to care about that. For filtering / discarding of
records -- I agree with not making the API support flatMap due to the
complications Ewen pointed out. I think we could have
transformer.transform() (or whatever the map function is called) return
null (given this is Java, null seems... idiomatic, unless we're on JDK8
soon in which case we can use Optional).

* The transformer API in the KIP takes in Map<String, String> as
configuration properties. This is consistent with how we are configuring
everything else in connect, but as I think about more complex transformers
I'd imagine wanting to supply multi-line scripts as a transformation spec
e.g. Lua <https://github.com/rollulus/kafka-streams-plumber>. I suppose
these kinds of transformers could expect the configuration property to be a
resource file that needs to be on the classpath. Just thinking out loud
here. I can certainly imagine kcql
<https://github.com/datamountaineer/kafka-connect-query-language> working
well as a transformer without needing an external resource for the spec, as
it is designed to be configured from within the confines of a property
value.

On Mon, Jul 25, 2016 at 1:08 AM Michael Noll <mich...@confluent.io> wrote:

> API question regarding dead letter queues / sending messages to a null
> topic in order to get rid of them:  I assume we wouldn't suggest users to
> actually pass `null` into some method, but rather have a proper and
> descriptive API method such as `discard()` (this name is just an example)?
>
> On Sat, Jul 23, 2016 at 11:13 PM, Ewen Cheslack-Postava <e...@confluent.io
> >
> wrote:
>
> > On Fri, Jul 22, 2016 at 12:58 AM, Shikhar Bhushan <shik...@confluent.io>
> > wrote:
> >
> > > flatMap() / supporting 1->n feels nice and general since filtering is
> > just
> > > the case of going from 1->0
> > >
> > > I'm not sure why we'd need to do any more granular offset tracking
> (like
> > > sub-offsets) for source connectors: after transformation of a given
> > record
> > > to n records, all of those n should map to same offset of the source
> > > partition. The only thing to take care of here would be that we don't
> > > commit a source offset while there are still records with that offset
> > that
> > > haven't been flushed to Kafka, but this is in the control of the
> connect
> > > runtime.
> > >
> > >
> > I'd like to be forward thinking with this and make sure we can ge

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-07-29 Thread Shikhar Bhushan
>
>
> Hmm, operating on ConnectRecords probably doesn't work since you need to
> emit the right type of record, which might mean instantiating a new one. I
> think that means we either need 2 methods, one for SourceRecord, one for
> SinkRecord, or we'd need to limit what parts of the message you can modify
> (e.g. you can change the key/value via something like
> transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> fields would remain the same and the fmwk would handle allocating new
> Source/SinkRecords if needed)
>

Good point, perhaps we could add an abstract method on ConnectRecord that
takes all the shared fields as parameters and the implementations return a
copy of the narrower SourceRecord/SinkRecord type as appropriate.
Transformers would only operate on ConnectRecord rather than caring about
SourceRecord or SinkRecord (in theory they could instanceof/cast, but the
API should discourage it)
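
Roughly, something like the following (sketch only, exact signature up for
discussion; Schema is org.apache.kafka.connect.data.Schema):

    // On ConnectRecord:
    public abstract ConnectRecord newRecord(String topic, Integer kafkaPartition,
                                            Schema keySchema, Object key,
                                            Schema valueSchema, Object value,
                                            Long timestamp);

    // SourceRecord's implementation carries over its source partition/offset:
    @Override
    public SourceRecord newRecord(String topic, Integer kafkaPartition,
                                  Schema keySchema, Object key,
                                  Schema valueSchema, Object value, Long timestamp) {
        return new SourceRecord(sourcePartition(), sourceOffset(), topic, kafkaPartition,
                keySchema, key, valueSchema, value, timestamp);
    }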


> Is there a use case for hanging on to the original? I can't think of a
> transformation where you'd need to do that (or couldn't just order things
> differently so it isn't a problem).


Yeah maybe this isn't really necessary. No strong preference here.

That said, I do worry a bit that farming too much stuff out to transformers
> can result in "programming via config", i.e. a lot of the simplicity you
> get from Connect disappears in long config files. Standardization would be
> nice and might just avoid this (and doesn't cost that much implementing it
> in each connector), and I'd personally prefer something a bit less flexible
> but consistent and easy to configure.


Not sure what you're suggesting :-) Standardized config properties for
a small set of transformations, leaving it up to connectors to integrate?

Personally I'm skeptical of that level of flexibility in transformers --
> its getting awfully complex and certainly takes us pretty far from "config
> only" realtime data integration. It's not clear to me what the use cases
> are that aren't covered by a small set of common transformations that can
> be chained together (e.g. rename/remove fields, mask values, and maybe a
> couple more).
>

I agree that we should have some standard transformations that we ship with
connect that users would ideally lean towards for routine tasks. The ones
you mention are some good candidates which I'd imagine can expose simple
config, e.g.
   transform.filter.whitelist=x,y,z # filter to a whitelist of fields
   transform.rename.spec=oldName1=>newName1, oldName2=>newName2
   topic.rename.replace=-/_
   topic.rename.prefix=kafka_
etc..

However the ecosystem will invariably have more complex transformers if we
make this pluggable. And because ETL is messy, that's probably a good thing
if folks are able to do their data munging orthogonally to connectors, so
that connectors can focus on the logic of how data should be copied from/to
datastores and Kafka.


> In any case, we'd probably also have to change configs of connectors if we
> allowed configs like that since presumably transformer configs will just be
> part of the connector config.
>

Yeah, haven't thought much about how all the configuration would tie
together...

I think we'd need the ability to:
- spec transformer chain (fully-qualified class names? perhaps special
aliases for built-in ones? perhaps third-party fqcns can be assigned
aliases by users in the chain spec, for easier configuration and to
uniquely identify a transformation when it occurs more than one time in a
chain?)
- configure each transformer -- all properties prefixed with that
transformer's ID (fqcn / alias) are routed to it

Additionally, I think we would probably want to allow for topic-specific
overrides  (e.g. you want
certain transformations for one topic, but different ones for another...)
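
As a strawman, the configuration shape could look something like the
following in a connector config (all names here are illustrative only):

    # ordered chain of transformer aliases
    transforms=mask,route
    # alias -> transformation class, plus per-alias config prefixed with the alias
    transforms.mask.type=com.example.connect.transforms.MaskField
    transforms.mask.fields=ssn,credit_card
    transforms.route.type=com.example.connect.transforms.RegexTopicRouter
    transforms.route.topic.regex=(.*)
    transforms.route.topic.replacement=kafka_$1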


Re: [VOTE] KIP-65 Expose timestamps to Connect

2016-06-28 Thread Shikhar Bhushan
Thanks all, looks like the vote passed with +5 binding and +1 non-binding.
I'll update the KIP page.

On Tue, Jun 28, 2016 at 12:54 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> +1 (binding)
>
> -Ewen
>
> On Fri, Jun 24, 2016 at 4:58 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > +1 (binding), provided that we make the usage of `Long/null` versus
> > `long/-1` consistent.
> >
> > Ismael
> >
> > On Sat, Jun 25, 2016 at 12:42 AM, Gwen Shapira <g...@confluent.io>
> wrote:
> >
> > > +1
> > >
> > > On Fri, Jun 24, 2016 at 10:59 AM, Shikhar Bhushan <
> shik...@confluent.io>
> > > wrote:
> > > > Since there isn't much to discuss with this KIP, bringing it to a
> vote
> > > >
> > > > KIP:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect
> > > >
> > > > Pull request: https://github.com/apache/kafka/pull/1537
> > > >
> > > > Thanks,
> > > >
> > > > Shikhar
> > >
> >
>
>
>
> --
> Thanks,
> Ewen
>


Re: [VOTE] KIP-75 - Add per-connector Converters

2016-08-15 Thread Shikhar Bhushan
+1 (non-binding)

On Mon, Aug 15, 2016 at 1:20 PM Ismael Juma  wrote:

> +1 (binding)
>
> On 15 Aug 2016 7:21 pm, "Ewen Cheslack-Postava"  wrote:
>
> > I would like to initiate the voting process for KIP-75:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 75+-+Add+per-connector+Converters
> >
> > I'll kick things off with a +1 (binding).
> >
> > --
> > Thanks,
> > Ewen
> >
>


[VOTE] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-09 Thread Shikhar Bhushan
Hi,

I would like to initiate a vote on KIP-89

https://cwiki.apache.org/confluence/display/KAFKA/KIP-89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit

Best,

Shikhar


[DISCUSS] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-04 Thread Shikhar Bhushan
Hi all,

I created KIP-89 for making a Connect API change that allows for sink
connectors to decouple flush and offset commits.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit

I'd welcome your input.

Best,

Shikhar


Re: [VOTE] KIP-89: Allow sink connectors to decouple flush and offset commit

2016-11-14 Thread Shikhar Bhushan
The vote passed with +3 binding votes. Thanks all!

On Sun, Nov 13, 2016 at 1:42 PM Gwen Shapira <g...@confluent.io> wrote:

+1 (binding)

On Nov 9, 2016 2:17 PM, "Shikhar Bhushan" <shik...@confluent.io> wrote:

> Hi,
>
> I would like to initiate a vote on KIP-89
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 89%3A+Allow+sink+connectors+to+decouple+flush+and+offset+commit
>
> Best,
>
> Shikhar
>


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-07 Thread Shikhar Bhushan
Hi all,

I have another iteration at a proposal for this feature here:
https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design

I'd welcome your feedback and comments.

Thanks,

Shikhar

On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan <shik...@confluent.io>
wrote:

> >
> >
> > Hmm, operating on ConnectRecords probably doesn't work since you need to
> > emit the right type of record, which might mean instantiating a new one.
> I
> > think that means we either need 2 methods, one for SourceRecord, one for
> > SinkRecord, or we'd need to limit what parts of the message you can
> modify
> > (e.g. you can change the key/value via something like
> > transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> > fields would remain the same and the fmwk would handle allocating new
> > Source/SinkRecords if needed)
> >
>
> Good point, perhaps we could add an abstract method on ConnectRecord that
> takes all the shared fields as parameters and the implementations return a
> copy of the narrower SourceRecord/SinkRecord type as appropriate.
> Transformers would only operate on ConnectRecord rather than caring about
> SourceRecord or SinkRecord (in theory they could instanceof/cast, but the
> API should discourage it)
>
>
> > Is there a use case for hanging on to the original? I can't think of a
> > transformation where you'd need to do that (or couldn't just order
things
> > differently so it isn't a problem).
>
>
> Yeah maybe this isn't really necessary. No strong preference here.
>
> That said, I do worry a bit that farming too much stuff out to
transformers
> > can result in "programming via config", i.e. a lot of the simplicity you
> > get from Connect disappears in long config files. Standardization would
> be
> > nice and might just avoid this (and doesn't cost that much implementing
> it
> > in each connector), and I'd personally prefer something a bit less
> flexible
> > but consistent and easy to configure.
>
>
> Not sure what the you're suggesting :-) Standardized config properties for
> a small set of transformations, leaving it upto connectors to integrate?
>

I just mean that you get to the point where you're practically writing a
Kafka Streams application, you're just doing it through either an
incredibly convoluted set of transformers and configs, or a single
transformer with an incredibly convoluted set of configs. You basically get to
the point where your config is a mini DSL and you're not really saving
that much.

The real question is how much we want to venture into the "T" part of ETL.
I tend to favor minimizing how much we take on since the rest of Connect
isn't designed for it, it's designed around the E & L parts.

-Ewen


> Personally I'm skeptical of that level of flexibility in transformers --
> > its getting awfully complex and certainly takes us pretty far from
> "config
> > only" realtime data integration. It's not clear to me what the use cases
> > are that aren't covered by a small set of common transformations that
can
> > be chained together (e.g. rename/remove fields, mask values, and maybe a
> > couple more).
> >
>
> I agree that we should have some standard transformations that we ship
with
> connect that users would ideally lean towards for routine tasks. The ones
> you mention are some good candidates where I'd imagine can expose simple
> config, e.g.
>transform.filter.whitelist=x,y,z # filter to a whitelist of fields
>transfom.rename.spec=oldName1=>newName1, oldName2=>newName2
>topic.rename.replace=-/_
>topic.rename.prefix=kafka_
> etc..
>
> However the ecosystem will invariably have more complex transformers if we
> make this pluggable. And because ETL is messy, that's probably a good
thing
> if folks are able to do their data munging orthogonally to connectors, so
> that connectors can focus on the logic of how data should be copied
from/to
> datastores and Kafka.
>
>
> > In any case, we'd probably also have to change configs of connectors if
> we
> > allowed configs like that since presumably transformer configs will just
> be
> > part of the connector config.
> >
>
> Yeah, haven't thought much about how all the configuration would tie
> together...
>
> I think we'd need the ability to:
> - spec transformer chain (fully-qualified class names? perhaps special
> aliases for built-in ones? perhaps third-party fqcns can be assigned
> aliases by users in the chain spec, for easier configuration and to
> uniquely identify a transformation when it

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I have updated KIP-66
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect
with
the changes I proposed in the design.

Gwen, I think the main downside to not including some transformations with
Kafka Connect is that it seems less user friendly if folks have to make
sure to have the right transformation(s) on the classpath as well, besides
their connector(s). Additionally by going in with a small set included, we
can encourage a consistent configuration and implementation style and
provide utilities for e.g. data transformations, which I expect we will
definitely need (discussed under 'Patterns for data transformations').

It does get hard to draw the line once you go from 'none' to 'some'. To get
discussion going, if we get agreement on 'none' vs 'some', I added a table
under 'Bundled transformations' for transformations which I think are worth
including.

For many of these, I have noticed their absence in the wild as a pain point
--
TimestampRouter:
https://github.com/confluentinc/kafka-connect-elasticsearch/issues/33
Mask:
https://groups.google.com/d/msg/confluent-platform/3yHb8_mCReQ/sTQc3dNgBwAJ
Insert:
http://stackoverflow.com/questions/40664745/elasticsearch-connector-for-kafka-connect-offset-and-timestamp
RegexRouter:
https://groups.google.com/d/msg/confluent-platform/yEBwu1rGcs0/gIAhRp6kBwAJ
NumericCast:
https://github.com/confluentinc/kafka-connect-jdbc/issues/101#issuecomment-249096119
TimestampConverter:
https://groups.google.com/d/msg/confluent-platform/gGAOsw3Qeu4/8JCqdDhGBwAJ
ValueToKey: https://github.com/confluentinc/kafka-connect-jdbc/pull/166

In other cases, their functionality is already being implemented by
connectors in divergent ways: RegexRouter, Insert, Replace, HoistToStruct,
ExtractFromStruct
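
To give a sense of the intended granularity, a connector config pulling in a
couple of these might look roughly like the following (class and property
names are placeholders until the KIP settles them):

    transforms=addOffset,route
    transforms.addOffset.type=org.apache.kafka.connect.transforms.InsertField$Value
    transforms.addOffset.offset.field=kafka_offset
    transforms.route.type=org.apache.kafka.connect.transforms.TimestampRouter
    transforms.route.topic.format=${topic}-${timestamp}
    transforms.route.timestamp.format=yyyyMMdd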

On Wed, Dec 14, 2016 at 6:00 PM Gwen Shapira <g...@confluent.io> wrote:

I'm a bit concerned about adding transformations in Kafka. NiFi has 150
processors, presumably they are all useful for someone. I don't know if I'd
want all of that in Apache Kafka. What's the downside of keeping it out? Or
at least keeping the built-in set super minimal (Flume has like 3 built-in
interceptors)?

Gwen

On Wed, Dec 14, 2016 at 1:36 PM, Shikhar Bhushan <shik...@confluent.io>
wrote:

> With regard to a), just using `ConnectRecord` with `newRecord` as a new
> abstract method would be a fine choice. In prototyping, both options end
up
> looking pretty similar (in terms of how transformations are implemented
and
> the runtime initializes and uses them) and I'm starting to lean towards
not
> adding a new interface into the mix.
>
> On b) I think we should include a small set of useful transformations with
> Connect, since they can be applicable across different connectors and we
> should encourage some standardization for common operations. I'll update
> KIP-66 soon including a spec of transformations that I believe are worth
> including.
>
> On Sat, Dec 10, 2016 at 11:52 PM Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> If anyone has time to review here, it'd be great to get feedback. I'd
> imagine that the proposal itself won't be too controversial -- keeps
> transformations simple (by only allowing map/filter), doesn't affect the
> rest of the framework much, and fits in with general config structure
we've
> used elsewhere (although ConfigDef could use some updates to make this
> easier...).
>
> I think the main open questions for me are:
>
> a) Is TransformableRecord worth it to avoid reimplementing small bits of
> code (it allows for a single implementation of the interface to trivially
> apply to both Source and SinkRecords). I think I prefer this, but it does
> come with some commitment to another interface on top of ConnectRecord. We
> could alternatively modify ConnectRecord which would require fewer
changes.
> b) How do folks feel about built-in transformations and the set that are
> mentioned here? This brings us way back to the discussion of built-in
> connectors. Transformations, especially when intended to be lightweight
and
> touch nothing besides the data already in the record, seem different from
> connectors -- there might be quite a few, but hopefully limited. Since we
> (hopefully) already factor out most serialization-specific stuff via
> Converters, I think we can keep this pretty limited. That said, I have no
> doubt some folks will (in my opinion) abuse this feature to do data
> enrichment by querying external systems, so building a bunch of
> transformations in could potentially open the floodgates, or at least make
> decisions about what is included vs what should be 3rd party muddy.
>
> -Ewen
>
>
> On Wed, Dec 7, 2016 at 11:46 AM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > Hi all,
> >
> > I have another iteration at a proposa

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
There is no decision being proposed on the final list of transformations
that will ever be in Kafka :-) Just the initial set we should roll with.

On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira <g...@confluent.io> wrote:

You are absolutely right that the vast majority of NiFi's processors are
not what we would consider SMT.

I went over the list and I think it still contains just short of 50 legit
SMTs:
https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+NiFi+Transformations

You are right that ExtractHL7 is an extreme that clearly doesn't belong in
Apache Kafka, but just before that we have ExtractAvroMetadata that may
fit? and ExtractEmailHeaders doesn't sound totally outlandish either...

Nothing in the baked-in list by Shikhar looks out of place. I am concerned
about a slippery slope. Or the arbitrariness of the decision if we say that
this list is final and nothing else will ever make it into Kafka.

Gwen

On Thu, Dec 15, 2016 at 3:00 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I think there are a couple of factors that make transformations and
> connectors different.
>
> First, NiFi's 150 processors is a bit misleading. In NiFi, processors
cover
> data sources, data sinks, serialization/deserialization, *and*
> transformations. I haven't filtered the list to see how many fall into the
> first 3 categories, but it's a *lot* of the processors they have.
>
> Second, since transformations only apply to a single message and I'd think
> they generally shouldn't be interacting with external services (i.e. I
> think trying to do enrichment in SMT is probably a bad idea), the scope of
> possible transformations is reasonably limited and the transformations
> themselves tend to be small and easily maintainable. I think this is a
> dramatic difference from connectors, which are each substantial projects
in
> their own right.
>
> While I get the slippery slope argument re: including specific
> transformations, I think we can come up with a reasonable policy (and via
> KIPs we can, as a community, come to an agreement based purely on taste if
> it comes down to that). In particular, I'd say keep the core general (i.e.
> no domain-specific transformations/parsing like HL7), pure data
> manipulation (i.e. no enrichment), and nothing that could just as well be
> done as a converter/serializer/deserializer/source connector/sink
> connector.
>
> I was very staunchly against including connectors (aside from a simple
> example) directly in Kafka, so this may seem like a reversal of position.
> But I think the % of use cases covered will look very different between
> connectors and transformations. Sure, some connectors are very popular,
and
> moreso right now because they are the most thoroughly developed, tested,
> etc. But the top 3 most common transformations will probably be used
across
> all the top 20 most popular connectors. I have no doubt people will end up
> writing custom ones (which is why it's nice to make them pluggable rather
> than choosing a fixed set), but they'll either be very niche (like people
> write custom connectors for their internal systems) or be more broadly
> applicable but very domain specific such that they are easy to reject for
> inclusion.
>
> @Gwen if we filtered the list of NiFi processors to ones that fit that
> criteria, would that still be too long a list for your taste? Similarly,
> let's say we were going to include some baked in; in that case, does
> anything look out of place to you in the list Shikhar has included in the
> KIP?
>
> -Ewen
>
> On Thu, Dec 15, 2016 at 2:01 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > I agree about the ease of use in adding a small-subset of built-in
> > transformations.
> >
> > But the same thing is true for connectors - there are maybe 5 super
> popular
> > OSS connectors and the rest is a very long tail. We drew the line at not
> > adding any, because thats the easiest and because we did not want to
turn
> > Kafka into a collection of transformations.
> >
> > I really don't want to end up with 135 (or even 20) transformations in
> > Kafka. So either we have a super-clear definition of what belongs and
> what
> > doesn't - or we put in one minimal example and the rest goes into the
> > ecosystem.
> >
> > We can also start by putting transformations on github and just see if
> > there is huge demand for them in Apache. It is easier to add stuff to
the
> > project later than to remove functionality.
> >
> >
> >
> > On Thu, Dec 15, 2016 at 11:59 AM, Shikhar Bhushan <shik...@confluent.io>
> > wrote:
> >
> > > I have updated KIP-66
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
&

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-15 Thread Shikhar Bhushan
I think the tradeoffs for including connectors are different. Connectors
are comparatively larger in scope, they tend to come with their own set of
dependencies for the systems they need to talk to. Transformations as I
imagine them - at least the ones on the table in the wiki currently -
should be a single not-very-large class (or 3 when there are simple *Key
and *Value variants deriving from a base implementing the common
functionality), in some cases relying on common utilities for munging with
the Connect data API. Correspondingly, the maintenance burden is also
smaller.

It's true that it would probably be easier to add specific transformations
down the line than evolve/remove, but I have faith we can strike a good
balance in making the call on what to include from the start.

On > super-clear definition of what belongs and what doesn't

How about: small and broadly applicable, configurable in an easily
understandable manner, no external dependencies, concrete use-case

On Thu, Dec 15, 2016 at 2:01 PM Gwen Shapira <g...@confluent.io> wrote:

I agree about the ease of use in adding a small-subset of built-in
transformations.

But the same thing is true for connectors - there are maybe 5 super popular
OSS connectors and the rest is a very long tail. We drew the line at not
adding any, because that's the easiest and because we did not want to turn
Kafka into a collection of transformations.

I really don't want to end up with 135 (or even 20) transformations in
Kafka. So either we have a super-clear definition of what belongs and what
doesn't - or we put in one minimal example and the rest goes into the
ecosystem.

We can also start by putting transformations on github and just see if
there is huge demand for them in Apache. It is easier to add stuff to the
project later than to remove functionality.



On Thu, Dec 15, 2016 at 11:59 AM, Shikhar Bhushan <shik...@confluent.io>
wrote:

> I have updated KIP-66
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 66%3A+Single+Message+Transforms+for+Kafka+Connect
> with
> the changes I proposed in the design.
>
> Gwen, I think the main downside to not including some transformations with
> Kafka Connect is that it seems less user friendly if folks have to make
> sure to have the right transformation(s) on the classpath as well, besides
> their connector(s). Additionally by going in with a small set included, we
> can encourage a consistent configuration and implementation style and
> provide utilities for e.g. data transformations, which I expect we will
> definitely need (discussed under 'Patterns for data transformations').
>
> It does get hard to draw the line once you go from 'none' to 'some'. To
get
> discussion going, if we get agreement on 'none' vs 'some', I added a table
> under 'Bundled transformations' for transformations which I think are
worth
> including.
>
> For many of these, I have noticed their absence in the wild as a pain
point
> --
> TimestampRouter:
> https://github.com/confluentinc/kafka-connect-elasticsearch/issues/33
> Mask:
> https://groups.google.com/d/msg/confluent-platform/3yHb8_
> mCReQ/sTQc3dNgBwAJ
> Insert:
> http://stackoverflow.com/questions/40664745/elasticsearch-connector-for-
> kafka-connect-offset-and-timestamp
> RegexRouter:
> https://groups.google.com/d/msg/confluent-platform/
> yEBwu1rGcs0/gIAhRp6kBwAJ
> NumericCast:
> https://github.com/confluentinc/kafka-connect-
> jdbc/issues/101#issuecomment-249096119
> TimestampConverter:
> https://groups.google.com/d/msg/confluent-platform/
> gGAOsw3Qeu4/8JCqdDhGBwAJ
> ValueToKey: https://github.com/confluentinc/kafka-connect-jdbc/pull/166
>
> In other cases, their functionality is already being implemented by
> connectors in divergent ways: RegexRouter, Insert, Replace, HoistToStruct,
> ExtractFromStruct
>
> On Wed, Dec 14, 2016 at 6:00 PM Gwen Shapira <g...@confluent.io> wrote:
>
> I'm a bit concerned about adding transformations in Kafka. NiFi has 150
> processors, presumably they are all useful for someone. I don't know if
I'd
> want all of that in Apache Kafka. What's the downside of keeping it out?
Or
> at least keeping the built-in set super minimal (Flume has like 3 built-in
> interceptors)?
>
> Gwen
>
> On Wed, Dec 14, 2016 at 1:36 PM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > With regard to a), just using `ConnectRecord` with `newRecord` as a new
> > abstract method would be a fine choice. In prototyping, both options end
> up
> > looking pretty similar (in terms of how transformations are implemented
> and
> > the runtime initializes and uses them) and I'm starting to lean towards
> not
> > adding a new interface into the mix.
> >
> > On b) I think we should include a small set of useful transformations
&g

Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-14 Thread Shikhar Bhushan
With regard to a), just using `ConnectRecord` with `newRecord` as a new
abstract method would be a fine choice. In prototyping, both options end up
looking pretty similar (in terms of how transformations are implemented and
the runtime initializes and uses them) and I'm starting to lean towards not
adding a new interface into the mix.

On b) I think we should include a small set of useful transformations with
Connect, since they can be applicable across different connectors and we
should encourage some standardization for common operations. I'll update
KIP-66 soon including a spec of transformations that I believe are worth
including.

On Sat, Dec 10, 2016 at 11:52 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

If anyone has time to review here, it'd be great to get feedback. I'd
imagine that the proposal itself won't be too controversial -- keeps
transformations simple (by only allowing map/filter), doesn't affect the
rest of the framework much, and fits in with general config structure we've
used elsewhere (although ConfigDef could use some updates to make this
easier...).

I think the main open questions for me are:

a) Is TransformableRecord worth it to avoid reimplementing small bits of
code (it allows for a single implementation of the interface to trivially
apply to both Source and SinkRecords). I think I prefer this, but it does
come with some commitment to another interface on top of ConnectRecord. We
could alternatively modify ConnectRecord which would require fewer changes.
b) How do folks feel about built-in transformations and the set that are
mentioned here? This brings us way back to the discussion of built-in
connectors. Transformations, especially when intended to be lightweight and
touch nothing besides the data already in the record, seem different from
connectors -- there might be quite a few, but hopefully limited. Since we
(hopefully) already factor out most serialization-specific stuff via
Converters, I think we can keep this pretty limited. That said, I have no
doubt some folks will (in my opinion) abuse this feature to do data
enrichment by querying external systems, so building a bunch of
transformations in could potentially open the floodgates, or at least make
decisions about what is included vs what should be 3rd party muddy.

-Ewen


On Wed, Dec 7, 2016 at 11:46 AM, Shikhar Bhushan <shik...@confluent.io>
wrote:

> Hi all,
>
> I have another iteration at a proposal for this feature here:
> https://cwiki.apache.org/confluence/display/KAFKA/
> Connect+Transforms+-+Proposed+Design
>
> I'd welcome your feedback and comments.
>
> Thanks,
>
> Shikhar
>
> On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > >
> > >
> > > Hmm, operating on ConnectRecords probably doesn't work since you need
> to
> > > emit the right type of record, which might mean instantiating a new
> one.
> > I
> > > think that means we either need 2 methods, one for SourceRecord, one
> for
> > > SinkRecord, or we'd need to limit what parts of the message you can
> > modify
> > > (e.g. you can change the key/value via something like
> > > transformKey(ConnectRecord) and transformValue(ConnectRecord), but
> other
> > > fields would remain the same and the fmwk would handle allocating new
> > > Source/SinkRecords if needed)
> > >
> >
> > Good point, perhaps we could add an abstract method on ConnectRecord
that
> > takes all the shared fields as parameters and the implementations return
> a
> > copy of the narrower SourceRecord/SinkRecord type as appropriate.
> > Transformers would only operate on ConnectRecord rather than caring
about
> > SourceRecord or SinkRecord (in theory they could instanceof/cast, but
the
> > API should discourage it)
> >
> >
> > > Is there a use case for hanging on to the original? I can't think of a
> > > transformation where you'd need to do that (or couldn't just order
> things
> > > differently so it isn't a problem).
> >
> >
> > Yeah maybe this isn't really necessary. No strong preference here.
> >
> > That said, I do worry a bit that farming too much stuff out to
> transformers
> > > can result in "programming via config", i.e. a lot of the simplicity
> you
> > > get from Connect disappears in long config files. Standardization
would
> > be
> > > nice and might just avoid this (and doesn't cost that much
implementing
> > it
> > > in each connector), and I'd personally prefer something a bit less
> > flexible
> > > but consistent and easy to configure.
> >
> &

Re: KafkaConnect SinkTask::put

2017-01-05 Thread Shikhar Bhushan
Hi David,

You can override the underlying consumer's `max.poll.records` setting for
this. E.g.
consumer.max.poll.records=500

Best,

Shikhar

On Thu, Jan 5, 2017 at 3:59 AM <david.frank...@bt.com> wrote:

> Is there any way of limiting the number of events that are passed into the
> call to the put(Collection) method?
>
> I'm writing a set of events to Kafka via a source Connector/Task and
> reading these from a sink Connector/Task.
> If I generate of the order of 10k events the number of SinkRecords passed
> to the put method starts off very low but quickly rises in large increments
> such that 9k events are passed to a later invocation of the put method.
>
> Furthermore, processing a large number of events in a single call (I'm
> writing to Elasticsearch) appears to cause the source task poll() method to
> timeout, raising a CommitFailedException which, incidentally, I can't see
> how to catch.
>
> Thanks for any help you can provide,
> David
>


Re: KafkaConnect SinkTask::put

2017-01-06 Thread Shikhar Bhushan
Sorry I forgot to specify, this needs to go into your Connect worker
configuration.
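
To be specific, that means the properties file the worker is started with,
not config/consumer.properties (which Connect does not read). For example:

    # excerpt from the worker config, e.g. config/connect-distributed.properties
    # ("consumer."-prefixed properties are passed through to the consumer that
    # feeds the sink tasks)
    consumer.max.poll.records=500
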
On Fri, Jan 6, 2017 at 02:57 <david.frank...@bt.com> wrote:

> Hi Shikhar,
>
> I've just added this to ~config/consumer.properties in the Kafka folder
> but it doesn't appear to have made any difference.  Have I put it in the
> wrong place?
>
> Thanks again,
> David
>
> -Original Message-
> From: Shikhar Bhushan [mailto:shik...@confluent.io]
> Sent: 05 January 2017 18:12
> To: dev@kafka.apache.org
> Subject: Re: KafkaConnect SinkTask::put
>
> Hi David,
>
> You can override the underlying consumer's `max.poll.records` setting for
> this. E.g.
> consumer.max.poll.records=500
>
> Best,
>
> Shikhar
>
> On Thu, Jan 5, 2017 at 3:59 AM <david.frank...@bt.com> wrote:
>
> > Is there any way of limiting the number of events that are passed into
> > the call to the put(Collection) method?
> >
> > I'm writing a set of events to Kafka via a source Connector/Task and
> > reading these from a sink Connector/Task.
> > If I generate of the order of 10k events the number of SinkRecords
> > passed to the put method starts off very low but quickly rises in
> > large increments such that 9k events are passed to a later invocation of
> the put method.
> >
> > Furthermore, processing a large number of events in a single call (I'm
> > writing to Elasticsearch) appears to cause the source task poll()
> > method to timeout, raising a CommitFailedException which,
> > incidentally, I can't see how to catch.
> >
> > Thanks for any help you can provide,
> > David
> >
>


Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-06 Thread Shikhar Bhushan
That makes sense to me, I'll fold that into the PR and update the KIP if it
gets committed in that form.
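
With that change the interface would look roughly like this (modulo the
final naming in the PR):

    import org.apache.kafka.common.Configurable;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;

    public interface Transformation<R extends ConnectRecord<R>> extends Configurable {

        // configure(Map<String, ?>) is inherited from Configurable, replacing init()

        /** Apply the transformation to the record; returning null drops it. */
        R apply(R record);

        /** Configuration specification for this transformation. */
        ConfigDef config();
    }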

On Fri, Jan 6, 2017 at 9:44 AM Jason Gustafson <ja...@confluent.io> wrote:

> +1 One minor comment: would it make sense to let the `Transformation`
> interface extend `o.a.k.c.Configurable` and remove the `init` method?
>
> On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede <n...@confluent.io> wrote:
>
> > +1 (binding)
> >
> > On Wed, Jan 4, 2017 at 2:36 PM Shikhar Bhushan <shik...@confluent.io>
> > wrote:
> >
> > > I do plan on introducing a new `connect:transforms` module (which
> > > `connect:runtime` will depend on), so they will live in a separate
> module
> > > in the source tree and output.
> > >
> > > ( https://github.com/apache/kafka/pull/2299 )
> > >
> > > On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava <
> e...@confluent.io>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > Gwen, re: bundling transformations, would it help at all to isolate
> > them
> > > to
> > > > a separate jar or is the concern purely about maintaining them as
> part
> > of
> > > > Kafka?
> > > >
> > > > -Ewen
> > > >
> > > > On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian <r...@confluent.io
> >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira <g...@confluent.io>
> > > wrote:
> > > > >
> > > > > > I would have preferred not to bundle transformations, but since
> SMT
> > > > > > capability is a much needed feature, I'll take it in its current
> > > form.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <
> > > shik...@confluent.io
> > > > >
> > > > > > wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'd like to start voting on KIP-66:
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Shikhar
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Gwen Shapira
> > > > > > Product Manager | Confluent
> > > > > > 650.450.2760 | @gwenshap
> > > > > > Follow us: Twitter | blog
> > > > > >
> > > > >
> > > >
> > >
> > --
> > Thanks,
> > Neha
> >
>


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2017-01-03 Thread Shikhar Bhushan
Makes sense Ewen, I edited the KIP to include this criteria.

I'd like to start a voting thread soon unless anyone has additional points
for discussion.

On Fri, Dec 30, 2016 at 12:14 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

On Thu, Dec 15, 2016 at 7:41 PM, Shikhar Bhushan <shik...@confluent.io>
wrote:

> There is no decision being proposed on the final list of transformations
> that will ever be in Kafka :-) Just the initial set we should roll with.
>

I'd second this comment as well. I'm very wary of the slippery slope, which
is why I wasn't in favor of including any connectors except for very simple
demos.

But it might be useful to have some initial guidelines, and might even make
sense to include them in the KIP so they are easy for others to find. I
think both the examples Gwen gave are easily excluded with a simple rule:
SMTs that are shipped with Kafka should be general enough to apply to many
data sources & serialization formats. email is a very specific type of data
(email headers and HL7 are pretty similar) and Avro is a specific
serialization format where, presumably, the Connect data type you'd have to
receive to do this transformation is just a byte array of the original Avro
file. In contrast, the included transformations in the current KIP are
*really* broadly applicable; apart from timestamps, I think they pretty
much all could potentially be applied to *any* stream of data.

I think the more interesting cases that we'll probably end up debating are
around serialization formats that "fit" within other connectors, in
particular I'm thinking of CSV and line-oriented JSON parsing. Individual
connectors may avoid this (or not be aware that the data has this
structure), but users will want that type of transformation to be easy and
baked in.

-Ewen


>
> On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira <g...@confluent.io> wrote:
>
> You are absolutely right that the vast majority of NiFi's processors are
> not what we would consider SMT.
>
> I went over the list and I think the still contain just short of 50 legit
> SMTs:
> https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+
> NiFi+Transformations
>
> You are right that ExtractHL7 is an extreme that clearly doesn't belong in
> Apache Kafka, but just before that we have ExtractAvroMetadata that may
> fit? and ExtractEmailHeaders doesn't sound totally outlandish either...
>
> Nothing in the baked-in list by Shikhar looks out of place. I am concerned
> about slipperly slope. Or the arbitrariness of the decision if we say that
> this list is final and nothing else will ever make it into Kafka.
>
> Gwen
>
> On Thu, Dec 15, 2016 at 3:00 PM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > I think there are a couple of factors that make transformations and
> > connectors different.
> >
> > First, NiFi's 150 processors is a bit misleading. In NiFi, processors
> cover
> > data sources, data sinks, serialization/deserialization, *and*
> > transformations. I haven't filtered the list to see how many fall into
> the
> > first 3 categories, but it's a *lot* of the processors they have.
> >
> > Second, since transformations only apply to a single message and I'd
> think
> > they generally shouldn't be interacting with external services (i.e. I
> > think trying to do enrichment in SMT is probably a bad idea), the scope
> of
> > possible transformations is reasonably limited and the transformations
> > themselves tend to be small and easily maintainable. I think this is a
> > dramatic difference from connectors, which are each substantial projects
> in
> > their own right.
> >
> > While I get the slippery slope argument re: including specific
> > transformations, I think we can come up with a reasonable policy (and
via
> > KIPs we can, as a community, come to an agreement based purely on taste
> if
> > it comes down to that). In particular, I'd say keep the core general
> (i.e.
> > no domain-specific transformations/parsing like HL7), pure data
> > manipulation (i.e. no enrichment), and nothing that could just as well
be
> > done as a converter/serializer/deserializer/source connector/sink
> > connector.
> >
> > I was very staunchly against including connectors (aside from a simple
> > example) directly in Kafka, so this may seem like a reversal of
position.
> > But I think the % of use cases covered will look very different between
> > connectors and transformations. Sure, some connectors are very popular,
> and
> > moreso right now because they are the most thoroughly developed, tested,
> > etc. But the top 3 most common transformations will probably be used
> across
> > all the t

Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-04 Thread Shikhar Bhushan
I do plan on introducing a new `connect:transforms` module (which
`connect:runtime` will depend on), so they will live in a separate module
in the source tree and output.

( https://github.com/apache/kafka/pull/2299 )

On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> +1
>
> Gwen, re: bundling transformations, would it help at all to isolate them to
> a separate jar or is the concern purely about maintaining them as part of
> Kafka?
>
> -Ewen
>
> On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian <r...@confluent.io>
> wrote:
>
> > +1
> >
> > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > I would have preferred not to bundle transformations, but since SMT
> > > capability is a much needed feature, I'll take it in its current form.
> > >
> > > +1
> > >
> > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <shik...@confluent.io
> >
> > > wrote:
> > > > Hi all,
> > > >
> > > > I'd like to start voting on KIP-66:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > >
> > > > Best,
> > > >
> > > > Shikhar
> > >
> > >
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>


Re: [VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-09 Thread Shikhar Bhushan
Thanks all. The vote passed with +5 (binding).

On Fri, Jan 6, 2017 at 11:37 AM Shikhar Bhushan <shik...@confluent.io>
wrote:

That makes sense to me, I'll fold that into the PR and update the KIP if it
gets committed in that form.

On Fri, Jan 6, 2017 at 9:44 AM Jason Gustafson <ja...@confluent.io> wrote:

+1 One minor comment: would it make sense to let the `Transformation`
interface extend `o.a.k.c.Configurable` and remove the `init` method?

On Thu, Jan 5, 2017 at 5:48 PM, Neha Narkhede <n...@confluent.io> wrote:

> +1 (binding)
>
> On Wed, Jan 4, 2017 at 2:36 PM Shikhar Bhushan <shik...@confluent.io>
> wrote:
>
> > I do plan on introducing a new `connect:transforms` module (which
> > `connect:runtime` will depend on), so they will live in a separate
module
> > in the source tree and output.
> >
> > ( https://github.com/apache/kafka/pull/2299 )
> >
> > On Wed, Jan 4, 2017 at 2:28 PM Ewen Cheslack-Postava <e...@confluent.io>
> > wrote:
> >
> > > +1
> > >
> > > Gwen, re: bundling transformations, would it help at all to isolate
> them
> > to
> > > a separate jar or is the concern purely about maintaining them as part
> of
> > > Kafka?
> > >
> > > -Ewen
> > >
> > > On Wed, Jan 4, 2017 at 1:31 PM, Sriram Subramanian <r...@confluent.io>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Wed, Jan 4, 2017 at 1:29 PM, Gwen Shapira <g...@confluent.io>
> > wrote:
> > > >
> > > > > I would have preferred not to bundle transformations, but since
SMT
> > > > > capability is a much needed feature, I'll take it in its current
> > form.
> > > > >
> > > > > +1
> > > > >
> > > > > On Wed, Jan 4, 2017 at 10:47 AM, Shikhar Bhushan <
> > shik...@confluent.io
> > > >
> > > > > wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start voting on KIP-66:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 66%3A+Single+Message+Transforms+for+Kafka+Connect
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Shikhar
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Gwen Shapira
> > > > > Product Manager | Confluent
> > > > > 650.450.2760 | @gwenshap
> > > > > Follow us: Twitter | blog
> > > > >
> > > >
> > >
> >
> --
> Thanks,
> Neha
>


[VOTE] KIP-66: Single Message Transforms for Kafka Connect

2017-01-04 Thread Shikhar Bhushan
Hi all,

I'd like to start voting on KIP-66:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66%3A+Single+Message+Transforms+for+Kafka+Connect

Best,

Shikhar


[jira] [Assigned] (KAFKA-3846) Connect record types should include timestamps

2016-06-21 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-3846:
--

Assignee: Shikhar Bhushan  (was: Ewen Cheslack-Postava)

> Connect record types should include timestamps
> --
>
> Key: KAFKA-3846
> URL: https://issues.apache.org/jira/browse/KAFKA-3846
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.1.0
>
>
> Timestamps were added to records in the previous release, however this does 
> not get propagated automatically to Connect because it uses custom wrappers  
> to add fields and rename some for clarity.
> The addition of timestamps should be trivial, but can be really useful (e.g. 
> in sink connectors that would like to include timestamp info if available but 
> when it is not stored in the value).
> This is public API so it will need a KIP despite being very uncontentious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3846) Connect record types should include timestamps

2016-06-23 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15346891#comment-15346891
 ] 

Shikhar Bhushan commented on KAFKA-3846:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-65%3A+Expose+timestamps+to+Connect

> Connect record types should include timestamps
> --
>
> Key: KAFKA-3846
> URL: https://issues.apache.org/jira/browse/KAFKA-3846
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.1.0
>
>
> Timestamps were added to records in the previous release, however this does 
> not get propagated automatically to Connect because it uses custom wrappers  
> to add fields and rename some for clarity.
> The addition of timestamps should be trivial, but can be really useful (e.g. 
> in sink connectors that would like to include timestamp info if available but 
> when it is not stored in the value).
> This is public API so it will need a KIP despite being very uncontentious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3846) Connect record types should include timestamps

2016-06-23 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3846 started by Shikhar Bhushan.
--
> Connect record types should include timestamps
> --
>
> Key: KAFKA-3846
> URL: https://issues.apache.org/jira/browse/KAFKA-3846
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Shikhar Bhushan
>  Labels: needs-kip
> Fix For: 0.10.1.0
>
>
> Timestamps were added to records in the previous release, however this does 
> not get propagated automatically to Connect because it uses custom wrappers  
> to add fields and rename some for clarity.
> The addition of timestamps should be trivial, but can be really useful (e.g. 
> in sink connectors that would like to include timestamp info if available but 
> when it is not stored in the value).
> This is public API so it will need a KIP despite being very uncontentious.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2935) Remove vestigial CLUSTER_CONFIG in WorkerConfig

2016-05-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290294#comment-15290294
 ] 

Shikhar Bhushan commented on KAFKA-2935:


I could not find any documentation reference to this config, so hope the above 
patch is GTG.

p.s. Can I please be added as a contributor for JIRA :)

> Remove vestigial CLUSTER_CONFIG in WorkerConfig 
> 
>
> Key: KAFKA-2935
> URL: https://issues.apache.org/jira/browse/KAFKA-2935
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> This config isn't used anywhere anymore. Its previous reason for existence is 
> now handled by DistributedConfig.GROUP_ID_CONFIG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3335) Kafka Connect hangs in shutdown hook

2016-05-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290333#comment-15290333
 ] 

Shikhar Bhushan commented on KAFKA-3335:


Currently {{start()}} looks like

 {code:java}
public void start() {
    try {
        log.info("Kafka Connect starting");
        Runtime.getRuntime().addShutdownHook(shutdownHook);

        herder.start();
        rest.start(herder);

        log.info("Kafka Connect started");
    } finally {
        startLatch.countDown();
    }
}
 {code}

Would it make sense to only add the shutdown hook at the end of this method?
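
i.e., something along these lines (sketch):

 {code:java}
public void start() {
    try {
        log.info("Kafka Connect starting");

        herder.start();
        rest.start(herder);

        // Register the hook only after startup succeeds, so that a failure
        // during startup cannot end up triggering a shutdown hook that blocks
        // the process from exiting.
        Runtime.getRuntime().addShutdownHook(shutdownHook);

        log.info("Kafka Connect started");
    } finally {
        startLatch.countDown();
    }
}
 {code}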

> Kafka Connect hangs in shutdown hook
> 
>
> Key: KAFKA-3335
> URL: https://issues.apache.org/jira/browse/KAFKA-3335
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Ben Kirwin
>
> The `Connect` class can run into issues during start, such as:
> {noformat}
> Exception in thread "main" org.apache.kafka.connect.errors.ConnectException: 
> Could not look up partition metadata for offset backing store topic in 
> allotted period. This could indicate a connectivity issue, unavailable topic 
> partitions, or if this is your first use of the topic it may have taken too 
> long to create.
> at 
> org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:130)
> at 
> org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:85)
> at org.apache.kafka.connect.runtime.Worker.start(Worker.java:108)
> at org.apache.kafka.connect.runtime.Connect.start(Connect.java:56)
> at 
> org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:62)
> {noformat}
> This exception halts the startup process. It also triggers the shutdown 
> hook... which blocks waiting for the service to start up before calling stop. 
> This causes the process to hang forever.
> There's a few things that could be done here, but it would be nice to bound 
> the amount of time the process spends trying to exit gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4010) ConfigDef.toRst() should create sections for each group

2016-08-01 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4010:
--

 Summary: ConfigDef.toRst() should create sections for each group
 Key: KAFKA-4010
 URL: https://issues.apache.org/jira/browse/KAFKA-4010
 Project: Kafka
  Issue Type: Improvement
Reporter: Shikhar Bhushan
Priority: Minor


Currently the ordering seems a bit arbitrary. There is a logical grouping that 
connectors are now able to specify with the 'group' field, which we should use 
as section headers. Also it would be good to generate {{:ref:}} for each 
section.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-07-13 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-3962:
--

 Summary: ConfigDef support for resource-specific configuration
 Key: KAFKA-3962
 URL: https://issues.apache.org/jira/browse/KAFKA-3962
 Project: Kafka
  Issue Type: Improvement
Reporter: Shikhar Bhushan


It often comes up with connectors that you want some piece of configuration 
that should be overridable at the topic-level, table-level, etc.

There are a couple of possible ways to allow for this:

1. Support for map-style config properties "k1:v1,k2:v2". There are escaping 
considerations to think through here. Also, how should the user override 
fallback/default values -- perhaps {{*}} as a special resource?

2. Templatized configs -- so we can define {{$resource.some.property}} with the 
ConfigDef API, and have getter variants that take the resource argument. The 
default value is more naturally overridable here, by the user setting 
{{some.property}}.
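
For illustration (property names hypothetical), a topic-level override of a
batch size could then look like:

1. map-style:
batch.size=topic-a:10,topic-b:500,*:100

2. templatized ({{$resource.batch.size}}), with the plain property as the default:
batch.size=100
topic-a.batch.size=10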



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-07-13 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-3962:
---
Description: 
It often comes up with connectors that you want some piece of configuration 
that should be overridable at the topic-level, table-level, etc.

The ConfigDef API should allow for defining these resource-overridable config 
properties and we should have getter variants that accept a resource argument, 
and return the more specific config value (falling back to the default).

There are a couple of possible ways to allow for this:

1. Support for map-style config properties "resource1:v1,resource2:v2". There 
are escaping considerations to think through here. Also, how should the user 
override fallback/default values -- perhaps {{*}} as a special resource?

2. Templatized configs -- so you would define {{$resource.some.property}}. The 
default value is more naturally overridable here, by the user setting 
{{some.property}} without the {{$resource}} prefix.

  was:
It often comes up with connectors that you want some piece of configuration 
that should be overridable at the topic-level, table-level, etc.

There are a couple of possible ways to allow for this:

1. Support for map-style config properties "k1:v1,k2:v2". There are escaping 
considerations to think through here. Also, how should the user override 
fallback/default values -- perhaps {{*}} as a special resource?

2. Templatized configs -- so we can define {{$resource.some.property}} with the 
ConfigDef API, and have getter variants that take the resource argument. The 
default value is more naturally overridable here, by the user setting 
{{some.property}}.


> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Shikhar Bhushan
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4048:
---
Summary: Connect does not support RetriableException consistently for sinks 
 (was: Connect does not support RetriableException consistently for sources & 
sinks)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.
> We don't have support for {{RetriableException}} with {{SourceTask}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4048:
---
Description: We only allow for handling {{RetriableException}} from calls 
to {{SinkTask.put()}}, but this is something we should support also for 
{{flush()}}  and arguably also {{open()}}.  (was: We only allow for handling 
{{RetriableException}} from calls to {{SinkTask.put()}}, but this is something 
we should support also for {{flush()}}  and arguably also {{open()}}.

We don't have support for {{RetriableException}} with {{SourceTask}}.)

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-16 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Component/s: KafkaConnect

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in the {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4048) Connect does not support RetriableException consistently for sources & sinks

2016-08-16 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4048:
--

 Summary: Connect does not support RetriableException consistently 
for sources & sinks
 Key: KAFKA-4048
 URL: https://issues.apache.org/jira/browse/KAFKA-4048
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


We only allow for handling {{RetriableException}} from calls to 
{{SinkTask.put()}}, but this is something we should support also for 
{{flush()}}  and arguably also {{open()}}.

We don't have support for {{RetriableException}} with {{SourceTask}}.
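For context, the supported path today is a sink task throwing {{RetriableException}} from {{put()}}; a minimal sketch (the {{writeBatch()}} helper is hypothetical):

{noformat}
import java.io.IOException;
import java.util.Collection;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.sink.SinkRecord;
import org.apache.kafka.connect.sink.SinkTask;

public abstract class RetryingSinkTask extends SinkTask {

    @Override
    public void put(Collection<SinkRecord> records) {
        try {
            writeBatch(records); // hypothetical call into the destination system
        } catch (IOException e) {
            // The framework treats this as transient and redelivers the same
            // records to put() rather than failing the task.
            throw new RetriableException(e);
        }
    }

    // Hypothetical helper that a real connector would implement.
    protected abstract void writeBatch(Collection<SinkRecord> records) throws IOException;
}
{noformat}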



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-15 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Summary: DistributedHerder thread can die because of connector & task 
lifecycle exceptions  (was: Missing error handling in Worker.startConnector() 
can cause Herder thread to die)

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> While errors in {{WorkerConnector}} are handled using the 
> {{statusListener.onFailure()}} callback, before the {{WorkerConnector}} is 
> created in {{Worker.startConnector()}} there can be exceptions because of 
> e.g. a bad class name and these are currently not handled, causing the 
> {{Herder}} thread to die.
> Report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-15 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Description: As one example, there isn't exception handling in 
{{DistributedHerder.startConnector()}} or the call-chain for it originating in 
the {{tick()}} on the herder thread, and it can throw an exception because of a 
bad class name in the connector config. (report of issue in wild: 
https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)  
(was: While errors in {{WorkerConnector}} are handled using the 
{{statusListener.onFailure()}} callback, before the {{WorkerConnector}} is 
created in {{Worker.startConnector()}} there can be exceptions because of e.g. 
a bad class name and these are currently not handled, causing the {{Herder}} 
thread to die.

Report of issue in wild: 
https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in the {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4042) Missing error handling in Worker.startConnector() can cause Herder thread to die

2016-08-15 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4042:
--

 Summary: Missing error handling in Worker.startConnector() can 
cause Herder thread to die
 Key: KAFKA-4042
 URL: https://issues.apache.org/jira/browse/KAFKA-4042
 Project: Kafka
  Issue Type: Bug
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


While errors in {{WorkerConnector}} are handled using the 
{{statusListener.onFailure()}} callback, before the {{WorkerConnector}} is 
created in {{Worker.startConnector()}} there can be exceptions because of e.g. 
a bad class name and these are currently not handled, causing the {{Herder}} 
thread to die.

Report of issue in wild: 
https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ
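A minimal sketch of the kind of guard this implies; the class and listener here are stand-ins, not the actual {{Worker}}/{{DistributedHerder}} code:

{noformat}
public class SafeConnectorStarter {

    // Stand-in for the herder's status listener callback.
    public interface FailureListener {
        void onFailure(String connectorName, Throwable cause);
    }

    private final FailureListener listener;

    public SafeConnectorStarter(FailureListener listener) {
        this.listener = listener;
    }

    // Wraps connector startup so that a bad config (e.g. an unloadable class
    // name) is reported through the listener instead of propagating up and
    // killing the herder thread.
    public boolean startSafely(String name, Runnable start) {
        try {
            start.run(); // e.g. the actual Worker.startConnector(...) call
            return true;
        } catch (Throwable t) {
            listener.onFailure(name, t);
            return false;
        }
    }
}
{noformat}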



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-15 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan reassigned KAFKA-3054:
--

Assignee: Shikhar Bhushan  (was: jin xing)

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4678) Create separate page for Connect docs

2017-01-20 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4678:
--

 Summary: Create separate page for Connect docs
 Key: KAFKA-4678
 URL: https://issues.apache.org/jira/browse/KAFKA-4678
 Project: Kafka
  Issue Type: Improvement
  Components: documentation, KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava
Priority: Minor


The single-page http://kafka.apache.org/documentation/ is quite long, and will 
get even longer with the inclusion of info on Kafka Connect's included 
transformations.

Recently Kafka Streams documentation was split off to its own page with a short 
overview in the main doc page. We should do the same for {{connect.html}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-3054.

Resolution: Done

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427362#comment-15427362
 ] 

Shikhar Bhushan commented on KAFKA-3054:


Addressing this in KAFKA-4042, which should take care of remaining robustness 
issues in the {{DistributedHerder}} from bad connector or task configs.

> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4042) DistributedHerder thread can die because of connector & task lifecycle exceptions

2016-08-19 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4042:
---
Fix Version/s: 0.10.1.0

> DistributedHerder thread can die because of connector & task lifecycle 
> exceptions
> -
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.1.0
>
>
> As one example, there isn't exception handling in 
> {{DistributedHerder.startConnector()}} or the call-chain for it originating 
> in the {{tick()}} on the herder thread, and it can throw an exception because 
> of a bad class name in the connector config. (report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4042) Missing error handling in Worker.startConnector() can cause Herder thread to die

2016-08-15 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4042 started by Shikhar Bhushan.
--
> Missing error handling in Worker.startConnector() can cause Herder thread to 
> die
> 
>
> Key: KAFKA-4042
> URL: https://issues.apache.org/jira/browse/KAFKA-4042
> Project: Kafka
>  Issue Type: Bug
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> While errors in {{WorkerConnector}} are handled using the 
> {{statusListener.onFailure()}} callback, before the {{WorkerConnector}} is 
> created in {{Worker.startConnector()}} there can be exceptions because of 
> e.g. a bad class name and these are currently not handled, causing the 
> {{Herder}} thread to die.
> Report of issue in wild: 
> https://groups.google.com/d/msg/confluent-platform/EnleFnXpZCU/3B_gRxsRAgAJ



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-3054) Connect Herder fail forever if sent a wrong connector config or task config

2016-08-15 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3054 started by Shikhar Bhushan.
--
> Connect Herder fail forever if sent a wrong connector config or task config
> ---
>
> Key: KAFKA-3054
> URL: https://issues.apache.org/jira/browse/KAFKA-3054
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: jin xing
>Assignee: Shikhar Bhushan
>Priority: Blocker
> Fix For: 0.10.1.0
>
>
> Connector Herder throws ConnectException and shuts down if sent a wrong 
> config; restarting the herder will keep failing with the wrong config. It 
> makes sense that the herder should stay available when starting a connector 
> or task fails. After receiving a delete connector request, the herder can 
> delete the wrong config from "config storage"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4068) FileSinkTask - use JsonConverter to serialize

2016-08-19 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4068:
--

 Summary: FileSinkTask - use JsonConverter to serialize
 Key: KAFKA-4068
 URL: https://issues.apache.org/jira/browse/KAFKA-4068
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan
Priority: Minor


People new to Connect often try hooking up e.g. a Kafka topic with Avro 
data to the file sink connector, only to find that the file contains values like:

{noformat}
org.apache.kafka.connect.data.Struct@ca1bf85a
org.apache.kafka.connect.data.Struct@c298db6a
org.apache.kafka.connect.data.Struct@44108fbd
{noformat}

This is because currently the {{FileSinkConnector}} is meant as a toy example 
that expects the schema to be {{Schema.STRING_SCHEMA}}, though it just 
{{toString()}}'s the value without verifying that. 

A better experience would probably be if we used 
{{JsonConverter.fromConnectData()}} for serializing to the file.
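Roughly, the serialization change being suggested could look like this; the wrapper class is illustrative and the exact wiring into the file sink task is not shown:

{noformat}
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import org.apache.kafka.connect.json.JsonConverter;
import org.apache.kafka.connect.sink.SinkRecord;

public class JsonValueFormatter {

    private final JsonConverter converter = new JsonConverter();

    public JsonValueFormatter() {
        // Configure as a value converter; disable the schema envelope so the
        // output is plain JSON rather than {"schema": ..., "payload": ...}.
        converter.configure(Collections.singletonMap("schemas.enable", "false"), false);
    }

    // Serializes a record's value as JSON instead of relying on toString().
    public String format(SinkRecord record) {
        byte[] json = converter.fromConnectData(record.topic(), record.valueSchema(), record.value());
        return new String(json, StandardCharsets.UTF_8);
    }
}
{noformat}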



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4070:
---
Description: Logging of {{Struct}}'s does not currently provide any useful 
output, and users also find it unhelpful e.g. when hooking up a Kafka topic 
with Avro data with the {{FileSinkConnector}} which simply {{toString()}}s the 
values to the file.  (was: Logging of {{Struct}}s does not currently provide 
any useful output, and users also find it unhelpful e.g. when hooking up a 
Kafka topic with Avro data with the {{FileSinkConnector}} which simply 
{{toString()}}s the values to the file.)

> Implement a useful Struct.toString()
> 
>
> Key: KAFKA-4070
> URL: https://issues.apache.org/jira/browse/KAFKA-4070
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Priority: Minor
>
> Logging of {{Struct}}'s does not currently provide any useful output, and 
> users also find it unhelpful e.g. when hooking up a Kafka topic with Avro 
> data with the {{FileSinkConnector}} which simply {{toString()}}s the values 
> to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4070:
---
Description: Logging of {{Struct}}'s does not currently provide any useful 
output, and users also find it unhelpful e.g. when hooking up a Kafka topic 
with Avro data with the {{FileSinkConnector}} which simply {{toString()}}'s the 
values to the file.  (was: Logging of {{Struct}}'s does not currently provide 
any useful output, and users also find it unhelpful e.g. when hooking up a 
Kafka topic with Avro data with the {{FileSinkConnector}} which simply 
{{toString()}}s the values to the file.)

> Implement a useful Struct.toString()
> 
>
> Key: KAFKA-4070
> URL: https://issues.apache.org/jira/browse/KAFKA-4070
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Priority: Minor
>
> Logging of {{Struct}}'s does not currently provide any useful output, and 
> users also find it unhelpful e.g. when hooking up a Kafka topic with Avro 
> data with the {{FileSinkConnector}} which simply {{toString()}}'s the values 
> to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4070) Implement a useful Struct.toString()

2016-08-20 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4070:
--

 Summary: Implement a useful Struct.toString()
 Key: KAFKA-4070
 URL: https://issues.apache.org/jira/browse/KAFKA-4070
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Priority: Minor


Logging of {{Struct}}s does not currently provide any useful output, and users 
also find it unhelpful e.g. when hooking up a Kafka topic with Avro data with 
the {{FileSinkConnector}} which simply {{toString()}}s the values to the file.
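For illustration, one possible shape for such a {{toString()}} inside {{org.apache.kafka.connect.data.Struct}} (a sketch only, not necessarily what should land):

{noformat}
// Sketch: print field names and values, e.g. Struct{name=foo,count=42},
// instead of the default Object identity hash.
@Override
public String toString() {
    StringBuilder sb = new StringBuilder("Struct{");
    boolean first = true;
    for (Field field : schema().fields()) {
        Object value = get(field);
        if (value != null) {
            if (!first)
                sb.append(",");
            sb.append(field.name()).append("=").append(value);
            first = false;
        }
    }
    return sb.append("}").toString();
}
{noformat}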



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4068) FileSinkTask - use JsonConverter to serialize

2016-08-20 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4068.

Resolution: Not A Problem

I was thinking JSON since it would be easy to serialize to a human-readable 
format with that. But if we want to implement a more useful 
{{Struct.toString()}} in any case for debugging purposes, we should probably do 
that instead. Fair point about keeping the file sink connector as simple as 
possible. Closing this in favor of KAFKA-4070.

> FileSinkTask - use JsonConverter to serialize
> -
>
> Key: KAFKA-4068
> URL: https://issues.apache.org/jira/browse/KAFKA-4068
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>Priority: Minor
>
> People new to Connect often try hooking up e.g. a Kafka topic with Avro 
> data to the file sink connector, only to find that the file contains values like:
> {noformat}
> org.apache.kafka.connect.data.Struct@ca1bf85a
> org.apache.kafka.connect.data.Struct@c298db6a
> org.apache.kafka.connect.data.Struct@44108fbd
> {noformat}
> This is because currently the {{FileSinkConnector}} is meant as a toy example 
> that expects the schema to be {{Schema.STRING_SCHEMA}}, though it just 
> {{toString()}}'s the value without verifying that. 
> A better experience would probably be if we used 
> {{JsonConverter.fromConnectData()}} for serializing to the file.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan closed KAFKA-4127.
--

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   
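For reference, a minimal Java sketch of the parent-directory fsync being described; whether fsync on a directory handle is honored is platform- and filesystem-dependent:

{noformat}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirFsync {

    // Opens the directory itself and forces its metadata (e.g. the entry for
    // a newly created log file) to disk.
    public static void fsyncDir(Path dir) throws IOException {
        try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
            channel.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        fsyncDir(Paths.get("/appdir/my-topic1-0"));
    }
}
{noformat}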



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468012#comment-15468012
 ] 

Shikhar Bhushan commented on KAFKA-4127:


Dupe of KAFKA-3968

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4127) Possible data loss

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4127.

Resolution: Duplicate

> Possible data loss
> --
>
> Key: KAFKA-4127
> URL: https://issues.apache.org/jira/browse/KAFKA-4127
> Project: Kafka
>  Issue Type: Bug
> Environment: Normal three node Kafka cluster. All machines running 
> linux.
>Reporter: Ramnatthan Alagappan
>
> I am running a three node Kafka cluster. ZooKeeper runs in a standalone mode.
> When I create a new message topic, I see the following sequence of system 
> calls:
> mkdir("/appdir/my-topic1-0")
> creat("/appdir/my-topic1-0/.log")
> I have configured Kafka to write the messages persistently to the disk before 
> acknowledging the client. Specifically, I have set flush.interval_messages to 
> 1, min_insync_replicas to 3, and disabled dirty election.  Now, I insert a 
> new message into the created topic.
> I see that Kafka writes the message to the log file and flushes the data down 
> to disk by carefully fsync'ing the log file. I get an acknowledgment back 
> from the cluster after the message is safely persisted on all three replicas 
> and written to disk. 
> Unfortunately, Kafka can still lose data since it does not explicitly fsync 
> the directory to persist the directory entries of the topic directory and the 
> log file. If a crash happens after acknowledging the client, it is possible 
> for Kafka to lose the directory entry for the topic directory or the log file. 
> Many systems carefully issue fsync to the parent directory when a new file or 
> directory is created. This is required for the file to be completely 
> persisted to the disk.   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468352#comment-15468352
 ] 

Shikhar Bhushan edited comment on KAFKA-3962 at 9/6/16 7:49 PM:


This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't library-level 
validation support when doing this, though.


was (Author: shikhar):
This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't framework-level 
validation support when doing this, though.

> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Shikhar Bhushan
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3962) ConfigDef support for resource-specific configuration

2016-09-06 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468352#comment-15468352
 ] 

Shikhar Bhushan commented on KAFKA-3962:


This is also realizable today by using {{ConfigDef.originalsWithPrefix()}}, if 
the template format is {{some.property.$resource}} rather than 
{{$resource.some.property}} as suggested above. There isn't framework-level 
validation support when doing this, though.
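A rough sketch of that workaround; the property name and config class are assumptions, and {{originalsWithPrefix()}} lives on {{AbstractConfig}}:

{noformat}
import java.util.Map;
import org.apache.kafka.common.config.AbstractConfig;

public class TopicOverrides {

    // Given original properties like:
    //   some.property = default-value
    //   some.property.my_topic = topic-specific-value
    // resolve the per-resource value, falling back to the un-suffixed default.
    public static String resolve(AbstractConfig config, String resource) {
        Map<String, Object> overrides = config.originalsWithPrefix("some.property.");
        Object override = overrides.get(resource);
        return override != null ? override.toString() : config.getString("some.property");
    }
}
{noformat}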

> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>    Reporter: Shikhar Bhushan
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4048) Connect does not support RetriableException consistently for sinks

2016-09-06 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan resolved KAFKA-4048.

Resolution: Not A Problem

Turns out all exceptions from {{task.flush()}} are treated as retriable (see 
{{WorkerSinkTask.commitOffsets()}}), so there is nothing to do here.

> Connect does not support RetriableException consistently for sinks
> --
>
> Key: KAFKA-4048
> URL: https://issues.apache.org/jira/browse/KAFKA-4048
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> We only allow for handling {{RetriableException}} from calls to 
> {{SinkTask.put()}}, but this is something we should support also for 
> {{flush()}}  and arguably also {{open()}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4115) Grow default heap settings for distributed Connect from 256M to 1G

2016-09-01 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4115:
--

 Summary: Grow default heap settings for distributed Connect from 
256M to 1G
 Key: KAFKA-4115
 URL: https://issues.apache.org/jira/browse/KAFKA-4115
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


Currently, both {{connect-standalone.sh}} and {{connect-distributed.sh}} start 
the Connect JVM with the default heap settings from {{kafka-run-class.sh}} of 
{{-Xmx256M}}.

At least for distributed Connect, we should default to a much higher limit like 
1G. While the 'correct' sizing is workload-dependent, in a system where you can 
run arbitrary connector plugins which may buffer data, we should provide more 
headroom.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4100) Connect Struct schemas built using SchemaBuilder with no fields cause NPE in Struct constructor

2016-08-29 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4100:
--

 Summary: Connect Struct schemas built using SchemaBuilder with no 
fields cause NPE in Struct constructor
 Key: KAFKA-4100
 URL: https://issues.apache.org/jira/browse/KAFKA-4100
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.0.1
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan
Priority: Minor
 Fix For: 0.10.1.0


Avro records can legitimately have 0 fields (though arguable how useful that 
is).

When using the Confluent Schema Registry's {{AvroConverter}} with such a schema,
{noformat}
java.lang.NullPointerException
at org.apache.kafka.connect.data.Struct.(Struct.java:56)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:980)
at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:782)
at 
io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:103)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:358)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
at 
org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
at 
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{noformat}

This is because it is using the {{SchemaBuilder}} to create the Struct schema, 
which provides a {{field(..)}} builder for each field. If there are no fields, 
the list stays as null.
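A minimal reproduction, independent of the Avro path above:

{noformat}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.Struct;

public class EmptyStructRepro {
    public static void main(String[] args) {
        // No field(..) calls, so the builder never initializes its field list.
        Schema emptySchema = SchemaBuilder.struct().name("EmptyRecord").build();
        // Throws NullPointerException in the Struct constructor on affected versions.
        new Struct(emptySchema);
    }
}
{noformat}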



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4100) Connect Struct schemas built using SchemaBuilder with no fields cause NPE in Struct constructor

2016-08-29 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4100 started by Shikhar Bhushan.
--
> Connect Struct schemas built using SchemaBuilder with no fields cause NPE in 
> Struct constructor
> ---
>
> Key: KAFKA-4100
> URL: https://issues.apache.org/jira/browse/KAFKA-4100
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.0.1
>Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> Avro records can legitimately have 0 fields (though arguable how useful that 
> is).
> When using the Confluent Schema Registry's {{AvroConverter}} with such a 
> schema,
> {noformat}
> java.lang.NullPointerException
>   at org.apache.kafka.connect.data.Struct.(Struct.java:56)
>   at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:980)
>   at io.confluent.connect.avro.AvroData.toConnectData(AvroData.java:782)
>   at 
> io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:103)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:358)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:227)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:171)
>   at 
> org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
>   at 
> org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
>   at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> This is because it is using the {{SchemaBuilder}} to create the Struct 
> schema, which provides a {{field(..)}} builder for each field. If there are 
> no fields, the list stays as null.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4159) Allow overriding producer & consumer properties at the connector level

2016-09-13 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4159:
--

 Summary: Allow overriding producer & consumer properties at the 
connector level
 Key: KAFKA-4159
 URL: https://issues.apache.org/jira/browse/KAFKA-4159
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


As an example use case: overriding a sink connector's consumer's partition 
assignment strategy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4161) Allow connectors to request flush via the context

2016-09-14 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15491296#comment-15491296
 ] 

Shikhar Bhushan commented on KAFKA-4161:


bq. Probably worth clarifying whether we're really talking about just flush 
here or offset commit as well. Flush really only exists in order to support 
offset commit (from the framework's perspective), but since you mention full 
buffers I think you might be getting at a slightly different use case for 
connectors.

Sorry I wasn't clear; flushing data & offset commit are currently coupled, as 
you pointed out. If we want to avoid unnecessary redelivery of records, it is 
best to commit offsets with the 'most current' knowledge of them, which we 
currently have after calling {{flush()}}.

bq. In general, I think it'd actually be even better to just get rid of the 
idea of having to flush as a common operation as it hurts throughput to have to 
flush entirely to commit offsets (we are flushing the pipeline, which is never 
good). Ideally we could do what the framework does with source connectors and 
just track which data has been successfully delivered and use that for the 
majority of offset commits. We'd still need it for cases like shutdown where we 
want to make sure all data has been sent, but since the framework controls 
delivery of data, maybe it's even better just to wait for that data to be 
written. 

Good points, I agree it would be better to make it so {{flush()}} is not 
routine since it can hurt throughput. I think we can deprecate it altogether. 
As a proposal:
{noformat}
abstract class SinkTask {
    ..
    // New method
    public Map<TopicPartition, OffsetAndMetadata> flushedOffsets() {
        throw new UnsupportedOperationException();
    }

    @Deprecated
    public void flush(Map<TopicPartition, OffsetAndMetadata> offsets) { }
    ..
}
{noformat}

Then the periodic offset commit logic would use {{flushedOffsets()}}, and if 
that is not implemented, fall back to calling {{flush()}} as it does currently, 
so it can commit the offset state as of the last {{put()}} call.

I don't think {{flush()}} is needed even at shutdown. Tasks are already being 
advised via {{close()}} and can choose to flush any buffered data from there. 
We can do a final offset commit based on the {{flushedOffsets()}} after 
{{close()}} (though this does imply a quirk that even after a 
{{TopicPartition}} is closed we expect tasks to keep offset state around in the 
map returned by {{flushedOffsets()}}).

Additionally, it would be good to have a {{context.requestCommit()}} in the 
spirit of {{context.requestFlush()}} as I was originally proposing. The 
motivation is that connectors can optimize for avoiding unnecessary redelivery 
when recovering from failures. Connectors can choose whatever policies are best 
like number-of-records or size-based batching/buffering for writing to the 
destination system as part of the normal flow of calls to {{put()}}, and 
request a commit when they have actually written data to the destination 
system. There need not be a strong guarantee that an offset commit actually 
happens after such a request, so that we don't commit offsets too often and can 
choose to only do it after some minimum interval, e.g. in case a connector 
always requests a commit after a put.

bq. The main reason I think we even need the explicit flush() is that some 
connectors may have very long delays between flushes (e.g. any object stores 
like S3) such that they need to be told directly that they need to write all 
their data (or discard it).

I don't believe it is currently possible for a connector to communicate that it 
wants to discard data rather than write it out when {{flush()}} is called 
(aside from I guess throwing an exception...). With the above proposal the 
decision of when and whether or not to write data would be completely up to 
connectors.

bq. Was there a specific connector & scenario you were thinking about here?

This came up in a thread on the user list ('Sink Connector feature request: 
SinkTask.putAndReport()')

> Allow connectors to request flush via the context
> -
>
> Key: KAFKA-4161
> URL: https://issues.apache.org/jira/browse/KAFKA-4161
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Shikhar Bhushan
>Assignee: Ewen Cheslack-Postava
>  Labels: needs-kip
>
> It is desirable to have, in addition to the time-based flush interval, volume 
> or size-based commits. E.g. a sink connector which is buffering in terms of 
> number of records may want to request a flush when the buffer is full, or 
> when a sufficient amount of data has been buffered in a file.
> Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
>

[jira] [Created] (KAFKA-4173) SchemaProjector should successfully project when source schema field is missing and target schema field is optional

2016-09-14 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4173:
--

 Summary: SchemaProjector should successfully project when source 
schema field is missing and target schema field is optional
 Key: KAFKA-4173
 URL: https://issues.apache.org/jira/browse/KAFKA-4173
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Shikhar Bhushan


As reported in https://github.com/confluentinc/kafka-connect-hdfs/issues/115
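The failing scenario, roughly (field names are illustrative): projecting a value written with an older schema onto a newer schema whose extra field is optional should succeed rather than throw:

{noformat}
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.data.SchemaProjector;
import org.apache.kafka.connect.data.Struct;

public class ProjectionExample {
    public static void main(String[] args) {
        Schema source = SchemaBuilder.struct().name("record").version(1)
                .field("id", Schema.INT64_SCHEMA)
                .build();
        // The target schema adds an optional field that the source lacks.
        Schema target = SchemaBuilder.struct().name("record").version(2)
                .field("id", Schema.INT64_SCHEMA)
                .field("note", SchemaBuilder.string().optional().build())
                .build();

        Struct value = new Struct(source).put("id", 42L);
        // Expected to project successfully, with "note" left as null.
        Struct projected = (Struct) SchemaProjector.project(source, value, target);
        System.out.println(projected.get("note"));
    }
}
{noformat}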



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4161) Allow connectors to request flush via the context

2016-09-13 Thread Shikhar Bhushan (JIRA)
Shikhar Bhushan created KAFKA-4161:
--

 Summary: Allow connectors to request flush via the context
 Key: KAFKA-4161
 URL: https://issues.apache.org/jira/browse/KAFKA-4161
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Shikhar Bhushan
Assignee: Ewen Cheslack-Postava


It is desirable to have, in addition to the time-based flush interval, volume 
or size-based commits. E.g. a sink connector which is buffering in terms of 
number of records may want to request a flush when the buffer is full, or when 
a sufficient amount of data has been buffered in a file.

Having a method like say {{requestFlush()}} on the {{SinkTaskContext}} would 
allow for connectors to have flexible policies around flushes. This would be in 
addition to the time interval based flushes that are controlled with 
{{offset.flush.interval.ms}}, for which the clock should be reset when any kind 
of flush happens.

We should probably also support requesting flushes via the 
{{SourceTaskContext}} for consistency though a use-case doesn't come to mind 
off the bat.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4154) Kafka Connect fails to shutdown if it has not completed startup

2016-09-12 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4154:
---
Fix Version/s: (was: 0.10.0.2)

> Kafka Connect fails to shutdown if it has not completed startup
> ---
>
> Key: KAFKA-4154
> URL: https://issues.apache.org/jira/browse/KAFKA-4154
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
>
> To reproduce:
> 1. Start Kafka Connect in distributed mode without Kafka running 
> {{./bin/connect-distributed.sh config/connect-distributed.properties}}
> 2. Ctrl+C fails to terminate the process
> thread dump:
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode):
> "Thread-1" #13 prio=5 os_prio=31 tid=0x7fc29a18a800 nid=0x7007 waiting on 
> condition [0x73129000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bd7d91d8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.stop(DistributedHerder.java:357)
>   at 
> org.apache.kafka.connect.runtime.Connect.stop(Connect.java:71)
>   at 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook.run(Connect.java:93)
> "SIGINT handler" #27 daemon prio=9 os_prio=31 tid=0x7fc29aa6a000 
> nid=0x560f in Object.wait() [0x71a61000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>   at 
> java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>   at java.lang.Shutdown.runHooks(Shutdown.java:123)
>   at java.lang.Shutdown.sequence(Shutdown.java:167)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - locked <0x0007b0244600> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Terminator$1.handle(Terminator.java:52)
>   at sun.misc.Signal$1.run(Signal.java:212)
>   at java.lang.Thread.run(Thread.java:745)
> "kafka-producer-network-thread | producer-1" #15 daemon prio=5 os_prio=31 
> tid=0x7fc29a0b7000 nid=0x7a03 runnable [0x72608000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at 
> sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at 
> sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0007bd7788d8> (a sun.nio.ch.Util$2)
>   - locked <0x0007bd7788e8> (a 
> java.util.Collections$UnmodifiableSet)
>   - locked <0x0007bd77> (a sun.nio.ch.KQueueSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.kafka.common.network.Selector.select(Selector.java:470)
>   at 
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>   at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:235)
>   at 
> org.apache.kafka.clients.produc

[jira] [Updated] (KAFKA-4154) Kafka Connect fails to shutdown if it has not completed startup

2016-09-12 Thread Shikhar Bhushan (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shikhar Bhushan updated KAFKA-4154:
---
Fix Version/s: (was: 0.10.1.0)
   0.10.0.2

> Kafka Connect fails to shutdown if it has not completed startup
> ---
>
> Key: KAFKA-4154
> URL: https://issues.apache.org/jira/browse/KAFKA-4154
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>    Reporter: Shikhar Bhushan
>Assignee: Shikhar Bhushan
> Fix For: 0.10.0.2
>
>
> To reproduce:
> 1. Start Kafka Connect in distributed mode without Kafka running 
> {{./bin/connect-distributed.sh config/connect-distributed.properties}}
> 2. Ctrl+C fails to terminate the process
> thread dump:
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.92-b14 mixed mode):
> "Thread-1" #13 prio=5 os_prio=31 tid=0x7fc29a18a800 nid=0x7007 waiting on 
> condition [0x73129000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x0007bd7d91d8> (a 
> java.util.concurrent.CountDownLatch$Sync)
>   at 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>   at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>   at 
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
>   at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.stop(DistributedHerder.java:357)
>   at 
> org.apache.kafka.connect.runtime.Connect.stop(Connect.java:71)
>   at 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook.run(Connect.java:93)
> "SIGINT handler" #27 daemon prio=9 os_prio=31 tid=0x7fc29aa6a000 
> nid=0x560f in Object.wait() [0x71a61000]
>java.lang.Thread.State: WAITING (on object monitor)
>   at java.lang.Object.wait(Native Method)
>   - waiting on <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1245)
>   - locked <0x0007bd63db38> (a 
> org.apache.kafka.connect.runtime.Connect$ShutdownHook)
>   at java.lang.Thread.join(Thread.java:1319)
>   at 
> java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>   at 
> java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>   at java.lang.Shutdown.runHooks(Shutdown.java:123)
>   at java.lang.Shutdown.sequence(Shutdown.java:167)
>   at java.lang.Shutdown.exit(Shutdown.java:212)
>   - locked <0x0007b0244600> (a java.lang.Class for 
> java.lang.Shutdown)
>   at java.lang.Terminator$1.handle(Terminator.java:52)
>   at sun.misc.Signal$1.run(Signal.java:212)
>   at java.lang.Thread.run(Thread.java:745)
> "kafka-producer-network-thread | producer-1" #15 daemon prio=5 os_prio=31 
> tid=0x7fc29a0b7000 nid=0x7a03 runnable [0x72608000]
>java.lang.Thread.State: RUNNABLE
>   at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
>   at 
> sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
>   at 
> sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
>   at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>   - locked <0x0007bd7788d8> (a sun.nio.ch.Util$2)
>   - locked <0x0007bd7788e8> (a 
> java.util.Collections$UnmodifiableSet)
>   - locked <0x0007bd77> (a sun.nio.ch.KQueueSelectorImpl)
>   at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
>   at 
> org.apache.kafka.common.network.Selector.select(Selector.java:470)
>   at 
> org.apache.kafka.common.network.Selector.poll(Selector.java:286)
>   at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260)
>   at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:2
