[jira] [Resolved] (KAFKA-2124) gradlew is not working on a fresh checkout

2018-12-13 Thread Grant Henke (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KAFKA-2124.

Resolution: Duplicate
  Assignee: Grant Henke

> gradlew is not working on a fresh checkout
> --
>
> Key: KAFKA-2124
> URL: https://issues.apache.org/jira/browse/KAFKA-2124
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Jakob Homan
>    Assignee: Grant Henke
>Priority: Major
>
> For a fresh checkout, the gradlew script is not working:
> {noformat}heimdallr 15:54 $ asfclone kafka
> Cloning into 'kafka'...
> remote: Counting objects: 25676, done.
> remote: Compressing objects: 100% (36/36), done.
> remote: Total 25676 (delta 5), reused 0 (delta 0), pack-reused 25627
> Receiving objects: 100% (25676/25676), 19.58 MiB | 4.29 MiB/s, done.
> Resolving deltas: 100% (13852/13852), done.
> Checking connectivity... done.
> /tmp/kafka /tmp
> /tmp
> ✔ /tmp
> heimdallr 15:54 $ cd kafka
> ✔ /tmp/kafka [trunk|✔]
> heimdallr 15:54 $ ./gradlew tasks
> Error: Could not find or load main class 
> org.gradle.wrapper.GradleWrapperMain{noformat}
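
For reference, the usual fix (assuming a local Gradle installation, per the project README) is to bootstrap the wrapper jar once before invoking gradlew:

{noformat}
cd kafka
gradle            # downloads gradle/wrapper/gradle-wrapper.jar
./gradlew tasks   # org.gradle.wrapper.GradleWrapperMain now resolves
{noformat}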



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-4906) Support 0.9 brokers with a newer Producer or Consumer version

2017-08-26 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KAFKA-4906.

Resolution: Won't Fix

> Support 0.9 brokers with a newer Producer or Consumer version
> -
>
> Key: KAFKA-4906
> URL: https://issues.apache.org/jira/browse/KAFKA-4906
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.2.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>
> KAFKA-4507 added the ability for newer Kafka clients to talk to older Kafka 
> brokers if a new feature supported by a newer wire protocol was not 
> used/required. 
> We currently support brokers as old as 0.10.0.0 because that's when the 
> ApiVersionsRequest/Response was added to the broker (KAFKA-3307).
> However, there are relatively few changes between 0.9.0.0 and 0.10.0.0 on the 
> wire, making it possible to support another major broker version set by 
> assuming that any disconnect resulting from an ApiVersionsRequest is from a 
> 0.9 broker and defaulting to legacy protocol versions. 
> Supporting 0.9 with newer clients can drastically simplify upgrades, allow 
> for libraries and frameworks to easily support a wider set of environments, 
> and let developers take advantage of client side improvements without 
> requiring cluster upgrades first. 
> Below is a list of the wire protocol versions by release for reference: 
> {noformat}
> 0.10.x
>   Produce(0): 0 to 2
>   Fetch(1): 0 to 2 
>   Offsets(2): 0
>   Metadata(3): 0 to 1
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): 0
>   ApiVersions(18): 0
> 0.9.x:
>   Produce(0): 0 to 1 (no response timestamp from v2)
>   Fetch(1): 0 to 1 (no response timestamp from v2)
>   Offsets(2): 0
>   Metadata(3): 0 (no cluster id or rack info from v1)
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> 0.8.2.x:
>   Produce(0): 0 (no quotas from v1)
>   Fetch(1): 0 (no quotas from v1)
>   Offsets(2): 0
>   Metadata(3): 0
>   OffsetCommit(8): 0 to 1 (no global retention time from v2)
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): UNSUPPORTED
>   Heartbeat(12): UNSUPPORTED
>   LeaveGroup(13): UNSUPPORTED
>   SyncGroup(14): UNSUPPORTED
>   DescribeGroups(15): UNSUPPORTED
>   ListGroups(16): UNSUPPORTED
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> {noformat}
> Note: Due to KAFKA-3088 it may take up to request.timeout.ms to fail an 
> ApiVersionsRequest and fail over to legacy protocol versions unless we handle 
> that scenario specifically in this patch. The workaround would be to reduce 
> request.timeout.ms if needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (KAFKA-4906) Support 0.9 brokers with a newer Producer or Consumer version

2017-03-15 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927449#comment-15927449
 ] 

Grant Henke commented on KAFKA-4906:


[~ijuma] Following up with a summary of some of our out-of-band chats here. 

In smoke testing of a WIP patch it appeared I was able to send messages from a 
trunk client to a 0.9 broker and receive them from a trunk consumer. We were a 
bit confused by this since the message format had changed and should not be 
parsable. I think that because I was using uncompressed messages and a regular 
topic, the messages could pass through without the format really being parsed 
or validated. 

However, that is likely not the case for a compacted topic or a compressed 
message set. More testing would be needed to be sure. 

Regardless, the safest approach would likely be to ensure the message format 
matches the produce request version (Produce v1 = Message Format 0, and 
Produce v2 = Message Format 1). I will investigate further and see how large a 
change is required to do that before posting anything more. 
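
For illustration only, that mapping could be expressed as a small helper; this is an assumption about shape, not the actual broker code:

{code}
// Illustrative sketch: Produce v0/v1 imply message format (magic) 0,
// Produce v2 implies message format (magic) 1.
def magicValueForProduceVersion(produceVersion: Short): Byte =
  (if (produceVersion >= 2) 1 else 0).toByte
{code}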

> Support 0.9 brokers with a newer Producer or Consumer version
> -
>
> Key: KAFKA-4906
> URL: https://issues.apache.org/jira/browse/KAFKA-4906
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.2.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>
> KAFKA-4507 added the ability for newer Kafka clients to talk to older Kafka 
> brokers if a new feature supported by a newer wire protocol was not 
> used/required. 
> We currently support brokers as old as 0.10.0.0 because that's when the 
> ApiVersionsRequest/Response was added to the broker (KAFKA-3307).
> However, there are relatively few changes between 0.9.0.0 and 0.10.0.0 on the 
> wire, making it possible to support another major broker version set by 
> assuming that any disconnect resulting from an ApiVersionsRequest is from a 
> 0.9 broker and defaulting to legacy protocol versions. 
> Supporting 0.9 with newer clients can drastically simplify upgrades, allow 
> for libraries and frameworks to easily support a wider set of environments, 
> and let developers take advantage of client side improvements without 
> requiring cluster upgrades first. 
> Below is a list of the wire protocol versions by release for reference: 
> {noformat}
> 0.10.x
>   Produce(0): 0 to 2
>   Fetch(1): 0 to 2 
>   Offsets(2): 0
>   Metadata(3): 0 to 1
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): 0
>   ApiVersions(18): 0
> 0.9.x:
>   Produce(0): 0 to 1 (no response timestamp from v2)
>   Fetch(1): 0 to 1 (no response timestamp from v2)
>   Offsets(2): 0
>   Metadata(3): 0 (no cluster id or rack info from v1)
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> 0.8.2.x:
>   Produce(0): 0 (no quotas from v1)
>   Fetch(1): 0 (no quotas from v1)
>   Offsets(2): 0
>   Metadata(3): 0
>   OffsetCommit(8): 0 to 1 (no global retention time from v2)
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): UNSUPPORTED
>   Heartbeat(12): UNSUPPORTED
>   LeaveGroup(13): UNSUPPORTED
>   SyncGroup(14): UNSUPPORTED
>   DescribeGroups(15): UNSUPPORTED
>   ListGroups(16): UNSUPPORTED
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> {noformat}
> Note: Due to KAFKA-3088 it may take up to request.timeout.ms to fail an 
> ApiVersionsRequest and fail over to legacy protocol versions unless we handle 
> that scenario specifically in this patch. The workaround would be to reduce 
> request.timeout.ms if needed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4906) Support 0.9 brokers with a newer Producer or Consumer version

2017-03-15 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-4906:
---
Fix Version/s: (was: 0.10.2.1)

> Support 0.9 brokers with a newer Producer or Consumer version
> -
>
> Key: KAFKA-4906
> URL: https://issues.apache.org/jira/browse/KAFKA-4906
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.10.2.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>
> KAFKA-4507 added the ability for newer Kafka clients to talk to older Kafka 
> brokers if a new feature supported by a newer wire protocol was not 
> used/required. 
> We currently support brokers as old as 0.10.0.0 because that's when the 
> ApiVersionsRequest/Response was added to the broker (KAFKA-3307).
> However, there are relatively few changes between 0.9.0.0 and 0.10.0.0 on the 
> wire, making it possible to support another major broker version set by 
> assuming that any disconnect resulting from an ApiVersionsRequest is from a 
> 0.9 broker and defaulting to legacy protocol versions. 
> Supporting 0.9 with newer clients can drastically simplify upgrades, allow 
> for libraries and frameworks to easily support a wider set of environments, 
> and let developers take advantage of client side improvements without 
> requiring cluster upgrades first. 
> Below is a list of the wire protocol versions by release for reference: 
> {noformat}
> 0.10.x
>   Produce(0): 0 to 2
>   Fetch(1): 0 to 2 
>   Offsets(2): 0
>   Metadata(3): 0 to 1
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): 0
>   ApiVersions(18): 0
> 0.9.x:
>   Produce(0): 0 to 1 (no response timestamp from v2)
>   Fetch(1): 0 to 1 (no response timestamp from v2)
>   Offsets(2): 0
>   Metadata(3): 0 (no cluster id or rack info from v1)
>   OffsetCommit(8): 0 to 2
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): 0
>   Heartbeat(12): 0
>   LeaveGroup(13): 0
>   SyncGroup(14): 0
>   DescribeGroups(15): 0
>   ListGroups(16): 0
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> 0.8.2.x:
>   Produce(0): 0 (no quotas from v1)
>   Fetch(1): 0 (no quotas from v1)
>   Offsets(2): 0
>   Metadata(3): 0
>   OffsetCommit(8): 0 to 1 (no global retention time from v2)
>   OffsetFetch(9): 0 to 1
>   GroupCoordinator(10): 0
>   JoinGroup(11): UNSUPPORTED
>   Heartbeat(12): UNSUPPORTED
>   LeaveGroup(13): UNSUPPORTED
>   SyncGroup(14): UNSUPPORTED
>   DescribeGroups(15): UNSUPPORTED
>   ListGroups(16): UNSUPPORTED
>   SaslHandshake(17): UNSUPPORTED
>   ApiVersions(18): UNSUPPORTED
> {noformat}
> Note: Due to KAFKA-3088 it may take up to request.timeout.ms to fail an 
> ApiVersionsRequest and fail over to legacy protocol versions unless we handle 
> that scenario specifically in this patch. The workaround would be to reduce 
> request.timeout.ms if needed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4906) Support 0.9 brokers with a newer Producer or Consumer version

2017-03-15 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4906:
--

 Summary: Support 0.9 brokers with a newer Producer or Consumer 
version
 Key: KAFKA-4906
 URL: https://issues.apache.org/jira/browse/KAFKA-4906
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Affects Versions: 0.10.2.0
Reporter: Grant Henke
Assignee: Grant Henke
 Fix For: 0.10.2.1


KAFKA-4507 added the ability for newer Kafka clients to talk to older Kafka 
brokers if a new feature supported by a newer wire protocol was not 
used/required. 

We currently support brokers as old as 0.10.0.0 because that's when the 
ApiVersionsRequest/Response was added to the broker (KAFKA-3307).

However, there are relatively few changes between 0.9.0.0 and 0.10.0.0 on the 
wire, making it possible to support another major broker version set by 
assuming that any disconnect resulting from an ApiVersionsRequest is from a 0.9 
broker and defaulting to legacy protocol versions. 

Supporting 0.9 with newer clients can drastically simplify upgrades, allow for 
libraries and frameworks to easily support a wider set of environments, and let 
developers take advantage of client side improvements without requiring cluster 
upgrades first. 

Below is a list of the wire protocol versions by release for reference: 
{noformat}
0.10.x
Produce(0): 0 to 2
Fetch(1): 0 to 2 
Offsets(2): 0
Metadata(3): 0 to 1
OffsetCommit(8): 0 to 2
OffsetFetch(9): 0 to 1
GroupCoordinator(10): 0
JoinGroup(11): 0
Heartbeat(12): 0
LeaveGroup(13): 0
SyncGroup(14): 0
DescribeGroups(15): 0
ListGroups(16): 0
SaslHandshake(17): 0
ApiVersions(18): 0

0.9.x:
Produce(0): 0 to 1 (no response timestamp from v2)
Fetch(1): 0 to 1 (no response timestamp from v2)
Offsets(2): 0
Metadata(3): 0 (no cluster id or rack info from v1)
OffsetCommit(8): 0 to 2
OffsetFetch(9): 0 to 1
GroupCoordinator(10): 0
JoinGroup(11): 0
Heartbeat(12): 0
LeaveGroup(13): 0
SyncGroup(14): 0
DescribeGroups(15): 0
ListGroups(16): 0
SaslHandshake(17): UNSUPPORTED
ApiVersions(18): UNSUPPORTED

0.8.2.x:
Produce(0): 0 (no quotas from v1)
Fetch(1): 0 (no quotas from v1)
Offsets(2): 0
Metadata(3): 0
OffsetCommit(8): 0 to 1 (no global retention time from v2)
OffsetFetch(9): 0 to 1
GroupCoordinator(10): 0
JoinGroup(11): UNSUPPORTED
Heartbeat(12): UNSUPPORTED
LeaveGroup(13): UNSUPPORTED
SyncGroup(14): UNSUPPORTED
DescribeGroups(15): UNSUPPORTED
ListGroups(16): UNSUPPORTED
SaslHandshake(17): UNSUPPORTED
ApiVersions(18): UNSUPPORTED
{noformat}

Note: Due to KAFKA-3088 it may take up to request.timeout.ms to fail an 
ApiVersionsRequest and fail over to legacy protocol versions unless we handle 
that scenario specifically in this patch. The workaround would be to reduce 
request.timeout.ms if needed.
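
For reference, a hedged sketch of that workaround using the Java producer from Scala (host name and timeout value are made up for the example):

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig}

// Lower request.timeout.ms so an unanswered ApiVersionsRequest fails (and,
// under the proposal above, falls back to legacy protocol versions) sooner.
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "legacy-broker:9092") // hypothetical host
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "5000")              // example value only
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
  "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)
{code}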



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-117: Add a public AdminClient API for Kafka admin operations

2017-03-14 Thread Grant Henke
+1

On Tue, Mar 14, 2017 at 2:44 AM, Sriram Subramanian <r...@confluent.io>
wrote:

> +1 (binding)
>
> Nice work in driving this.
>
> On Mon, Mar 13, 2017 at 10:31 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > +1 (binding)
> >
> > I expressed few concerns in the discussion thread, but in general this is
> > super important to get done.
> >
> > On Fri, Mar 10, 2017 at 10:38 AM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start voting on KIP-117
> > > (https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 117%3A+Add+a+public+AdminClient+API+for+Kafka+admin+operations
> > > ).
> > >
> > > The discussion thread can be found here:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg65697.html
> > >
> > > best,
> > > Colin
> > >
> >
> >
> >
> > --
> > *Gwen Shapira*
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> > <http://www.confluent.io/blog>
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-122: Add Reset Consumer Group Offsets tooling

2017-03-14 Thread Grant Henke
+1. Agreed. This is a great tool to have.

On Tue, Mar 14, 2017 at 12:33 AM, Gwen Shapira <g...@confluent.io> wrote:

> +1 (binding)
>
> Nice job - this is going to be super useful.
>
> On Thu, Feb 23, 2017 at 4:46 PM, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi All,
> >
> > It seems that there is no further concern with the KIP-122.
> > At this point we would like to start the voting process.
> >
> > The KIP can be found here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 122%3A+Add+Reset+Consumer+Group+Offsets+tooling
> >
> >
> > Thanks!
> >
> > Jorge.
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Is there already a KIP or JIRA to change auto.create.topics.enable to default to false?

2017-03-02 Thread Grant Henke
Feel free to create a KIP based on that discussion and drive the jira. I
don't think a KIP exists yet.

On Thu, Mar 2, 2017 at 1:28 PM, Jeff Widman <j...@netskope.com> wrote:

> Thanks, that's the ticket I was thinking of.
>
> On Thu, Mar 2, 2017 at 11:19 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > I think the idea was that once clients have the ability to create topics,
> > we would move "auto topic creation" client side and deprecate and
> > eventually remove the support for server side auto create. This
> simplifies
> > error handling, authorization, and puts the client in control of details
> > like partition counts and replication factor.
> >
> > There is a jira (KAFKA-2410
> > <https://issues.apache.org/jira/browse/KAFKA-2410>) tracking the move to
> > client side auto topic creation and a discussion about some of the
> details
> > here:
> > http://search-hadoop.com/m/Kafka/uyzND1yAwWoCt1yc?subj=+
> > DISCUSS+Client+Side+Auto+Topic+Creation
> >
> > Since a change in the default behavior of auto.create.topics.enable would
> > be considered "breaking" I think it would be best to consider this change
> > as a part of the deprecation and eventual removal of the configuration.
> > Perhaps 0.11 would be a good timeframe to consider doing that, but it
> > depends on if the supporting features are complete.
> >
> > Thanks,
> > Grant
> >
> >
> >
> > On Thu, Mar 2, 2017 at 12:56 PM, Jeff Widman <j...@netskope.com> wrote:
> >
> > > I thought I saw mention somewhere of changing the default of
> > > auto.create.topics.enable to false.
> > >
> > > I searched, but couldn't find anything in JIRA... am I imagining
> things?
> > >
> > > Now that there's API support for creating topics, the version bump to
> > > 0.11.0 seems like a good time to re-evaluate whether this default
> should
> > be
> > > flipped to false.
> > >
> > > I'm happy to create a KIP if needed, just didn't want to duplicate
> > effort.
> > >
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Is there already a KIP or JIRA to change auto.create.topics.enable to default to false?

2017-03-02 Thread Grant Henke
I think the idea was that once clients have the ability to create topics,
we would move "auto topic creation" client side and deprecate and
eventually remove the support for server side auto create. This simplifies
error handling, authorization, and puts the client in control of details
like partition counts and replication factor.

There is a jira (KAFKA-2410
<https://issues.apache.org/jira/browse/KAFKA-2410>) tracking the move to
client side auto topic creation and a discussion about some of the details
here:
http://search-hadoop.com/m/Kafka/uyzND1yAwWoCt1yc?subj=+DISCUSS+Client+Side+Auto+Topic+Creation

Since a change in the default behavior of auto.create.topics.enable would
be considered "breaking", I think it would be best to consider this change
as part of the deprecation and eventual removal of the configuration.
Perhaps 0.11 would be a good timeframe for doing that, but it depends on
whether the supporting features are complete.
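
For context, a rough sketch of what client-side topic creation looks like with the AdminClient API that KIP-117 later added (not available at the time of this thread):

{code}
import java.util.{Collections, Properties}
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig, NewTopic}

val props = new Properties()
props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
val admin = AdminClient.create(props)
try {
  // The client, not the broker, decides partition count and replication factor,
  // instead of relying on broker-side auto.create.topics.enable.
  admin.createTopics(Collections.singleton(new NewTopic("my-topic", 3, 1.toShort)))
    .all().get()
} finally {
  admin.close()
}
{code}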

Thanks,
Grant



On Thu, Mar 2, 2017 at 12:56 PM, Jeff Widman <j...@netskope.com> wrote:

> I thought I saw mention somewhere of changing the default of
> auto.create.topics.enable to false.
>
> I searched, but couldn't find anything in JIRA... am I imagining things?
>
> Now that there's API support for creating topics, the version bump to
> 0.11.0 seems like a good time to re-evaluate whether this default should be
> flipped to false.
>
> I'm happy to create a KIP if needed, just didn't want to duplicate effort.
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-03-02 Thread Grant Henke
+1

On Wed, Mar 1, 2017 at 9:44 PM, Gwen Shapira <g...@confluent.io> wrote:

> +1
>
> On Tue, Feb 28, 2017 at 2:40 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi everyone,
> >
> > Since the few who responded in the discuss thread were in favour and
> there
> > were no objections, I would like to initiate the voting process for
> > KIP-119: Drop Support for Scala 2.10 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
> >
> > The vote will run for a minimum of 72 hours.
> >
> > Thanks,
> > Ismael
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter <https://twitter.com/ConfluentInc> | blog
> <http://www.confluent.io/blog>
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Commented] (KAFKA-2729) Cached zkVersion not equal to that in zookeeper, broker not recovering.

2017-02-22 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15879124#comment-15879124
 ] 

Grant Henke commented on KAFKA-2729:


I am curious whether everyone on this Jira is actually seeing the reported issue. 
I have had multiple cases where someone presented me with an environment they 
thought was experiencing this issue. After researching the environment and 
logs, to date it has always been something else. 

The main culprits so far have been:
* Long GC pauses causing zookeeper sessions to timeout
* Slow or poorly configured zookeeper
* Bad network configuration

All of the above resulted in a soft, recurring failure of brokers. That churn 
often caused additional load, perpetuating the issue. 

If you are seeing this issue, do you see the following pattern repeating in the 
logs?
{noformat}
INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Disconnected)
...
INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (Expired)
INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, 
session 0x153ab38abdbd360 has expired, closing socket connection
...
INFO org.I0Itec.zkclient.ZkClient: zookeeper state changed (SyncConnected)
INFO kafka.server.KafkaHealthcheck: re-registering broker info in ZK for broker 
32
INFO kafka.utils.ZKCheckedEphemeral: Creating /brokers/ids/32 (is it secure? 
false)
INFO kafka.utils.ZKCheckedEphemeral: Result of znode creation is: OK
{noformat}

If so, something is causing communication with zookeeper to take too long and 
the broker is unregistering itself. This will cause ISRs to shrink and expand 
over and over again.

I don't think this will solve everyone's issue here, but hopefully it will help 
solve some.
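
If long GC pauses or a slow ZooKeeper are the cause, one common mitigation is to give the session more headroom in the broker configuration (values below are illustrative only and depend on the environment):

{noformat}
# server.properties (illustrative values)
zookeeper.session.timeout.ms=18000
zookeeper.connection.timeout.ms=18000
{noformat}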



> Cached zkVersion not equal to that in zookeeper, broker not recovering.
> ---
>
> Key: KAFKA-2729
> URL: https://issues.apache.org/jira/browse/KAFKA-2729
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.1
>Reporter: Danil Serdyuchenko
>
> After a small network wobble where zookeeper nodes couldn't reach each other, 
> we started seeing a large number of undereplicated partitions. The zookeeper 
> cluster recovered, however we continued to see a large number of 
> undereplicated partitions. Two brokers in the kafka cluster were showing this 
> in the logs:
> {code}
> [2015-10-27 11:36:00,888] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Shrinking ISR for 
> partition [__samza_checkpoint_event-creation_1,3] from 6,5 to 5 
> (kafka.cluster.Partition)
> [2015-10-27 11:36:00,891] INFO Partition 
> [__samza_checkpoint_event-creation_1,3] on broker 5: Cached zkVersion [66] 
> not equal to that in zookeeper, skip updating ISR (kafka.cluster.Partition)
> {code}
> For all of the topics on the effected brokers. Both brokers only recovered 
> after a restart. Our own investigation yielded nothing, I was hoping you 
> could shed some light on this issue. Possibly if it's related to: 
> https://issues.apache.org/jira/browse/KAFKA-1382 , however we're using 
> 0.8.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4754) Correctly parse '=' characters in command line overrides

2017-02-17 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872139#comment-15872139
 ] 

Grant Henke commented on KAFKA-4754:


{quote}
This could expose the password to anyone who is able to run ps on the system, 
or look at the bash history. So I'm not sure that we should be concerned about 
the println
{quote}

I think it's worth addressing. Just because one thing is wrong and a security 
hole, that doesn't mean we shouldn't close or fix others. If security were all 
or nothing, we would be left with nothing. Application logs are often passed 
around, aggregated, and collected. Access to a machine to run ps or to look at 
the shell history is a much lower concern than that.

> Correctly parse '=' characters in command line overrides
> 
>
> Key: KAFKA-4754
> URL: https://issues.apache.org/jira/browse/KAFKA-4754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
>
> When starting Kafka with an override parameter via "--override 
> my.parameter=myvalue".
> If a value contains an '=' character it fails and exits with "Invalid command 
> line properties:.."
> Often passwords contain an '=' character so it's important to support that 
> value. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4754) Correctly parse '=' characters in command line overrides

2017-02-17 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872127#comment-15872127
 ] 

Grant Henke commented on KAFKA-4754:


{quote}
Hmm. It is not a good practice to pass passwords through the command line. 
{quote}

I agree, but my usage is not via the command line. It's actually used 
internally by the application, to improve security. This functionality supports 
a workaround, since there was pushback on the feature proposed in KAFKA-2629. I 
generate the password and pass it via a call to kafka.Kafka.main(args: 
Array[String]).
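
A minimal sketch of that usage, with a made-up property name and generation scheme purely for illustration:

{code}
import java.security.SecureRandom
import java.util.Base64

// Generate a secret in-process and hand it to the broker via --override,
// so it never has to be written into server.properties. Base64 output can
// end in '=' padding, which is exactly the value this issue fixes parsing for.
val bytes = new Array[Byte](32)
new SecureRandom().nextBytes(bytes)
val generated = Base64.getEncoder.encodeToString(bytes)

kafka.Kafka.main(Array(
  "/etc/kafka/server.properties",                  // hypothetical path
  "--override", s"ssl.key.password=$generated"))   // value may contain '='
{code}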



> Correctly parse '=' characters in command line overrides
> 
>
> Key: KAFKA-4754
> URL: https://issues.apache.org/jira/browse/KAFKA-4754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
>
> When starting Kafka with an override parameter via "--override 
> my.parameter=myvalue".
> If a value contains an '=' character it fails and exits with "Invalid command 
> line properties:.."
> Often passwords contain an '=' character so it's important to support that 
> value. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4754) Correctly parse '=' characters in command line overrides

2017-02-09 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-4754:
---
Status: Patch Available  (was: Open)

> Correctly parse '=' characters in command line overrides
> 
>
> Key: KAFKA-4754
> URL: https://issues.apache.org/jira/browse/KAFKA-4754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
>
> When starting Kafka with an override parameter via "--override 
> my.parameter=myvalue".
> If a value contains an '=' character it fails and exits with "Invalid command 
> line properties:.."
> Often passwords contain an '=' character so it's important to support that 
> value. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4754) Correctly parse '=' characters in command line overrides

2017-02-09 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15860597#comment-15860597
 ] 

Grant Henke commented on KAFKA-4754:


It's worth noting that it was also possible to echo passwords on any error in 
this code path via CommandLineUtils.parseKeyValueArgs: 
{noformat}
System.err.println("Invalid command line properties: " + args.mkString(" "))
{noformat}

> Correctly parse '=' characters in command line overrides
> 
>
> Key: KAFKA-4754
> URL: https://issues.apache.org/jira/browse/KAFKA-4754
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>
> When starting Kafka with an override parameter via "--override 
> my.parameter=myvalue".
> If a value contains an '=' character it fails and exits with "Invalid command 
> line properties:.."
> Often passwords contain an '=' character so it's important to support that 
> value. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4754) Correctly parse '=' characters in command line overrides

2017-02-09 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4754:
--

 Summary: Correctly parse '=' characters in command line overrides
 Key: KAFKA-4754
 URL: https://issues.apache.org/jira/browse/KAFKA-4754
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.0
Reporter: Grant Henke
Assignee: Grant Henke


When starting Kafka with an override parameter via "--override 
my.parameter=myvalue", a value that contains an '=' character causes startup to 
fail and exit with "Invalid command line properties:..".

Passwords often contain an '=' character, so it's important to support such 
values. 
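
A minimal sketch of the parsing approach (not necessarily the exact committed patch): split each override on the first '=' only, so the rest of the value is preserved.

{code}
// Split "key=value" on the first '=' only, so values containing '=' survive.
def parseOverride(arg: String): (String, String) = {
  val parts = arg.split("=", 2) // limit of 2: everything after the first '=' is the value
  require(parts.length == 2, s"Invalid command line property: $arg")
  (parts(0), parts(1))
}

// parseOverride("listener.password=abc=123") == ("listener.password", "abc=123")
{code}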



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-09 Thread Grant Henke
+1

On Thu, Feb 9, 2017 at 10:51 AM, Mickael Maison <mickael.mai...@gmail.com>
wrote:

> +1 too.
>
> On Thu, Feb 9, 2017 at 4:30 PM, Edoardo Comar <eco...@uk.ibm.com> wrote:
> > +1 (non-binding)
> > --
> > Edoardo Comar
> > IBM MessageHub
> > eco...@uk.ibm.com
> > IBM UK Ltd, Hursley Park, SO21 2JN
> >
> > IBM United Kingdom Limited Registered in England and Wales with number
> > 741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants.
> PO6
> > 3AU
> >
> >
> >
> > From:   Ismael Juma <ism...@juma.me.uk>
> > To: dev@kafka.apache.org
> > Date:   09/02/2017 15:33
> > Subject:[VOTE] KIP-118: Drop Support for Java 7 in Kafka 0.11
> > Sent by:isma...@gmail.com
> >
> >
> >
> > Hi everyone,
> >
> > Since everyone in the discuss thread was in favour (10 people responded),
> > I
> > would like to initiate the voting process for KIP-118: Drop Support for
> > Java 7 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> >
> >
> > The vote will run for a minimum of 72 hours.
> >
> > Thanks,
> > Ismael
> >
> >
> >
> > Unless stated otherwise above:
> > IBM United Kingdom Limited - Registered in England and Wales with number
> > 741598.
> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6
> 3AU
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Commented] (KAFKA-4746) Offsets can be committed for the offsets topic

2017-02-08 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858709#comment-15858709
 ] 

Grant Henke commented on KAFKA-4746:


I just mean that, when working with a compacted topic, you often read from the 
start of the topic every time your process restarts in order to see or rebuild 
"the current state". 

But you are right, that is a bit of an overstatement. There are likely cases 
where a process commits an offset to try and resume where it left off, while 
being well aware that the offset could have been cleaned since it was last 
committed. As I understand it, before KIP-58/KAFKA-1981 it was a race condition 
against the log cleaner whether the committed offset was still valid or not. 
Committing the offset also doesn't do anything to help ensure you didn't miss 
an offset that was cleaned while your application was not processing. 

KIP-58/KAFKA-1981 fixed that by ensuring some time passes before cleaning, via 
min.compaction.lag.ms/min.compaction.lag.bytes/min.compaction.lag.messages.
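
For reference, a hedged example of setting that lag on a topic (as far as I know only the time-based config shipped with KIP-58; the value is illustrative):

{noformat}
bin/kafka-configs.sh --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name my-compacted-topic \
  --add-config min.compaction.lag.ms=86400000
{noformat}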

> Offsets can be committed for the offsets topic
> --
>
> Key: KAFKA-4746
> URL: https://issues.apache.org/jira/browse/KAFKA-4746
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.0
>Reporter: Grant Henke
>
> Though this is likely rare and I don't suspect too many people would try to do 
> this, we should prevent users from committing offsets for the offsets topic 
> into the offsets topic. This would essentially create an infinite loop in any 
> consumer consuming from that topic. Also, committing offsets for a compacted 
> topic likely doesn't make sense anyway. 
> Here is a quick failing test I wrote to see if this guard exists:
> {code:title=OffsetCommitTest.scala|borderStyle=solid}
>  @Test
>   def testOffsetTopicOffsetCommit() {
> val topic1 = "__consumer_offsets"
> // Commit an offset
> val expectedReplicaAssignment = Map(0  -> List(1))
> val commitRequest = OffsetCommitRequest(
>   groupId = group,
>   requestInfo = immutable.Map(TopicAndPartition(topic1, 0) -> 
> OffsetAndMetadata(offset=42L)),
>   versionId = 2
> )
> val commitResponse = simpleConsumer.commitOffsets(commitRequest)
> assertEquals(Errors.INVALID_TOPIC_EXCEPTION.code, 
> commitResponse.commitStatus.get(TopicAndPartition(topic1, 0)).get)
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4746) Offsets can be committed for the offsets topic

2017-02-08 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4746:
--

 Summary: Offsets can be committed for the offsets topic
 Key: KAFKA-4746
 URL: https://issues.apache.org/jira/browse/KAFKA-4746
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.9.0.0
Reporter: Grant Henke


Though this is likely rare and I don't suspect too many people would try to do 
this, we should prevent users from committing offsets for the offsets topic 
into the offsets topic. This would essentially create an infinite loop in any 
consumer consuming from that topic. Also, committing offsets for a compacted 
topic likely doesn't make sense anyway. 

Here is a quick failing test I wrote to see if this guard exists:

{code:title=OffsetCommitTest.scala|borderStyle=solid}
 @Test
  def testOffsetTopicOffsetCommit() {
val topic1 = "__consumer_offsets"
// Commit an offset
val expectedReplicaAssignment = Map(0  -> List(1))
val commitRequest = OffsetCommitRequest(
  groupId = group,
  requestInfo = immutable.Map(TopicAndPartition(topic1, 0) -> 
OffsetAndMetadata(offset=42L)),
  versionId = 2
)
val commitResponse = simpleConsumer.commitOffsets(commitRequest)

assertEquals(Errors.INVALID_TOPIC_EXCEPTION.code, 
commitResponse.commitStatus.get(TopicAndPartition(topic1, 0)).get)
  }
{code}
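
A rough sketch of the kind of guard the test above expects; the method name and placement are assumptions, not an actual patch:

{code}
import org.apache.kafka.common.protocol.Errors

// Hypothetical check in offset-commit handling: reject commits that target
// the internal offsets topic itself with INVALID_TOPIC_EXCEPTION.
def offsetCommitError(topic: String): Option[Short] = {
  if (topic == "__consumer_offsets") Some(Errors.INVALID_TOPIC_EXCEPTION.code)
  else None
}
{code}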



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [VOTE] KIP-48 Support for delegation tokens as an authentication mechanism

2017-02-07 Thread Grant Henke
+1 from me as well.

On Tue, Feb 7, 2017 at 7:10 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Looks like a great proposal! I noticed that key rotation is not included.
> That may be reasonable for the initial work, but it might be nice to share
> some thoughts on how that might work in the future. For example, I could
> imagine delegation.token.master.key could be a list, which would allow
> users to support both a new and old key at the same time while clients are
> upgrading keys.
>
> -Jason
>
> On Tue, Feb 7, 2017 at 4:42 PM, Gwen Shapira <g...@confluent.io> wrote:
>
> > Read the KIP again and I think it looks good.
> >
> > +1 from me.
> >
> > On Tue, Feb 7, 2017 at 3:05 PM, Jun Rao <j...@confluent.io> wrote:
> > > Hi, Mani,
> > >
> > > If a token expires, then every broker will potentially try to delete it
> > > around the same time, but only one will succeed. So, we will have to
> deal
> > > with failures in that case? Another way is to let just one broker (say,
> > the
> > > controller) deletes expired tokens.
> > >
> > > It would also be helpful for others to give feedback on this KIP.
> Rajini,
> > > Gwen, Ismael?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Sun, Feb 5, 2017 at 9:54 AM, Manikumar <manikumar.re...@gmail.com>
> > wrote:
> > >
> > >> Hi Jun,
> > >>
> > >>  Please see the replies inline.
> > >>
> > >>
> > >> > >
> > >> > > Only one broker does the deletion. Broker updates the expiration
> in
> > its
> > >> > > local cache
> > >> > > and on zookeeper so other brokers also get notified and their
> cache
> > >> > > statuses are updated as well.
> > >> > >
> > >> > >
> > >> > Which broker does the deletion?
> > >> >
> > >>
> > >> Any broker can handle the create/expire/renew/describe delegationtoken
> > >> requests.
> > >> changes are propagated through zk notifications.  Every broker is
> > >> responsible for
> > >> expiring the tokens. This check be can done during request handling
> time
> > >> and/or
> > >> during token authentication time.
> > >>
> > >>
> > >> >
> > >> >
> > >> > 110. The diagrams in the wiki still show MD5 digest. Could you
> change
> > it
> > >> to
> > >> > SCRAM?
> > >> >
> > >> >
> > >>   Updated the diagram.
> > >>
> > >>
> > >>
> > >> Thanks,
> > >> Manikumar
> > >>
> > >>
> > >>
> > >>
> > >> >
> > >> >
> > >> > >
> > >> > > Thanks.
> > >> > > Manikumar
> > >> > >
> > >> > >
> > >> > > >
> > >> > > > On Fri, Dec 23, 2016 at 9:26 AM, Manikumar <
> > >> manikumar.re...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > I would like to initiate the vote on KIP-48:
> > >> > > > >
> > >> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+
> > >> > > > > Delegation+token+support+for+Kafka
> > >> > > > >
> > >> > > > >
> > >> > > > > Thanks,
> > >> > > > > Manikumar
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> >
> >
> >
> > --
> > Gwen Shapira
> > Product Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-07 Thread Grant Henke
 It is true that a garbage collection in one broker would not
> affect
> >>>>>>> others. But that is after every broker only uses 1/10 of the
> memory.
> >>>>> Can
> >>>>>>> we be sure that this will actually help performance?
> >>>>>>
> >>>>>> The big question is, how much memory do Kafka brokers use now, and
> how
> >>>>>> much will they use in the future?  Our experience in HDFS was that
> >> once
> >>>>>> you start getting more than 100-200GB Java heap sizes, full GCs
> start
> >>>>>> taking minutes to finish when using the standard JVMs.  That alone
> is
> >> a
> >>>>>> good reason to go multi-process or consider storing more things off
> >> the
> >>>>>> Java heap.
> >>>>>>
> >>>>>
> >>>>> I see. Now I agree one-broker-per-disk should be more efficient in
> >> terms
> >>>> of
> >>>>> GC since each broker probably needs less than 1/10 of the memory
> >>>> available
> >>>>> on a typical machine nowadays. I will remove this from the reason of
> >>>>> rejection.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Disk failure is the "easy" case.  The "hard" case, which is
> >>>>>> unfortunately also the much more common case, is disk misbehavior.
> >>>>>> Towards the end of their lives, disks tend to start slowing down
> >>>>>> unpredictably.  Requests that would have completed immediately
> before
> >>>>>> start taking 20, 100 500 milliseconds.  Some files may be readable
> and
> >>>>>> other files may not be.  System calls hang, sometimes forever, and
> the
> >>>>>> Java process can't abort them, because the hang is in the kernel.
> It
> >>>> is
> >>>>>> not fun when threads are stuck in "D state"
> >>>>>> http://stackoverflow.com/questions/20423521/process-perminan
> >>>>>> tly-stuck-on-d-state
> >>>>>> .  Even kill -9 cannot abort the thread then.  Fortunately, this is
> >>>>>> rare.
> >>>>>>
> >>>>>
> >>>>> I agree it is a harder problem and it is rare. We probably don't have
> >> to
> >>>>> worry about it in this KIP since this issue is orthogonal to whether
> or
> >>>> not
> >>>>> we use JBOD.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Another approach we should consider is for Kafka to implement its
> own
> >>>>>> storage layer that would stripe across multiple disks.  This
> wouldn't
> >>>>>> have to be done at the block level, but could be done at the file
> >>>> level.
> >>>>>> We could use consistent hashing to determine which disks a file
> should
> >>>>>> end up on, for example.
> >>>>>>
> >>>>>
> >>>>> Are you suggesting that we should distribute log, or log segment,
> >> across
> >>>>> disks of brokers? I am not sure if I fully understand this approach.
> My
> >>>> gut
> >>>>> feel is that this would be a drastic solution that would require
> >>>>> non-trivial design. While this may be useful to Kafka, I would prefer
> >> not
> >>>>> to discuss this in detail in this thread unless you believe it is
> >>>> strictly
> >>>>> superior to the design in this KIP in terms of solving our use-case.
> >>>>>
> >>>>>
> >>>>>> best,
> >>>>>> Colin
> >>>>>>
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Dong
> >>>>>>>
> >>>>>>> On Wed, Jan 25, 2017 at 11:34 AM, Colin McCabe <cmcc...@apache.org
> >
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi Dong,
> >>>>>>>>
> >>>>>>>> Thanks for the writeup!  It's very interesting.
> >>>>>>>>
> >>>>>>>> I apologize in advance if this has been discussed somewhere else.
> >>>>> But
> >>>>>> I
> >>>>>>>> am curious if you have considered the solution of running multiple
> >>>>>>>> brokers per node.  Clearly there is a memory overhead with this
> >>>>>> solution
> >>>>>>>> because of the fixed cost of starting multiple JVMs.  However,
> >>>>> running
> >>>>>>>> multiple JVMs would help avoid scalability bottlenecks.  You could
> >>>>>>>> probably push more RPCs per second, for example.  A garbage
> >>>>> collection
> >>>>>>>> in one broker would not affect the others.  It would be
> interesting
> >>>>> to
> >>>>>>>> see this considered in the "alternate designs" design, even if you
> >>>>> end
> >>>>>>>> up deciding it's not the way to go.
> >>>>>>>>
> >>>>>>>> best,
> >>>>>>>> Colin
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> We created KIP-112: Handle disk failure for JBOD. Please find the
> >>>>> KIP
> >>>>>>>>> wiki
> >>>>>>>>> in the link https://cwiki.apache.org/confl
> >>>> uence/display/KAFKA/KIP-
> >>>>>>>>> 112%3A+Handle+disk+failure+for+JBOD.
> >>>>>>>>>
> >>>>>>>>> This KIP is related to KIP-113
> >>>>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >>>>>>>> 113%3A+Support+replicas+movement+between+log+directories>:
> >>>>>>>>> Support replicas movement between log directories. They are
> >>>> needed
> >>>>> in
> >>>>>>>>> order
> >>>>>>>>> to support JBOD in Kafka. Please help review the KIP. You
> >>>> feedback
> >>>>> is
> >>>>>>>>> appreciated!
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Dong
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>
> >>
>
>


-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-117: Add a public AdministrativeClient API for Kafka admin operations

2017-02-07 Thread Grant Henke
+1. I think it's important to focus this KIP discussion on the "patterns" we
would like to see in the client and a few key methods, in order to make
progress and then iterate from there.

I think we should let Colin drive the APIs he thinks are important since he
is volunteering to do the work. And then others can propose and add APIs
from there.

On Tue, Feb 7, 2017 at 10:37 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi all,
>
> I think it's good that we have discussed a number of API that would make
> sense in the AdminClient. Having done that, I think we should now narrow
> the focus of this KIP to a small set of methods to get us started. Once we
> have an AdminClient in the codebase, we can propose subsequent KIPs to
> enrich it. I would suggest the following:
>
> 1. Let's focus on topic management operations: describe, create, alter and
> delete topics.
> 2. Let's add an @Unstable annotation to the class and specify in the
> javadoc that the methods are subject to change (if necessary).
>
> Thoughts?
>
> Ismael
>
> On Fri, Feb 3, 2017 at 6:24 PM, Colin McCabe <cmcc...@apache.org> wrote:
>
> > On Thu, Feb 2, 2017, at 21:45, Becket Qin wrote:
> > > Hi Colin,
> > >
> > > Thanks for the KIP. An admin client is probably a must after we block
> > > direct access to ZK. Some comments and thoughts below:
> > >
> > > 1. Do we have a clear scope for the admin client? It might be worth
> > > thinking about the entire user experience of the admin client. Ideally
> we
> > > may want to have a single client to do all the administrative work
> > > instead
> > > of having multiple ways to do different things. For example, do we want
> > > to
> > > add topic configurations change API in the admin client? What about
> > > partition movements and preferred leader election? Those are also
> > > administrative tasks which seem reasonable to be integrated into the
> > > admin
> > > client.
> >
> > Thanks for the comments, Becket!
> >
> > I agree that topic configuration change should be in the administrative
> > client.  I have not thought about partition movement or preferred leader
> > election.  It probably makes sense to put them in the client as well,
> > but we should probably have a longer discussion about those features
> > later when someone is ready to implement them ;)
> >
> > best,
> > Colin
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Trying to understand design decision about producer ack and min.insync.replicas

2017-02-03 Thread Grant Henke
ot always), a particular topic is used only by a
> > small
> > > >> set of producers with a specific set of data. The durability
> settings
> > > would
> > > >> usually be chosen due to the nature of the data, rather than based
> on
> > > who
> > > >> produced the data, and so it makes sense to me that the durability
> > > should
> > > >> be on the entire topic, not by the producer.
> > > >>
> > > >> What is a use case where you have multiple producers writing to the
> > same
> > > >> topic but would want different durability?
> > > >>
> > > >> -James
> > > >>
> > > >>> The ability to make this tradeoff in different places can seem more
> > > >> complex
> > > >>> (and really by definition *is* more complex), but it also offers
> more
> > > >>> flexibility.
> > > >>>
> > > >>> -Ewen
> > > >>>
> > > >>>
> > > >>>> But I understand your point, min.insync.replicas setting should be
> > > >>>> understood as "if a producer wants to get an error when topics are
> > > under
> > > >>>> replicated, then how many replicas are enough for not raising an
> > > error?"
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Jan 26, 2017 at 4:16 PM, Ewen Cheslack-Postava <
> > > >> e...@confluent.io>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> The acks setting for the producer doesn't affect the final
> > durability
> > > >>>>> guarantees. These are still enforced by the replication and min
> ISR
> > > >>>>> settings. Instead, the ack setting just lets the producer control
> > how
> > > >>>>> durable the write is before *that producer* can consider the
> write
> > > >>>>> "complete", i.e. before it gets an ack.
> > > >>>>>
> > > >>>>> -Ewen
> > > >>>>>
> > > >>>>> On Tue, Jan 24, 2017 at 12:46 PM, Luciano Afranllie <
> > > >>>>> listas.luaf...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> Hi everybody
> > > >>>>>>
> > > >>>>>> I am trying to understand why Kafka let each individual
> producer,
> > > on a
> > > >>>>>> connection per connection basis, choose the tradeoff between
> > > >>>> availability
> > > >>>>>> and durability, honoring min.insync.replicas value only if
> > producer
> > > >>>> uses
> > > >>>>>> ack=all.
> > > >>>>>>
> > > >>>>>> I mean, for a single topic, cluster administrators can't enforce
> > > >>>> messages
> > > >>>>>> to be stores in a minimum number of replicas without
> coordinating
> > > with
> > > >>>>> all
> > > >>>>>> producers to that topic so all of them use ack=all.
> > > >>>>>>
> > > >>>>>> Is there something that I am missing? Is there any other
> strategy
> > to
> > > >>>>>> overcome this situation?
> > > >>>>>>
> > > >>>>>> Regards
> > > >>>>>> Luciano
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>
> > > >>
> > >
> > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: KIP-119: Drop Support for Scala 2.10 in Kafka 0.11

2017-02-03 Thread Grant Henke
Thanks for proposing this Ismael. This makes sense to me.

In this KIP and the Java KIP you state:

A reasonable policy is to support the 2 most recently released versions so
> that we can strike a good balance between supporting older versions,
> maintainability and taking advantage of language and library improvements.


What do you think about adjusting the KIP to instead vote on that as a
standard policy for Java and Scala going forward? Something along the lines
of:

"Kafka's policy is to support the 2 most recently released versions of Java
and Scala at a given time. When a new version becomes available, the
supported versions will be updated in the next major release of Kafka."


On Fri, Feb 3, 2017 at 8:30 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi all,
>
> I have posted a KIP for dropping support for Scala 2.10 in Kafka 0.11:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 119%3A+Drop+Support+for+Scala+2.10+in+Kafka+0.11
>
> Please take a look. Your feedback is appreciated.
>
> Thanks,
> Ismael
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-03 Thread Grant Henke
Looks good to me. Thanks for handling the KIP.

On Fri, Feb 3, 2017 at 8:49 AM, Damian Guy <damian@gmail.com> wrote:

> Thanks Ismael. Makes sense to me.
>
> On Fri, 3 Feb 2017 at 10:39 Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi all,
> >
> > I have posted a KIP for dropping support for Java 7 in Kafka 0.11:
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> >
> > Most people were supportive when we last discussed the topic[1], but
> there
> > were a few concerns. I believe the following should mitigate the
> concerns:
> >
> > 1. The new proposal suggests dropping support in the next major version
> > instead of the next minor version.
> > 2. KIP-97 which is part of 0.10.2 means that 0.11 clients will support
> 0.10
> > brokers (0.11 brokers will also support 0.10 clients as usual), so there
> is
> > even more flexibility on incremental upgrades.
> > 3. Java 9 will be released shortly after the next Kafka release, so we'll
> > be supporting the 2 most recent Java releases, which is a reasonable
> > policy.
> > 4. 8 months have passed since the last proposal and the release after
> > 0.10.2 won't be out for another 4 months, which should hopefully be
> enough
> > time for Java 8 to be even more established. We haven't decided when the
> > next major release will happen, but we know that it won't happen before
> > June 2017.
> >
> > Please take a look at the proposal and share your feedback.
> >
> > Thanks,
> > Ismael
> >
> > [1] http://search-hadoop.com/m/Kafka/uyzND1oIhV61GS5Sf2
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-01 Thread Grant Henke
rms
> > >> of
> > >>> GC since each broker probably needs less than 1/10 of the memory
> > >> available
> > >>> on a typical machine nowadays. I will remove this from the reason of
> > >>> rejection.
> > >>>
> > >>>
> > >>>>
> > >>>> Disk failure is the "easy" case.  The "hard" case, which is
> > >>>> unfortunately also the much more common case, is disk misbehavior.
> > >>>> Towards the end of their lives, disks tend to start slowing down
> > >>>> unpredictably.  Requests that would have completed immediately
> before
> > >>>> start taking 20, 100 500 milliseconds.  Some files may be readable
> and
> > >>>> other files may not be.  System calls hang, sometimes forever, and
> the
> > >>>> Java process can't abort them, because the hang is in the kernel.
> It
> > >> is
> > >>>> not fun when threads are stuck in "D state"
> > >>>> http://stackoverflow.com/questions/20423521/process-perminan
> > >>>> tly-stuck-on-d-state
> > >>>> .  Even kill -9 cannot abort the thread then.  Fortunately, this is
> > >>>> rare.
> > >>>>
> > >>>
> > >>> I agree it is a harder problem and it is rare. We probably don't have
> > to
> > >>> worry about it in this KIP since this issue is orthogonal to whether
> or
> > >> not
> > >>> we use JBOD.
> > >>>
> > >>>
> > >>>>
> > >>>> Another approach we should consider is for Kafka to implement its
> own
> > >>>> storage layer that would stripe across multiple disks.  This
> wouldn't
> > >>>> have to be done at the block level, but could be done at the file
> > >> level.
> > >>>> We could use consistent hashing to determine which disks a file
> should
> > >>>> end up on, for example.
> > >>>>
> > >>>
> > >>> Are you suggesting that we should distribute log, or log segment,
> > across
> > >>> disks of brokers? I am not sure if I fully understand this approach.
> My
> > >> gut
> > >>> feel is that this would be a drastic solution that would require
> > >>> non-trivial design. While this may be useful to Kafka, I would prefer
> > not
> > >>> to discuss this in detail in this thread unless you believe it is
> > >> strictly
> > >>> superior to the design in this KIP in terms of solving our use-case.
> > >>>
> > >>>
> > >>>> best,
> > >>>> Colin
> > >>>>
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Dong
> > >>>>>
> > >>>>> On Wed, Jan 25, 2017 at 11:34 AM, Colin McCabe <cmcc...@apache.org
> >
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Hi Dong,
> > >>>>>>
> > >>>>>> Thanks for the writeup!  It's very interesting.
> > >>>>>>
> > >>>>>> I apologize in advance if this has been discussed somewhere else.
> > >>> But
> > >>>> I
> > >>>>>> am curious if you have considered the solution of running multiple
> > >>>>>> brokers per node.  Clearly there is a memory overhead with this
> > >>>> solution
> > >>>>>> because of the fixed cost of starting multiple JVMs.  However,
> > >>> running
> > >>>>>> multiple JVMs would help avoid scalability bottlenecks.  You could
> > >>>>>> probably push more RPCs per second, for example.  A garbage
> > >>> collection
> > >>>>>> in one broker would not affect the others.  It would be
> interesting
> > >>> to
> > >>>>>> see this considered in the "alternate designs" design, even if you
> > >>> end
> > >>>>>> up deciding it's not the way to go.
> > >>>>>>
> > >>>>>> best,
> > >>>>>> Colin
> > >>>>>>
> > >>>>>>
> > >>>>>> On Thu, Jan 12, 2017, at 10:46, Dong Lin wrote:
> > >>>>>>> Hi all,
> > >>>>>>>
> > >>>>>>> We created KIP-112: Handle disk failure for JBOD. Please find the
> > >>> KIP
> > >>>>>>> wiki
> > >>>>>>> in the link https://cwiki.apache.org/confl
> > >> uence/display/KAFKA/KIP-
> > >>>>>>> 112%3A+Handle+disk+failure+for+JBOD.
> > >>>>>>>
> > >>>>>>> This KIP is related to KIP-113
> > >>>>>>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>> 113%3A+Support+replicas+movement+between+log+directories>:
> > >>>>>>> Support replicas movement between log directories. They are
> > >> needed
> > >>> in
> > >>>>>>> order
> > >>>>>>> to support JBOD in Kafka. Please help review the KIP. You
> > >> feedback
> > >>> is
> > >>>>>>> appreciated!
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> Dong
> > >>>>>>
> > >>>>
> > >>>
> > >>
> >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: Is this a bug or just unintuitive behavior?

2017-01-17 Thread Grant Henke
or any of the subscribed list of topics
> > > Topic is created or deleted
> > > An existing member of the consumer group dies
> > > A new member is added to an existing consumer group via the join API
> > > "
> > >
> > > I'm guessing that this would affect any of those scenarios.
> > >
> > > -James
> > >
> > >
> > > >
> > > >
> > > >
> > > > On Thu, Jan 5, 2017 at 12:40 AM, James Cheng <wushuja...@gmail.com>
> > > wrote:
> > > >
> > > >> Jeff,
> > > >>
> > > >> Your analysis is correct. I would say that it is known but
> unintuitive
> > > >> behavior.
> > > >>
> > > >> As an example of a problem caused by this behavior, it's possible
> for
> > > >> mirrormaker to miss messages on newly created topics, even though
> it
> > > was
> > > >> subscribed to them before topics were created.
> > > >>
> > > >> See the following JIRAs:
> > > >> https://issues.apache.org/jira/browse/KAFKA-3848 <
> > > >> https://issues.apache.org/jira/browse/KAFKA-3848>
> > > >> https://issues.apache.org/jira/browse/KAFKA-3370 <
> > > >> https://issues.apache.org/jira/browse/KAFKA-3370>
> > > >>
> > > >> -James
> > > >>
> > > >>> On Jan 4, 2017, at 4:37 PM, h...@confluent.io wrote:
> > > >>>
> > > >>> This sounds exactly as I would expect things to behave. If you
> > consume
> > > >> from the beginning I would think you would get all the messages but
> > not
> > > if
> > > >> you consume from the latest offset. You can separately tune the
> > metadata
> > > >> refresh interval if you want to miss fewer messages but that still
> > won't
> > > >> get you all messages from the beginning if you don't explicitly
> > consume
> > > >> from the beginning.
> > > >>>
> > > >>> Sent from my iPhone
> > > >>>
> > > >>>> On Jan 4, 2017, at 6:53 PM, Jeff Widman <j...@netskope.com>
> wrote:
> > > >>>>
> > > >>>> I'm seeing consumers miss messages when they subscribe before the
> > > topic
> > > >> is
> > > >>>> actually created.
> > > >>>>
> > > >>>> Scenario:
> > > >>>> 1) kafka 0.10.1.1 cluster with no topics, but supports
> > > topic
> > > >>>> auto-creation as soon as a message is published to the topic
> > > >>>> 2) consumer subscribes using topic string or a regex pattern.
> > > Currently
> > > >> no
> > > >>>> topics match. Consumer offset is "latest"
> > > >>>> 3) producer publishes to a topic that matches the string or regex
> > > >> pattern.
> > > >>>> 4) broker immediately creates a topic, writes the message, and
> also
> > > >>>> notifies the consumer group that a rebalance needs to happen to
> > assign
> > > >> the
> > > >>>> topic_partition to one of the consumers.
> > > >>>> 5) rebalance is fairly quick, maybe a second or so
> > > >>>> 6) a consumer is assigned to the newly-created topic_partition
> > > >>>>
> > > >>>> At this point, we've got a consumer steadily polling the recently
> > > >> created
> > > >>>> topic_partition. However, the consumer.poll() never returns any
> > > messages
> > > >>>> published between topic creation and when the consumer was
> assigned
> > to
> > > >> the
> > > >>>> topic_partition. I'm guessing this may be because when the
> consumer
> > is
> > > >>>> assigned to the topic_partition it doesn't find any, so it uses
> the
> > > >> latest
> > > >>>> offset, which happens to be after the messages that were published
> > to
> > > >>>> create the topic.
> > > >>>>
> > > >>>> This is surprising because the consumer technically was subscribed
> > to
> > > >> the
> > > >>>> topic before the messages were produced, so you'd think the
> consumer
> > > >> would
> > > >>>> receive these messages.
> > > >>>>
> > > >>>> Is this known behavior? A bug in Kafka broker? Or a bug in my
> client
> > > >>>> library?
> > > >>
> > > >>
> > >
> > >
> >
>
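
For readers hitting this, a minimal sketch of the workarounds mentioned above
(a recent Java consumer, with a placeholder broker address and topic pattern;
not code from this thread): set auto.offset.reset=earliest so newly assigned
partitions with no committed offset are read from the beginning, and lower
metadata.max.age.ms so a pattern subscription notices auto-created topics sooner.

import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NewTopicSubscriber {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "pattern-subscriber");
        // Read from the beginning when no committed offset exists for a newly assigned
        // partition, so records produced between topic auto-creation and the first
        // rebalance are not skipped.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Refresh metadata more often so new topics matching the pattern are noticed sooner.
        props.put(ConsumerConfig.METADATA_MAX_AGE_CONFIG, "5000");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Pattern.compile("events-.*"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s-%d@%d: %s%n",
                        record.topic(), record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}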



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [ANNOUNCE] New committer: Grant Henke

2017-01-12 Thread Grant Henke
Thanks everyone!

On Thu, Jan 12, 2017 at 2:58 AM, Damian Guy <damian@gmail.com> wrote:

> Congratulations!
>
> On Thu, 12 Jan 2017 at 03:35 Jun Rao <j...@confluent.io> wrote:
>
> > Grant,
> >
> > Thanks for all your contribution! Congratulations!
> >
> > Jun
> >
> > On Wed, Jan 11, 2017 at 2:51 PM, Gwen Shapira <g...@confluent.io> wrote:
> >
> > > The PMC for Apache Kafka has invited Grant Henke to join as a
> > > committer and we are pleased to announce that he has accepted!
> > >
> > > Grant contributed 88 patches, 90 code reviews, countless great
> > > comments on discussions, a much-needed cleanup to our protocol and the
> > > on-going and critical work on the Admin protocol. Throughout this, he
> > > displayed great technical judgment, high-quality work and willingness
> > > to contribute where needed to make Apache Kafka awesome.
> > >
> > > Thank you for your contributions, Grant :)
> > >
> > > --
> > > Gwen Shapira
> > > Product Manager | Confluent
> > > 650.450.2760 <(650)%20450-2760> | @gwenshap
> > > Follow us: Twitter | blog
> > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] Dormant/Inactive KIPs

2017-01-05 Thread Grant Henke
Thanks for following up on this Ismael!

On Thu, Jan 5, 2017 at 6:46 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks for the feedback. I went ahead and moved the mentioned KIPs as well
> as KIP-59, KIP-64, KIP-69 and KIP-76 to a dormant/inactive table. Note that
> this is not a permanent change, people are welcome to move a KIP back to
> the "Under discussion" table if it's not inactive/dormant.
>
> Ismael
>
> On Tue, Dec 13, 2016 at 11:53 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi all,
> >
> > A while back Grant proposed moving inactive/dormant KIPs to a separate
> > table in the wiki. I think this is a good idea as it will make it easier
> > for people to see the KIPs that are actually active. The list that Grant
> > proposed then was:
> >
> > - KIP-6 - New reassignment partition logic for rebalancing (dormant)
> >> - KIP-14 - Tools standardization (dormant)
> >> - KIP-17 - Add HighwaterMarkOffset to OffsetFetchResponse (dormant)
> >> - KIP-18 - JBOD Support (dormant)
> >> - KIP-23 - Add JSON/CSV output and looping options to
> ConsumerGroupCommand
> >> (dormant)
> >> - KIP-27 - Conditional Publish (dormant)
> >>  - KIP-30 - Allow for brokers to have plug-able consensus and meta data
> storage
> >> sub systems (dormant)
> >>  - KIP-39: Pinning controller to broker (dormant)
> >>  - KIP-44 - Allow Kafka to have a customized security protocol (dormant)
> >>  - KIP-46 - Self Healing (dormant)
> >>  - KIP-47 - Add timestamp-based log deletion policy (blocked - by
> KIP-33)
> >>  - KIP-53 - Add custom policies for reconnect attempts to NetworkClient
> >>  - KIP-58 - Make Log Compaction Point Configurable (blocked - by KIP-33)
> >>  - KIP-61: Add a log retention parameter for maximum disk space usage
> >>percentage (dormant)
> >>  - KIP-68 Add a consumed log retention before log retention (dormant)
> >
> >
> > KIP-58 was adopted since then and it probably makes sense to add KIP-10
> to
> > the list.
> >
> > Are people OK with this? Feel free to suggest KIPs that should or should
> > not be in the inactive/dormant list.
> >
> > Ismael
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Created] (KAFKA-4525) Kafka should not require SSL trust store password

2016-12-12 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4525:
--

 Summary: Kafka should not require SSL trust store password
 Key: KAFKA-4525
 URL: https://issues.apache.org/jira/browse/KAFKA-4525
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 0.9.0.0
Reporter: Grant Henke
Assignee: Grant Henke


When configuring SSL for Kafka; If the truststore password is not set, Kafka 
fails to start with:
{noformat}
org.apache.kafka.common.KafkaException: SSL trust store is specified, but trust 
store password is not specified.

at 
org.apache.kafka.common.security.ssl.SslFactory.createTruststore(SslFactory.java:195)
at 
org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:115)
{noformat}

The truststore password is not required for read operations. When reading the 
truststore the password is used as an integrity check but not required. 

The risk of not providing a password is that someone could add a certificate 
into the store which you do not want to trust. The store should be protected 
first by the OS permissions. The password is an additional protection.

Though this risk of trusting the OS permissions is one many may not want to 
take, it's not a decision that Kafka should enforce or require. 
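
For reference, a JKS trust store can be read without a password; the password only 
lets the reader verify integrity. A minimal plain-Java illustration (hypothetical 
path, not Kafka code):
{code}
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.util.Enumeration;

public class ReadTruststoreWithoutPassword {
    public static void main(String[] args) throws Exception {
        KeyStore trustStore = KeyStore.getInstance("JKS");
        try (InputStream in = Files.newInputStream(Paths.get("/etc/kafka/ssl/truststore.jks"))) {
            // Passing a null password skips the integrity check but still loads the certificates.
            trustStore.load(in, null);
        }
        Enumeration<String> aliases = trustStore.aliases();
        while (aliases.hasMoreElements()) {
            String alias = aliases.nextElement();
            Certificate cert = trustStore.getCertificate(alias);
            System.out.println(alias + ": " + cert.getType());
        }
    }
}
{code}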



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-2552) Certain admin commands such as partition assignment fail on large clusters

2016-10-11 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke resolved KAFKA-2552.

Resolution: Duplicate

> Certain admin commands such as partition assignment fail on large clusters
> --
>
> Key: KAFKA-2552
> URL: https://issues.apache.org/jira/browse/KAFKA-2552
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Abhishek Nigam
>Assignee: Abhishek Nigam
>
> This happens because the json generated is greater than 1 MB and exceeds the 
> default data limit of zookeeper nodes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4203) Java producer default max message size does not align with broker default

2016-09-21 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4203:
--

 Summary: Java producer default max message size does not align 
with broker default
 Key: KAFKA-4203
 URL: https://issues.apache.org/jira/browse/KAFKA-4203
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Grant Henke
Assignee: Grant Henke
Priority: Critical


The Java producer sets max.request.size = 1048576 (the base 2 version of 1 MB 
(MiB))

The broker sets max.message.bytes = 1000012 (the base 10 value of 1 MB + 12 
bytes for overhead)

This means that by default the producer can try to produce messages larger than 
the broker will accept resulting in RecordTooLargeExceptions.

There were no similar issues in the old producer because it sets 
max.message.size = 1000000 (the base 10 value of 1 MB)

I propose we increase the broker default for max.message.bytes to 1048588 (the 
base 2 value of 1 MB (MiB) + 12 bytes for overhead) so that any message 
produced with default configs from either producer does not result in a 
RecordTooLargeException.
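
Until the defaults are aligned, producers can be configured explicitly to stay 
within the broker's limit. A minimal sketch with the Java producer (broker address 
and topic are placeholders):
{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AlignedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Cap requests at the broker's default max.message.bytes (1000012) instead of the
        // producer default of 1048576, so the mismatch described above cannot be hit.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, "1000012");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("example-topic", "key", "value"));
        }
    }
}
{code}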



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4157) Transient system test failure in replica_verification_test.test_replica_lags

2016-09-13 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-4157:
--

 Summary: Transient system test failure in 
replica_verification_test.test_replica_lags
 Key: KAFKA-4157
 URL: https://issues.apache.org/jira/browse/KAFKA-4157
 Project: Kafka
  Issue Type: Bug
  Components: system tests
Affects Versions: 0.10.0.0
Reporter: Grant Henke
Assignee: Grant Henke


The replica_verification_test.test_replica_lags test runs a background thread 
via replica_verification_tool that populates a dict with max lag for each 
"topic,partition" key. Because populating that map is in a separate thread, 
there is a race condition on populating the key and querying it via 
replica_verification_tool.get_lag_for_partition. This results in a key error 
like below: 
{noformat}
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 106, 
in run_all_tests
data = self.run_single_test()
  File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 162, 
in run_single_test
return self.current_test_context.function(self.current_test)
  File "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", 
line 82, in test_replica_lags
err_msg="Timed out waiting to reach zero replica lags.")
  File "/usr/lib/python2.7/site-packages/ducktape/utils/util.py", line 31, in 
wait_until
if condition():
  File "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", 
line 81, in 
wait_until(lambda: self.replica_verifier.get_lag_for_partition(TOPIC, 0) == 
0, timeout_sec=10,
  File "/root/kafka/tests/kafkatest/services/replica_verification_tool.py", 
line 66, in get_lag_for_partition
lag = self.partition_lag[topic_partition]
KeyError: 'topic-replica-verification,0'
{noformat}

Instead of an error, None should be returned when no key is found. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4157) Transient system test failure in replica_verification_test.test_replica_lags

2016-09-13 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-4157:
---
Status: Patch Available  (was: Open)

> Transient system test failure in replica_verification_test.test_replica_lags
> 
>
> Key: KAFKA-4157
> URL: https://issues.apache.org/jira/browse/KAFKA-4157
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>
> The replica_verification_test.test_replica_lags test runs a background thread 
> via replica_verification_tool that populates a dict with max lag for each 
> "topic,partition" key. Because populating that map is in a separate thread, 
> there is a race condition on populating the key and querying it via 
> replica_verification_tool.get_lag_for_partition. This results in a key error 
> like below: 
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 106, 
> in run_all_tests
> data = self.run_single_test()
>   File "/usr/lib/python2.7/site-packages/ducktape/tests/runner.py", line 162, 
> in run_single_test
> return self.current_test_context.function(self.current_test)
>   File 
> "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 
> 82, in test_replica_lags
> err_msg="Timed out waiting to reach zero replica lags.")
>   File "/usr/lib/python2.7/site-packages/ducktape/utils/util.py", line 31, in 
> wait_until
> if condition():
>   File 
> "/root/kafka/tests/kafkatest/tests/tools/replica_verification_test.py", line 
> 81, in 
> wait_until(lambda: self.replica_verifier.get_lag_for_partition(TOPIC, 0) 
> == 0, timeout_sec=10,
>   File "/root/kafka/tests/kafkatest/services/replica_verification_tool.py", 
> line 66, in get_lag_for_partition
> lag = self.partition_lag[topic_partition]
> KeyError: 'topic-replica-verification,0'
> {noformat}
> Instead of an error, None should be returned when no key is found. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-78 Cluster Id (second attempt)

2016-09-07 Thread Grant Henke
+1 (non-binding)

On Wed, Sep 7, 2016 at 6:55 AM, Rajini Sivaram <rajinisiva...@googlemail.com
> wrote:

> +1 (non-binding)
>
> On Wed, Sep 7, 2016 at 4:09 AM, Sriram Subramanian <r...@confluent.io>
> wrote:
>
> > +1 binding
> >
> > > On Sep 6, 2016, at 7:46 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > Hi all,
> > >
> > > I would like to (re)initiate[1] the voting process for KIP-78 Cluster
> Id:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-78%3A+Cluster+Id
> > >
> > > As explained in the KIP and discussion thread, we see this as a good
> > first
> > > step that can serve as a foundation for future improvements.
> > >
> > > Thanks,
> > > Ismael
> > >
> > > [1] Even though I created a new vote thread, Gmail placed the messages
> in
> > > the discuss thread, making it not as visible as required. It's
> important
> > to
> > > mention that two +1s were cast by Gwen and Sriram:
> > >
> > > http://mail-archives.apache.org/mod_mbox/kafka-dev/201609.
> > mbox/%3CCAD5tkZbLv7fvH4q%2BKe%2B%3DJMgGq%2BZT2t34e0WRUsCT1ErhtKOg1w%
> > 40mail.gmail.com%3E
> >
>
>
>
> --
> Regards,
>
> Rajini
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [ANNOUNCE] New committer: Jason Gustafson

2016-09-07 Thread Grant Henke
Congratulations and thank you for all of your contributions to Apache
Kafka Jason!

On Wed, Sep 7, 2016 at 10:12 AM, Mayuresh Gharat <gharatmayures...@gmail.com
> wrote:

> congrats Jason !
>
> Thanks,
>
> Mayuresh
>
> On Wed, Sep 7, 2016 at 5:16 AM, Eno Thereska <eno.there...@gmail.com>
> wrote:
>
> > Congrats Jason!
> >
> > Eno
> > > On 7 Sep 2016, at 10:00, Rajini Sivaram <rajinisiva...@googlemail.com>
> > wrote:
> > >
> > > Congrats, Jason!
> > >
> > > On Wed, Sep 7, 2016 at 8:29 AM, Flavio P JUNQUEIRA <f...@apache.org>
> > wrote:
> > >
> > >> Congrats, Jason. Well done and great to see this project inviting new
> > >> committers.
> > >>
> > >> -Flavio
> > >>
> > >> On 7 Sep 2016 04:58, "Ashish Singh" <asi...@cloudera.com> wrote:
> > >>
> > >>> Congrats, Jason!
> > >>>
> > >>> On Tuesday, September 6, 2016, Jason Gustafson <ja...@confluent.io>
> > >> wrote:
> > >>>
> > >>>> Thanks all!
> > >>>>
> > >>>> On Tue, Sep 6, 2016 at 5:13 PM, Becket Qin <becket@gmail.com
> > >>>> <javascript:;>> wrote:
> > >>>>
> > >>>>> Congrats, Jason!
> > >>>>>
> > >>>>> On Tue, Sep 6, 2016 at 5:09 PM, Onur Karaman
> > >>>> <okara...@linkedin.com.invalid
> > >>>>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> congrats jason!
> > >>>>>>
> > >>>>>> On Tue, Sep 6, 2016 at 4:12 PM, Sriram Subramanian <
> > >> r...@confluent.io
> > >>>> <javascript:;>>
> > >>>>>> wrote:
> > >>>>>>
> > >>>>>>> Congratulations Jason!
> > >>>>>>>
> > >>>>>>> On Tue, Sep 6, 2016 at 3:40 PM, Vahid S Hashemian <
> > >>>>>>> vahidhashem...@us.ibm.com <javascript:;>
> > >>>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> Congratulations Jason on this very well deserved recognition.
> > >>>>>>>>
> > >>>>>>>> --Vahid
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> From:   Neha Narkhede <n...@confluent.io <javascript:;>>
> > >>>>>>>> To: "dev@kafka.apache.org <javascript:;>" <
> > >>>> dev@kafka.apache.org <javascript:;>>,
> > >>>>>>>> "us...@kafka.apache.org <javascript:;>" <
> > >> us...@kafka.apache.org
> > >>>> <javascript:;>>
> > >>>>>>>> Cc: "priv...@kafka.apache.org <javascript:;>" <
> > >>>> priv...@kafka.apache.org <javascript:;>>
> > >>>>>>>> Date:   09/06/2016 03:26 PM
> > >>>>>>>> Subject:[ANNOUNCE] New committer: Jason Gustafson
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> The PMC for Apache Kafka has invited Jason Gustafson to join
> > >> as a
> > >>>>>>>> committer and
> > >>>>>>>> we are pleased to announce that he has accepted!
> > >>>>>>>>
> > >>>>>>>> Jason has contributed numerous patches to a wide range of
> > >> areas,
> > >>>>>> notably
> > >>>>>>>> within the new consumer and the Kafka Connect layers. He has
> > >>>>> displayed
> > >>>>>>>> great taste and judgement which has been apparent through his
> > >>>>>> involvement
> > >>>>>>>> across the board from mailing lists, JIRA, code reviews to
> > >>>>> contributing
> > >>>>>>>> features, bug fixes and code and documentation improvements.
> > >>>>>>>>
> > >>>>>>>> Thank you for your contribution and welcome to Apache Kafka,
> > >>> Jason!
> > >>>>>>>> --
> > >>>>>>>> Thanks,
> > >>>>>>>> Neha
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Ashish h
> > >>>
> > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Rajini
> >
> >
>
>
> --
> -Regards,
> Mayuresh R. Gharat
> (862) 250-7125
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-71 Enable log compaction and deletion to co-exist

2016-08-19 Thread Grant Henke
The only concrete example i can think of is a case for limiting
> > disk
> > > > > usage.
> > > > > > Say, i had something like Connect running that was tracking
> changes
> > > in
> > > > a
> > > > > > database. Downstream i don't really care about every change, i
> just
> > > > want
> > > > > > the latest values, so compaction could be enabled. However, the
> > kafka
> > > > > > cluster has limited disk space so we need to limit the size of
> each
> > > > > > partition.
> > > > > > In a previous life i have done the same, just without compaction
> > > turned
> > > > > on.
> > > > > >
> > > > > > Besides, i don't think it costs us anything in terms of added
> > > > complexity
> > > > > to
> > > > > > enable it for time & size based retention - the code already does
> > > this
> > > > > for
> > > > > > us.
> > > > > >
> > > > > > Thanks,
> > > > > > Damian
> > > > > >
> > > > > > On Fri, 12 Aug 2016 at 05:30 Neha Narkhede <n...@confluent.io>
> > > wrote:
> > > > > >
> > > > > > > Jun,
> > > > > > >
> > > > > > > The motivation for this KIP is to handle joins and windows in
> > Kafka
> > > > > > > streams better and since Streams supports time-based windows,
> the
> > > KIP
> > > > > > > suggests combining time-based deletion and compaction.
> > > > > > >
> > > > > > > It might make sense to do the same for size-based windows, but
> > can
> > > > you
> > > > > > > think of a concrete use case? If not, perhaps we can come back
> to
> > > it.
> > > > > > > On Thu, Aug 11, 2016 at 3:08 PM Jun Rao <j...@confluent.io>
> > wrote:
> > > > > > >
> > > > > > >> Hi, Damian,
> > > > > > >>
> > > > > > >> Thanks for the proposal. It makes sense to use time-based
> > deletion
> > > > > > >> retention and compaction together, as you mentioned in the
> > > KStream.
> > > > > > >>
> > > > > > >> Is there a use case where we want to combine size-based
> deletion
> > > > > > retention
> > > > > > >> and compaction together?
> > > > > > >>
> > > > > > >> Jun
> > > > > > >>
> > > > > > >> On Thu, Aug 11, 2016 at 2:00 AM, Damian Guy <
> > damian@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > Hi Jason,
> > > > > > >> >
> > > > > > >> > Thanks for your input - appreciated.
> > > > > > >> >
> > > > > > >> > 1. Would it make sense to use this KIP in the consumer
> > > coordinator
> > > > > to
> > > > > > >> > > expire offsets based on the topic's retention time?
> > Currently,
> > > > we
> > > > > > >> have a
> > > > > > >> > > periodic task which scans the full cache to check which
> > > offsets
> > > > > can
> > > > > > be
> > > > > > >> > > expired, but we might be able to get rid of this if we
> had a
> > > > > > callback
> > > > > > >> to
> > > > > > >> > > update the cache when a segment was deleted. Technically
> > > offsets
> > > > > can
> > > > > > >> be
> > > > > > >> > > given their own expiration time, but it seems questionable
> > > > whether
> > > > > > we
> > > > > > >> > need
> > > > > > >> > > this going forward (the new consumer doesn't even expose
> it
> > at
> > > > the
> > > > > > >> > moment).
> > > > > > >> > >
> > > > > > >> >
> > > > > > >> > The KIP in its current form isn't adding a callback. So
> you'd
> > > > still
> > > > > > >> need to
> > > > > > >> > scan the cache and remove any expired offsets, however you
> > > > wouldn't
> > > > > > send
> > > > > > >> > the tombstone messages.
> > > > > > >> > Having a callback sounds useful, though it isn't clear to me
> > how
> > > > you
> > > > > > >> would
> > > > > > >> > know which offsets to remove from the cache on segment
> > > deletion? I
> > > > > > will
> > > > > > >> > look into it.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > > 2. This KIP could also be useful for expiration in the
> case
> > > of a
> > > > > > cache
> > > > > > >> > > maintained on the client, but I don't see an obvious way
> > that
> > > > we'd
> > > > > > be
> > > > > > >> > able
> > > > > > >> > > to leverage it since there's no indication to the client
> > when
> > > a
> > > > > > >> segment
> > > > > > >> > has
> > > > > > >> > > been deleted (unless they reload the cache from the
> > beginning
> > > of
> > > > > the
> > > > > > >> > log).
> > > > > > >> > > One approach I can think of would be to write
> corresponding
> > > > > > >> tombstones as
> > > > > > >> > > necessary when a segment is removed, but that seems pretty
> > > > heavy.
> > > > > > Have
> > > > > > >> > you
> > > > > > >> > > considered this problem?
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > We've not considered this and I'm not sure we want to as
> part
> > of
> > > > > this
> > > > > > >> KIP.
> > > > > > >> >
> > > > > > >> > Thanks,
> > > > > > >> > Damian
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > On Mon, Aug 8, 2016 at 12:41 AM, Damian Guy <
> > > > damian@gmail.com
> > > > > >
> > > > > > >> > wrote:
> > > > > > >> > >
> > > > > > >> > > > Hi,
> > > > > > >> > > >
> > > > > > >> > > > We have created KIP 71: Enable log compaction and
> deletion
> > > to
> > > > > > >> co-exist`
> > > > > > >> > > >
> > > > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > > >> > > > 71%3A+Enable+log+compaction+and+deletion+to+co-exist
> > > > > > >> > > >
> > > > > > >> > > > Please take a look. Feedback is appreciated.
> > > > > > >> > > >
> > > > > > >> > > > Thank you
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
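
For reference, a rough sketch of how the combined policy discussed here can be
applied per topic, using the cleanup.policy=compact,delete value the KIP proposes
and today's Java AdminClient (which did not exist when this thread was written;
topic name, sizes and broker address are placeholders):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactAndDeleteTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            // Keep only the latest value per key, but also cap disk usage and age.
            configs.put(TopicConfig.CLEANUP_POLICY_CONFIG, "compact,delete");
            configs.put(TopicConfig.RETENTION_BYTES_CONFIG, String.valueOf(1024L * 1024 * 1024)); // 1 GiB/partition
            configs.put(TopicConfig.RETENTION_MS_CONFIG, String.valueOf(7 * 24 * 60 * 60 * 1000L)); // 7 days
            NewTopic topic = new NewTopic("db-changelog", 6, (short) 3).configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}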



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Should we have a KIP call?

2016-08-18 Thread Grant Henke
I am thinking it might be a good time to have a Kafka KIP call. There are a
lot of KIPs and discussions in progress that could benefit from a "quick"
call to discuss, coordinate, and prioritize.

Some of the voted topics we could discuss are:
(I didn't include ones that were just voted or will pass just before the
call)

   - KIP-33: Add a time based log index
   - KIP-50: Move Authorizer to o.a.k.common package
   - KIP-55: Secure Quotas for Authenticated Users
   - KIP-67: Queryable state for Kafka Streams
   - KIP-70: Revise Partition Assignment Semantics on New Consumer's
   Subscription Change

Some of the un-voted topics we could discuss are:

   - Time-based releases for Apache Kafka
   - Java 7 support timeline
   - KIP-4: ACL Admin Schema
   - KIP-37 - Add Namespaces to Kafka
   - KIP-48: Delegation token support for Kafka
   - KIP-54: Sticky Partition Assignment Strategy
   - KIP-63: Unify store and downstream caching in streams
   - KIP-66: Add Kafka Connect Transformers to allow transformations to
   messages
   - KIP-72 Allow Sizing Incoming Request Queue in Bytes
   - KIP-73: Replication Quotas
   - KIP-74: Add FetchResponse size limit in bytes

As a side note it may be worth moving some open KIPs to a "parked" list if
they are not being actively worked on. We can include a reason why as well.
Reasons could include being blocked, parked, dormant (no activity), or
abandoned (creator isn't working on it and others can pick it up). We would
need to ask the KIP creator or define some length of time before we call a
KIP abandoned and available for pickup.

Some KIPs which may be candidates to be "parked" in a first pass are:

   - KIP-6 - New reassignment partition logic for rebalancing (dormant)
   - KIP-14 - Tools standardization (dormant)
   - KIP-17 - Add HighwaterMarkOffset to OffsetFetchResponse (dormant)
   - KIP-18 - JBOD Support (dormant)
   - KIP-23 - Add JSON/CSV output and looping options to
   ConsumerGroupCommand (dormant)
   - KIP-27 - Conditional Publish (dormant)
   - KIP-30 - Allow for brokers to have plug-able consensus and meta data
   storage sub systems (dormant)
   - KIP-39: Pinning controller to broker (dormant)
   - KIP-44 - Allow Kafka to have a customized security protocol (dormant)
   - KIP-46 - Self Healing (dormant)
   - KIP-47 - Add timestamp-based log deletion policy (blocked - by KIP-33)
   - KIP-53 - Add custom policies for reconnect attempts to NetworkClient
   - KIP-58 - Make Log Compaction Point Configurable (blocked - by KIP-33)
   - KIP-61: Add a log retention parameter for maximum disk space usage
   percentage (dormant)
   - KIP-68 Add a consumed log retention before log retention (dormant)

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-08-18 Thread Grant Henke
Thanks for the feedback. Below are some responses:


> I don't have any problem with breaking things into 2 requests if it's
> necessary or optimal. But can you explain why separate requests "vastly
> simplifies the broker side implementation"? It doesn't seem like it should
> be particularly complex to process each ACL change in order.


You are right, it isn't super difficult to process ACL changes in order.
The simplicity I was thinking about comes from not needing to group and
re-order them like in the current implementation. It also removes the
"action" enumerator which also simplifies things a bit. I am open to both
ideas (single alter processing in order vs separate requests) as the trade
offs aren't drastic.

I am leaning towards separate requests for a few reasons:

   - The Admin API will likely submit separate requests for delete and add
   regardless
  - I expect most admin apis will have a removeAcl/s and addAcl/s api
   that fires a request immediately. I don't expect "batching", explicit or
   implicit, to be all that common.
   - Independent concerns and capabilities
  - Separating delete into its own request makes it easier to define
  "delete all" type behavior.
  - I think 2 simple requests may be easier to document and understand
  by client developers than 1 more complicated one.
   - Matches the Authorizer interface
  - Separate requests matches authorizer interface more closely and
  still allows for actions on collections of ACLs instead of one
call per ACL
  via Authorizer.addAcls(acls: Set[Acl], resource: Resource) and
  Authorizer.removeAcls(acls: Set[Acl], resource: Resource)

Hmm, even if all ACL requests are directed to the controller, concurrency
> is only guaranteed on the same connection. For requests coming from
> different connections, the order that they get processed on the broker is
> non-deterministic.


I was thinking less about making sure the exact order of requests is
handled correctly, since each client will likely get a response between
each request before sending another. Its more about ensuring local
state/cache is accurate and that there is no way 2 nodes can have different
ACLs which they think are correct. Good implementations will handle this,
but may take a performance hit.

Perhaps I am overthinking it and the SimpleAclAuthorizer is the only one
that would be affected by this (and likely rarely because volume of ACL
write requests should not be high). The SimpleAclAuthorizer is eventually
consistent between instances. It writes optimistically with the cached zk
node version while writing a complete list of ACLs (not just adding
removing single nodes for acls). Below is a concrete demonstration of the
impact I am thinking about with the SimpleAclAuthorizer:

If we force all writes to go through one instance then follow up (ignoring
first call to warm cache) writes for a resource would:

   1. Call addAcls
   2. Call updateResourceAcls combining the current cached acls and the new
   acls
   3. Write the result to Zookeeper via conditional write on the cached
   version
   4. Success 1 remote call

If requests can go through any instance then follow up writes for a
resource may:

   1. Call addAcls
   2. Call updateResourceAcls combining the current cached acls and the new
   acls
  1. If no cached acls read from Zookeeper
   3. Write the result to Zookeeper via conditional write on the cached
   version
  1. If the cached version is wrong due to a write on another
  instance read from Zookeeper
  2. Rebuild the final ACLs list
  3. Repeat until the write is successful
   4. Success in 1 or 4 (or more) remote calls
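
A rough sketch of that optimistic conditional-write loop (hypothetical class and
helper names, not the actual SimpleAclAuthorizer code):

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class OptimisticAclWriter {
    // Illustrative stand-in: real code would deserialize the stored ACL set, union it
    // with the new ACLs, and serialize it back. Here we only show the retry shape.
    private static byte[] merge(byte[] current, byte[] added) {
        byte[] out = new byte[current.length + added.length];
        System.arraycopy(current, 0, out, 0, current.length);
        System.arraycopy(added, 0, out, current.length, added.length);
        return out;
    }

    public static void addAcls(ZooKeeper zk, String path, byte[] newAcls) throws Exception {
        while (true) {
            Stat stat = new Stat();
            byte[] current = zk.getData(path, false, stat);   // read current ACLs and the znode version
            byte[] merged = merge(current, newAcls);          // rebuild the full ACL list
            try {
                zk.setData(path, merged, stat.getVersion());  // conditional write on that version
                return;                                       // success
            } catch (KeeperException.BadVersionException e) {
                // another instance updated the znode concurrently: re-read, rebuild, retry
            }
        }
    }
}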

It looks like the Sentry implementation would not have this issue and the
Ranger implementation doesn't support modifying ACLs anyway (must use the
Ranger UI/API).

I wanted to explain my original thoughts, but I am happy to remove the
controller constraint given the SimpleAclAuthorizer appears to be the only
(of those I know) implementation to be affected.

Thank you,
Grant






On Sat, Aug 13, 2016 at 5:41 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> On Mon, Aug 8, 2016 at 2:44 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > Thank you for the feedback everyone. Below I respond to the last batch of
> > emails:
> >
> > You mention that "delete" actions
> > > will get processed before "add" actions, which makes sense to me. An
> > > alternative to avoid the confusion in the first place would be to
> replace
> > > the AlterAcls APIs with separate AddAcls and DeleteAcls APIs. Was this
> > > option already rejected?
> >
> >
> > 4. There is no CreateAcls or DeleteAcls (unlike CreateTopics and
> >
> > DeleteTopics, for example). It would be good to explain the reasoning for
> >

Re: [ANNOUNCE] New Kafka PMC member Gwen Shapira

2016-08-18 Thread Grant Henke
Congratulations Gwen!



On Thu, Aug 18, 2016 at 2:36 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Congratulations Gwen! Great news.
>
> Ismael
>
> On 18 Aug 2016 2:44 am, "Jun Rao" <j...@confluent.io> wrote:
>
> > Hi, Everyone,
> >
> > Gwen Shapira has been active in the Kafka community since she became a
> > Kafka committer
> > about a year ago. I am glad to announce that Gwen is now a member of
> Kafka
> > PMC.
> >
> > Congratulations, Gwen!
> >
> > Jun
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-59 - Proposal for a kafka broker command - kafka-brokers.sh

2016-08-17 Thread Grant Henke
Hi Jayesh,

Like Gwen said KIP-4 is adding fields and requests to the wire protocols in
order to allow all admin tools to talk directly to Kafka and a client api
to support those requests. Talking to Kafka as opposed to Zookeeper allows
us to leverage authorization, lock down zookeeper, and improve
compatibility. Like Gwen said waiting until some of the KIP-4 work is done
may avoid rework. I can't make a commitment to how fast the client will be
available as it depends on many factors but progress is being made
regularly and I do have some mock client work done locally.

It looks like the only thing in your proposal that can not be supported via
the wire protocol today is the endpoints metadata. It's sort of a catch-22
because the bootstrap is required to connect to a Kafka cluster (as opposed
to zookeeper) and at that point the Metadata returned assumes an endpoint
of the connecting security protocol. Is that an acceptable limitation?
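
For illustration only, this is roughly what listing brokers looks like with the
Java AdminClient that later shipped as part of the KIP-4 work (broker address is a
placeholder; as noted above, only the endpoints of the listener you connect to are
visible):

import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.DescribeClusterResult;
import org.apache.kafka.common.Node;

public class ListBrokers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            DescribeClusterResult cluster = admin.describeCluster();
            System.out.println("cluster id: " + cluster.clusterId().get());
            System.out.println("controller: " + cluster.controller().get());
            for (Node node : cluster.nodes().get()) {
                System.out.printf("broker %d at %s:%d rack=%s%n",
                    node.id(), node.host(), node.port(), node.rack());
            }
        }
    }
}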

Thanks,
Grant

On Wed, Aug 17, 2016 at 6:53 PM, Gwen Shapira <g...@confluent.io> wrote:

> Thanks Jayesh.
>
> I think this can be a useful addition to Apache Kafka.
> One potential issue is that you are getting all the information for
> ZooKeeper. We already added a protocol that allows adding the
> information to Kafka itself and are in the process of adding an admin
> client (i.e. Java client, not CLI).
> If you add this as planned, we'll need to re-work it to work with
> Kafka directly instead of ZooKeeper once the admin client lands. If
> you choose, you can wait for the admin client to arrive first, and
> avoid the re-work.
>
> Gwen
>
> On Tue, Aug 16, 2016 at 7:22 AM, Jayesh Thakrar
> <j_thak...@yahoo.com.invalid> wrote:
> > All,
> > If there is no discussion, feedback or objection, is there any concern
> in going to the next step of voting?
> > Thanks,Jayesh
> >   From: Jayesh Thakrar <j_thak...@yahoo.com>
> >  To: "dev@kafka.apache.org" <dev@kafka.apache.org>
> >  Sent: Saturday, August 13, 2016 11:44 PM
> >  Subject: [DISCUSS] KIP-59 - Proposal for a kafka broker command -
> kafka-brokers.sh
> >
> > This is to start off a discussion on the above KIP at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 59%3A+Proposal+for+a+kafka+broker+command
> The proposal is to fill the void
> of a command line tool/utility that can provide information on the brokers
> in a Kafka cluster.
> > The code is available on GitHub at https://github.com/JThakrar/kafka
> The KIP page has the help documentation as well as the output from the command
> with various options.
> Thank you,
> Jayesh Thakrar
> >
> >
> >
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-4032) Uncaught exceptions when autocreating topics

2016-08-14 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-4032:
---
Status: Patch Available  (was: Open)

> Uncaught exceptions when autocreating topics
> 
>
> Key: KAFKA-4032
> URL: https://issues.apache.org/jira/browse/KAFKA-4032
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jason Gustafson
>    Assignee: Grant Henke
>
> With the addition of the CreateTopics API in KIP-4, we have some new 
> exceptions which can be raised from {{AdminUtils.createTopic}}. For example, 
> it is possible to raise InvalidReplicationFactorException. Since we have not 
> yet removed the ability to create topics automatically, we need to make sure 
> these exceptions are caught and handled in both the TopicMetadata and 
> GroupCoordinator request handlers. Currently these exceptions are propagated 
> all the way to the processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4038) Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests

2016-08-14 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-4038:
---
Status: Patch Available  (was: Open)

> Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests
> -
>
> Key: KAFKA-4038
> URL: https://issues.apache.org/jira/browse/KAFKA-4038
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>    Assignee: Grant Henke
>
> {code}
> java.lang.AssertionError: The response error should match 
> Expected :REQUEST_TIMED_OUT
> Actual   :NONE
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:89)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:88)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>   at 
> kafka.server.DeleteTopicsRequestTest.validateErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:88)
>   at 
> kafka.server.DeleteTopicsRequestTest.testErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:76)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4038) Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests

2016-08-14 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KAFKA-4038:
--

Assignee: Grant Henke

> Transient failure in DeleteTopicsRequestTest.testErrorDeleteTopicRequests
> -
>
> Key: KAFKA-4038
> URL: https://issues.apache.org/jira/browse/KAFKA-4038
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jason Gustafson
>    Assignee: Grant Henke
>
> {code}
> java.lang.AssertionError: The response error should match 
> Expected :REQUEST_TIMED_OUT
> Actual   :NONE
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:89)
>   at 
> kafka.server.DeleteTopicsRequestTest$$anonfun$validateErrorDeleteTopicRequests$1.apply(DeleteTopicsRequestTest.scala:88)
>   at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
>   at 
> kafka.server.DeleteTopicsRequestTest.validateErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:88)
>   at 
> kafka.server.DeleteTopicsRequestTest.testErrorDeleteTopicRequests(DeleteTopicsRequestTest.scala:76)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3959) __consumer_offsets wrong number of replicas at startup

2016-08-12 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419471#comment-15419471
 ] 

Grant Henke commented on KAFKA-3959:


I would like to present an alternative option. This problem exists with any 
topic created using default.replication.factor > 1 as well. That prevents using 
2 or 3 as the configuration default because we want to support single nodes 
clusters without changing the defaults. 

Instead of preventing topics from being created with a low replication factor 
(unless min.isr is set), it would be really nice if we tracked a 
"target replication factor" in the topic metadata. This is an improvement over 
assuming the target replication factor based on the actual replicas as is done 
today and can actually result in a more accurate under replicated count. 

This change would also help support any ability to automatically maintain the 
desired replication factor as nodes are started, stopped, etc. Some related 
KIPs for that are:
* [KIP-73 Replication 
Quotas|https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas]
* [KIP-46: Self Healing 
Kafka|https://cwiki.apache.org/confluence/display/KAFKA/KIP-46%3A+Self+Healing+Kafka]

Would that be a viable option?

> __consumer_offsets wrong number of replicas at startup
> --
>
> Key: KAFKA-3959
> URL: https://issues.apache.org/jira/browse/KAFKA-3959
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, offset manager, replication
>Affects Versions: 0.9.0.1, 0.10.0.0
> Environment: Brokers of 3 kafka nodes running Red Hat Enterprise 
> Linux Server release 7.2 (Maipo)
>Reporter: Alban Hurtaud
>
> When creating a stack of 3 kafka brokers, the consumer is starting faster 
> than kafka nodes and when trying to read a topic, only one kafka node is 
> available.
> So the __consumer_offsets is created with a replication factor set to 1 
> (instead of configured 3) :
> offsets.topic.replication.factor=3
> default.replication.factor=3
> min.insync.replicas=2
> Then, other kafka nodes go up and we have exceptions because the replicas # 
> for __consumer_offsets is 1 and min insync is 2. So exceptions are thrown.
> What I missed is : Why the __consumer_offsets is created with replication to 
> 1 (when 1 broker is running) whereas in server.properties it is set to 3 ?
> To reproduce : 
> - Prepare 3 kafka nodes with the 3 lines above added to servers.properties.
> - Run one kafka,
> - Run one consumer (the __consumer_offsets is created with replicas =1)
> - Run 2 more kafka nodes



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4032) Uncaught exceptions when autocreating topics

2016-08-12 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419350#comment-15419350
 ] 

Grant Henke commented on KAFKA-4032:


I will make a patch for this shortly

> Uncaught exceptions when autocreating topics
> 
>
> Key: KAFKA-4032
> URL: https://issues.apache.org/jira/browse/KAFKA-4032
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jason Gustafson
>    Assignee: Grant Henke
>
> With the addition of the CreateTopics API in KIP-4, we have some new 
> exceptions which can be raised from {{AdminUtils.createTopic}}. For example, 
> it is possible to raise InvalidReplicationFactorException. Since we have not 
> yet removed the ability to create topics automatically, we need to make sure 
> these exceptions are caught and handled in both the TopicMetadata and 
> GroupCoordinator request handlers. Currently these exceptions are propagated 
> all the way to the processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-4032) Uncaught exceptions when autocreating topics

2016-08-12 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KAFKA-4032:
--

Assignee: Grant Henke

> Uncaught exceptions when autocreating topics
> 
>
> Key: KAFKA-4032
> URL: https://issues.apache.org/jira/browse/KAFKA-4032
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jason Gustafson
>    Assignee: Grant Henke
>
> With the addition of the CreateTopics API in KIP-4, we have some new 
> exceptions which can be raised from {{AdminUtils.createTopic}}. For example, 
> it is possible to raise InvalidReplicationFactorException. Since we have not 
> yet removed the ability to create topics automatically, we need to make sure 
> these exceptions are caught and handled in both the TopicMetadata and 
> GroupCoordinator request handlers. Currently these exceptions are propagated 
> all the way to the processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Consumer Offset Migration Tool

2016-08-09 Thread Grant Henke
Hi Jun,

Exactly what Gwen said. I am assuming shutdown old consumers, migrate
offsets, start new consumers. This is best for cases where you are
migrating from the old clients to the new clients without ever using dual
commit in the old client. Because the new clients can't coordinate with the
old ones an outage is required regardless.

Thanks,
Grant

On Tue, Aug 9, 2016 at 8:19 PM, Gwen Shapira <g...@confluent.io> wrote:

> Jun,
>
> Grant's use-case is about migrating from old-consumer-committing-to-ZK
> to new-consumer-committing-to-Kafka (which is what happens if you
> upgrade Flume, and maybe other stream processing systems too). This
> seems to require shutting down all instances in any case.
>
> Gwen
>
> On Tue, Aug 9, 2016 at 6:05 PM, Jun Rao <j...@confluent.io> wrote:
> > Hi, Grant,
> >
> > For your tool to work, do you expect all consumer instances in the
> consumer
> > group to be stopped before copying the offsets? Some applications may not
> > want to do that.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Tue, Aug 9, 2016 at 10:01 AM, Grant Henke <ghe...@cloudera.com>
> wrote:
> >
> >> I had to write a simple offset migration tool and I wanted to get
> feedback
> >> on whether or not this would be a useful addition to Apache Kafka.
> >>
> >> Currently the path to upgrade from the zookeeper offsets to the Kafka
> >> offset (and often the Scala to Java client) is via dual commit. The
> process
> >> is documented here:
> >> http://kafka.apache.org/documentation.html#offsetmigration
> >>
> >> The reason that process wasn't sufficient in my case is because:
> >>
> >>- It needs to be done ahead of the upgrade
> >>- It requires the old client to commit at least once in dual commit
> mode
> >>- Some frameworks don't expose the dual commit functionality well
> >>- Dual commit is not supported in 0.8.1.x
> >>
> >> The tool I wrote takes the relevant connection information and a
> consumer
> >> group and simply copies the Zookeeper offsets into the Kafka offsets for
> >> that group.
> >> A rough WIP PR can be seen here: https://github.com/apache/
> kafka/pull/1715
> >>
> >> Even though many users have already made the transition, I think this
> could
> >> still be useful in Kafka. Here are a few reasons:
> >>
> >>- It simplifies the migration for users who have yet to migrate,
> >>especially as the old clients get deprecated and removed
> >>- Though the tool is not available in the Kafka 0.8.x or 0.9.x
> series,
> >>downloading and using the jar from maven would be fairly
> straightforward
> >>   - Alternatively this could be a separate repo or jar, though I
> hardly
> >>   want to push this single tool to maven as a standalone artifact.
> >>
> >> Do you think this is useful in Apache Kafka? Any thoughts on the
> approach?
> >>
> >> Thanks,
> >> Grant
> >> --
> >> Grant Henke
> >> Software Engineer | Cloudera
> >> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >>
>
>
>
> --
> Gwen Shapira
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Consumer Offset Migration Tool

2016-08-09 Thread Grant Henke
I had to write a simple offset migration tool and I wanted to get feedback
on whether or not this would be a useful addition to Apache Kafka.

Currently the path to upgrade from the zookeeper offsets to the Kafka
offset (and often the Scala to Java client) is via dual commit. The process
is documented here:
http://kafka.apache.org/documentation.html#offsetmigration

The reason that process wasn't sufficient in my case is because:

   - It needs to be done ahead of the upgrade
   - It requires the old client to commit at least once in dual commit mode
   - Some frameworks don't expose the dual commit functionality well
   - Dual commit is not supported in 0.8.1.x

The tool I wrote takes the relevant connection information and a consumer
group and simply copies the Zookeeper offsets into the Kafka offsets for
that group.
A rough WIP PR can be seen here: https://github.com/apache/kafka/pull/1715
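
For context, the core of the idea is small: read each partition's offset from the
old consumer's ZooKeeper path and commit it for the same group with the new
consumer. A condensed sketch (placeholder addresses and group name; not the PR's
actual code):

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.zookeeper.ZooKeeper;

public class OffsetMigrator {
    public static void main(String[] args) throws Exception {
        String group = "my-group";                                          // placeholder
        ZooKeeper zk = new ZooKeeper("zk1:2181", 30000, event -> { });      // placeholder
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        String offsetsPath = "/consumers/" + group + "/offsets";
        // Old-consumer offsets are stored as a plain long per partition znode.
        for (String topic : zk.getChildren(offsetsPath, false)) {
            for (String partition : zk.getChildren(offsetsPath + "/" + topic, false)) {
                byte[] data = zk.getData(offsetsPath + "/" + topic + "/" + partition, false, null);
                long offset = Long.parseLong(new String(data, StandardCharsets.UTF_8));
                offsets.put(new TopicPartition(topic, Integer.parseInt(partition)),
                            new OffsetAndMetadata(offset));
            }
        }
        zk.close();

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, group);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.commitSync(offsets);   // writes the offsets to __consumer_offsets for the group
        }
    }
}

The old consumers should be shut down before running this, and the new consumers
started only afterwards.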

Even though many users have already made the transition, I think this could
still be useful in Kafka. Here are a few reasons:

   - It simplifies the migration for users who have yet to migrate,
   especially as the old clients get deprecated and removed
   - Though the tool is not available in the Kafka 0.8.x or 0.9.x series,
   downloading and using the jar from maven would be fairly straightforward
  - Alternatively this could be a separate repo or jar, though I hardly
  want to push this single tool to maven as a standalone artifact.

Do you think this is useful in Apache Kafka? Any thoughts on the approach?

Thanks,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-3934) Start scripts enable GC by default with no way to disable

2016-08-09 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3934:
---
Summary: Start scripts enable GC by default with no way to disable  (was: 
kafka-server-start.sh enables GC by default with no way to disable)

> Start scripts enable GC by default with no way to disable
> -
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs. Since kafka-run-class.sh will not 
> ignore the -loggc parameter if that is set. 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-08-08 Thread Grant Henke
be nice to have something more descriptive, if possible. Any ideas?


The problem w/ being more descriptive is that its possible that
it restricts potential use cases if people think that somehow
their use case wouldn't fit.



> In the database world (Resource, ACL) pair is typically called a
> "grant". (sorry)
>
> You "grant" permission on a resource to a user.


"grant" has a nice sound to it...I can't figure it out but I like it a lot.
Happy to use that as the term for the resource & ACLs pair. I don't think
it will restrict any potential use cases.

I didn't use the term since it's not used in the Authorizer API methods
(addAcls, removeAcls) or existing docs.

5. What is the plan for when we add standard exceptions to the Authorizer
> interface? Will we bump the protocol version?


Currently I sort of "punt" on this issue. I catch any throwable and log it
server side before letting the generic error handling take over. If the
error is not mapped in Errors.java it results in a Unknown error code back
to the client. This means that when we decide to set standard/descriptive
exceptions (KIP-50?) they can be communicated without any server side
changes.

An alternative is to wrap all exceptions in some generic
AuthorizerException but that doesn't feel much more descriptive or helpful.

I could also add the current exceptions laid out in KIP-50 here
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-50+-+Move+Authorizer+to+o.a.k.common+package#KIP-50-MoveAuthorizertoo.a.k.commonpackage-AddexceptionsrelatedtoAuthorizer.>,
but they will not be thrown until new Authorizer implementations use them.

Is there any benefit for sending the AlterAcls
> request to the controller? The controller is currently only designed for
> sending topic level metadata.
>

Essentially I am calling the controller node the leader for ACL writes. The
Authorizer API exposes no details of delayed writes or consensus. If we
allow writes to go to any node, we will be crossing our fingers that any
Authorizer implementation handles this concurrency problem well today. Even
if the implementation is sound, directing the low volume writes to a single
node prevents delays due to retries and stale caches.

As an example, "KAFKA-3328
<https://issues.apache.org/jira/browse/KAFKA-3328>: SimpleAclAuthorizer can
lose ACLs with frequent add/remove calls" would have been a much bigger
issue if requests could go to all nodes. Even after the fix, it could cause
delayed writes because of retries.

It's not absolutely required, but I think it's a good simplification.

Thanks,
Grant

On Fri, Jul 29, 2016 at 12:03 AM, Gwen Shapira <g...@confluent.io> wrote:

> In the database world (Resource, ACL) pair is typically called a
> "grant". (sorry)
>
> You "grant" permission on a resource to a user.
>
> http://dev.mysql.com/doc/refman/5.7/en/show-grants.html
>
> Gwen
>
>
>
>
> On Fri, Jul 22, 2016 at 4:13 AM, Jim Jagielski <j...@jagunet.com> wrote:
> >
> >> On Jul 21, 2016, at 10:57 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> >>
> >> Hi Grant,
> >>
> >> Thanks for the KIP.  A few questions and comments:
> >>
> >> 1. My main concern is that we are skipping the discussion on the desired
> >> model for controlling ACL access and updates. I understand the desire to
> >> reduce the scope, but this seems to be a fundamental aspect of the
> design
> >> that we need to get right. Without a plan for that, it is difficult to
> >> evaluate if that part of the current proposal is fine.
> >
> > ++1.
> >
> >> 2. Are the Java objects in "org.apache.kafka.common.security.auth"
> going to
> >> be public API? If so, we should explain why they should be public and
> >> describe them in the KIP. If not, we should mention that.
> >> 3. It would be nice to have a name for a (Resource, ACL) pair. The
> current
> >> protocol uses `requests`/`responses` for the list of such pairs, but it
> >> would be nice to have something more descriptive, if possible. Any
> ideas?
> >
> > The problem w/ being more descriptive is that its possible that
> > it restricts potential use cases if people think that somehow
> > their use case wouldn't fit.
> >
> >> 4. There is no CreateAcls or DeleteAcls (unlike CreateTopics and
> >> DeleteTopics, for example). It would be good to explain the reasoning
> for
> >> this choice (Jason also asked this question).
> >> 5. What is the plan for when we add standard exceptions to the
> Authorizer
> >> interface? Will we bump the protocol version?
> >>
> >> Thanks,
> >> I

Re: [VOTE] 0.10.0.1 RC2

2016-08-05 Thread Grant Henke
+1 (non-binding)

On Fri, Aug 5, 2016 at 2:04 PM, Dana Powers <dana.pow...@gmail.com> wrote:

> passed kafka-python integration tests, +1
>
> -Dana
>
>
> On Fri, Aug 5, 2016 at 9:35 AM, Tom Crayford <tcrayf...@heroku.com> wrote:
> > Heroku has tested this using the same performance testing setup we used
> to
> > evaluate the impact of 0.9 -> 0.10 (see https://engineering.
> > heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-
> > performance-in-distributed-systems/).
> >
> > We see no issues at all with them, so +1 (non-binding) from here.
> >
> > On Fri, Aug 5, 2016 at 12:58 PM, Jim Jagielski <j...@jagunet.com> wrote:
> >
> >> Looks good here: +1
> >>
> >> > On Aug 4, 2016, at 9:54 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >> >
> >> > Hello Kafka users, developers and client-developers,
> >> >
> >> > This is the third candidate for the release of Apache Kafka 0.10.0.1.
> >> This
> >> > is a bug fix release and it includes fixes and improvements from 53
> JIRAs
> >> > (including a few critical bugs). See the release notes for more
> details:
> >> >
> >> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/RELEASE_NOTES.html
> >> >
> >> > When compared to RC1, RC2 contains a fix for a regression where an
> older
> >> > version of slf4j-log4j12 was also being included in the libs folder of
> >> the
> >> > binary tarball (KAFKA-4008). Thanks to Manikumar Reddy for reporting
> the
> >> > issue.
> >> >
> >> > *** Please download, test and vote by Monday, 8 August, 8am PT ***
> >> >
> >> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >> > http://kafka.apache.org/KEYS
> >> >
> >> > * Release artifacts to be voted upon (source and binary):
> >> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/
> >> >
> >> > * Maven artifacts to be voted upon:
> >> > https://repository.apache.org/content/groups/staging
> >> >
> >> > * Javadoc:
> >> > http://home.apache.org/~ijuma/kafka-0.10.0.1-rc2/javadoc/
> >> >
> >> > * Tag to be voted upon (off 0.10.0 branch) is the 0.10.0.1-rc2 tag:
> >> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> >> f8f56751744ba8e55f90f5c4f3aed8c3459447b2
> >> >
> >> > * Documentation:
> >> > http://kafka.apache.org/0100/documentation.html
> >> >
> >> > * Protocol:
> >> > http://kafka.apache.org/0100/protocol.html
> >> >
> >> > * Successful Jenkins builds for the 0.10.0 branch:
> >> > Unit/integration tests: *https://builds.apache.org/job
> >> /kafka-0.10.0-jdk7/182/
> >> > <https://builds.apache.org/job/kafka-0.10.0-jdk7/182/>*
> >> > System tests: *https://jenkins.confluent.io/
> >> job/system-test-kafka-0.10.0/138/
> >> > <https://jenkins.confluent.io/job/system-test-kafka-0.10.0/138/>*
> >> >
> >> > Thanks,
> >> > Ismael
> >>
> >>
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-72 Allow Sizing Incoming Request Queue in Bytes

2016-08-05 Thread Grant Henke
The proposal makes sense to me. I like that the plan to support both limits
simultaneously:

queued.max.requests is supported in addition to queued.max.bytes (both
> respected at the same time). In this case a default value of
> queued.max.bytes = -1 would maintain backwards compatible behavior.


Especially since ByteBoundedBlockingQueue already supports limits on both byte
size and element count.

I actually don't mind the property naming of queued.max.*. I am not sure if
the added clarity of requestQueue.max.* is worth the migration.
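
For illustration, a rough sketch of a queue bounded on both element count and
total bytes, in the spirit of the ByteBoundedBlockingQueue mentioned above.
This is not that class, just a hand-rolled example, with maxBytes = -1
standing in for the proposed backwards-compatible default:

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative only: respects both an element bound (queued.max.requests)
// and a byte bound (queued.max.bytes) at the same time.
public class DualBoundedQueue<E> {

    private static final class Entry<T> {
        final T element;
        final long bytes;
        Entry(T element, long bytes) { this.element = element; this.bytes = bytes; }
    }

    private final Deque<Entry<E>> queue = new ArrayDeque<>();
    private final int maxElements;
    private final long maxBytes;   // -1 disables the byte bound
    private long currentBytes = 0;

    public DualBoundedQueue(int maxElements, long maxBytes) {
        this.maxElements = maxElements;
        this.maxBytes = maxBytes;
    }

    // Blocks until both bounds allow the element in. An oversized element is
    // still admitted into an empty queue so it cannot block forever.
    public synchronized void put(E element, long sizeInBytes) throws InterruptedException {
        while (queue.size() >= maxElements
                || (maxBytes >= 0 && !queue.isEmpty() && currentBytes + sizeInBytes > maxBytes)) {
            wait();
        }
        queue.addLast(new Entry<>(element, sizeInBytes));
        currentBytes += sizeInBytes;
        notifyAll();
    }

    public synchronized E take() throws InterruptedException {
        while (queue.isEmpty()) {
            wait();
        }
        Entry<E> entry = queue.removeFirst();
        currentBytes -= entry.bytes;
        notifyAll();
        return entry.element;
    }
}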





On Fri, Aug 5, 2016 at 9:34 AM, Tom Crayford <tcrayf...@heroku.com> wrote:

> This makes good sense to me, and seems to have very low amounts of downside
> with large amounts of upside. +1
>
> On Thursday, 4 August 2016, radai <radai.rosenbl...@gmail.com> wrote:
>
> > Hello,
> >
> > I'd like to initiate a discussion about
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 72%3A+Allow+Sizing+Incoming+Request+Queue+in+Bytes
> >
> > The goal of the KIP is to allow configuring a bound on the capacity (as
> in
> > bytes of memory used) of the incoming request queue, in addition to the
> > current bound on the number of messages.
> >
> > This comes after several incidents at Linkedin where a sudden "spike" of
> > large message batches caused an out of memory exception.
> >
> > Thank you,
> >
> >Radai
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 ACL Admin Schema

2016-07-21 Thread Grant Henke
Anyone else have any feedback on this protocol and implementation? I plan
to start a vote soon.

Thank you,
Grant

On Fri, Jul 15, 2016 at 1:04 PM, Gwen Shapira <g...@confluent.io> wrote:

> > My goal in the protocol design was to keep the request simple and be able
> > to answer what I think are the 3 most common questions/requests
> >
> >- What ACLs are on the cluster?
> >- What access do I/they have?
> >- Who has access to this resource?
>
> Thanks for clarifying. I think this is good. Perhaps just document
> this goal next to the protocol for the record :)
>
> > Isn't KIP-50 itself one gigantic compatibility concern? I don't see
> >> how your suggestions make it any worse...
> >
> >
> >
> >>  Yes, I also think we should take this chance to improve the Authorizer
> interface
> >> to make it more suitable for the ACL Admin requests.
> >
> >
> > I agree we can address this in KIP-50. What I was getting at was that I
> > wanted to handle that discussion there. We voted on KIP-50 before 0.10
> was
> > released with the intention that we could get it in. Now that 0.10 is
> > released and a longer time has gone by I am not sure if the opinion of
> > "breaking is okay" has changed. I will always prefer a backward
> compatible
> > approach if possible.
>
> Well, the entire KIP-50 discussion - both regarding compatibility and
> possible increased scope is probably out of context here. Especially
> since this proposal was written carefully to avoid any assumptions
> regarding other work. I suggest taking this in a separate thread.
>
> Gwen
>
> > Thank you,
> > Grant
> >
> >
> > On Fri, Jul 15, 2016 at 7:22 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> >> On Fri, Jul 15, 2016 at 6:45 AM, Gwen Shapira <g...@confluent.io>
> wrote:
> >> >
> >> > >>  - I suggest this be addressed in KIP-50 as well, though it
> >> has
> >> > >>  some compatibility concerns.
> >> >
> >> > Isn't KIP-50 itself one gigantic compatibility concern? I don't see
> >> > how your suggestions make it any worse...
> >> >
> >>
> >> Yes, I also think we should take this chance to improve the Authorizer
> >> interface to make it more suitable for the ACL Admin requests.
> >>
> >> Ismael
> >>
> >
> >
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-2507) Replace ControlledShutdown{Request,Response} with org.apache.kafka.common.requests equivalent

2016-07-18 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-2507:
---
Fix Version/s: 0.11.0.0

> Replace ControlledShutdown{Request,Response} with 
> org.apache.kafka.common.requests equivalent
> -
>
> Key: KAFKA-2507
> URL: https://issues.apache.org/jira/browse/KAFKA-2507
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Ismael Juma
>    Assignee: Grant Henke
> Fix For: 0.11.0.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3934) kafka-server-start.sh enables GC by default with no way to disable

2016-07-18 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3934:
---
Status: Patch Available  (was: Open)

> kafka-server-start.sh enables GC by default with no way to disable
> --
>
> Key: KAFKA-3934
> URL: https://issues.apache.org/jira/browse/KAFKA-3934
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
>
> In KAFKA-1127 the following line was added to kafka-server-start.sh:
> {noformat}
> EXTRA_ARGS="-name kafkaServer -loggc"
> {noformat}
> This prevents gc logging from being disabled without some unusual environment 
> variable workarounds. 
> I suggest EXTRA_ARGS is made overridable like below: 
> {noformat}
> if [ "x$EXTRA_ARGS" = "x" ]; then
> export EXTRA_ARGS="-name kafkaServer -loggc"
> fi
> {noformat}
> *Note:* I am also not sure I understand why the existing code uses the "x" 
> thing when checking the variable instead of the following:
> {noformat}
> export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
> {noformat}
> This lets the variable be overridden to "" without taking the default. 
> *Workaround:* As a workaround the user should be able to set 
> $KAFKA_GC_LOG_OPTS to fit their needs, since kafka-run-class.sh will 
> effectively ignore the -loggc parameter when that is set. 
> {noformat}
> -loggc)
>   if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
> GC_LOG_ENABLED="true"
>   fi
>   shift
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-48 Support for delegation tokens as an authentication mechanism

2016-07-18 Thread Grant Henke
Hi Parth,

Are you still working on this? If you need any help please don't hesitate
to ask.

Thanks,
Grant

On Thu, Jun 30, 2016 at 4:35 PM, Jun Rao  wrote:

> Parth,
>
> Thanks for the reply.
>
> It makes sense to only allow the renewal by users that authenticated using
> *non* delegation token mechanism. Then, should we make the renewal a list?
> For example, in the case of rest proxy, it will be useful for every
> instance of rest proxy to be able to renew the tokens.
>
> It would be clearer if we can document the request protocol like
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945):(VotedandPlannedforin0.10.1.0)
> .
>
> It would also be useful to document the client APIs.
>
> Thanks,
>
> Jun
>
> On Tue, Jun 28, 2016 at 2:55 PM, parth brahmbhatt <
> brahmbhatt.pa...@gmail.com> wrote:
>
> > Hi,
> >
> > I am suggesting that we will only allow the renewal by users that
> > authenticated using *non* delegation token mechanism. For example, If
> user
> > Alice authenticated using kerberos and requested delegation tokens, only
> > user Alice authenticated via non delegation token mechanism can renew.
> > Clients that have  access to delegation tokens can not issue renewal
> > request for renewing their own token and this is primarily important to
> > reduce the time window for which a compromised token will be valid.
> >
> > To clarify, Yes any authenticated user can request delegation tokens but
> > even here I would recommend to avoid creating a chain where a client
> > authenticated via delegation token request for more delegation tokens.
> > Basically anyone can request delegation token, as long as they
> authenticate
> > via a non delegation token mechanism.
> >
> > Aren't classes listed here
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka#KIP-48DelegationtokensupportforKafka-PublicInterfaces
> > >
> > sufficient?
> >
> > Thanks
> > Parth
> >
> >
> >
> > On Tue, Jun 21, 2016 at 4:33 PM, Jun Rao  wrote:
> >
> > > Parth,
> > >
> > > Thanks for the reply. A couple of comments inline below.
> > >
> > > On Tue, Jun 21, 2016 at 10:36 AM, parth brahmbhatt <
> > > brahmbhatt.pa...@gmail.com> wrote:
> > >
> > > > 1. Who / how are tokens renewed? By original requester only? or using
> > > > Kerberos
> > > > auth only?
> > > > My recommendation is to do this only using Kerberos auth and only
> through
> > > the
> > > > renewer specified during the acquisition request.
> > > >
> > > >
> > > Hmm, not sure that I follow this. Are you saying that any client
> > > authenticated with the delegation token can renew, i.e. there is no
> > renewer
> > > needed?
> > >
> > > Also, just to be clear, any authenticated client (either through SASL
> or
> > > SSL) can request a delegation token for the authenticated user, right?
> > >
> > >
> > > > 2. Are tokens stored on each broker or in ZK?
> > > > My recommendation is still to store in ZK or not store them at all.
> The
> > > > whole controller based distribution is too much overhead with not
> much
> > to
> > > > achieve.
> > > >
> > > > 3. How are tokens invalidated / expired?
> > > > Either by expiration time out or through an explicit request to
> > > invalidate.
> > > >
> > > > 4. Which encryption algorithm is used?
> > > > SCRAM
> > > >
> > > > 5. What is the impersonation proposal (it wasn't in the KIP but was
> > > > discussed
> > > > in this thread)?
> > > > There is no impersonation proposal. I tried and explained how it's a
> > > > different problem and why it's not really necessary to discuss that as
> > > part
> > > > of this KIP.  This KIP will not support any impersonation, it will
> just
> > > be
> > > > another way to authenticate.
> > > >
> > > > 6. Do we need new ACLs, if so - for what actions?
> > > > We do not need new ACLs.
> > > >
> > > >
> > > Could we document the format of the new request/response and their
> > > associated Resource and Operation for ACL?
> > >
> > >
> > > > 7. How would the delegation token be configured in the client?
> > > > Should be through config. I wasn't planning on supporting JAAS for
> > > tokens.
> > > > I don't believe hadoop does this either.
> > > >
> > > > Thanks
> > > > Parth
> > > >
> > > >
> > > >
> > > > On Thu, Jun 16, 2016 at 4:03 PM, Jun Rao  wrote:
> > > >
> > > > > Harsha,
> > > > >
> > > > > Another question.
> > > > >
> > > > > 9. How would the delegation token be configured in the client? The
> > > > standard
> > > > > way is to do this through JAAS. However, we will need to think
> > through
> > > if
> > > > > this is convenient in a shared environment. For example, when a new
> > > task
> > > > is
> > > > > added to a Storm worker node, do we need to dynamically add a new
> > > section
> > > > > in the JAAS file? It may be more convenient if we 

[DISCUSS] KIP-4 ACL Admin Schema

2016-07-14 Thread Grant Henke
ction allows batching requests to the
>   authorizer via the Authorizer.addAcls and Authorizer.removeAcls calls.
>4. The request is not transactional. One failure wont stop others from
>running.
>   1. If an error occurs on one action, the others could still be run.
>   2. Errors are reported independently.
>   5. The principle must be authorized to the "All" Operation on the
>"Cluster" resource to alter ACLs.
>   - Unauthorized requests will receive a ClusterAuthorizationException
>   - This avoids adding a new operation that an existing authorizer
>   implementation may not be aware of.
>   - This can be reviewed and further refined/restricted as a follow
>   up ACLs review after this KIP. See Follow Up Changes
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-follow-up-changes>
>   .
>
> QA:
>
>- Why doesn't this request have a timeout and implement any blocking
>like the CreateTopicsRequest?
>   - The Authorizer implementation is synchronous and exposes no
>   details about propagating the ACLs to other nodes.
>   - The best we can do in the existing implementation is
>   call Authorizer.addAcls and Authorizer.removeAcls and hope the 
> underlying
>   implementation handles the rest.
>- What happens if there is an error in the Authorizer?
>   - Currently the best we can do is log the error broker side and
>   return a generic exception because there are no "standard" exceptions
>   defined in the Authorizer interface to provide a more clear code
>   - KIP-50
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-50+-+Move+Authorizer+to+o.a.k.common+package#KIP-50-MoveAuthorizertoo.a.k.commonpackage-AddexceptionsrelatedtoAuthorizer.>
>  is
>   tracking adding the standard exceptions
>   - The Authorizer interface also provides no feedback about
>   individual ACLs when added or deleted in a group
>   - Authorizer.addAcls is a void function, the best we can do is
>  return an error for all ACLs and let the user check the current 
> state by
>  listing the ACLs
>  - Authorizer.removeAcls is a boolean function, the best we can
>  do is return an error for all ACLs and let the user check the 
> current state
>  by listing the ACLs
>  - Behavior here could vary drastically between implementations
>  - I suggest this be addressed in KIP-50 as well, though it has
>  some compatibility concerns.
>   - Why require the request to go to the controller?
>   - The controller is responsible for the cluster metadata and its
>   propagation
>   - This ensures one instance of the Authorizer sees all the changes
>   and reduces concurrency issues, especially because the Authorizer 
> interface
>   exposes no details about propagating the ACLs to other nodes.
>   - See Request Forwarding
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>below
>
> Alter ACLs Response
>
>
>
> AlterAcls Response (Version: 0) => [responses]
>   responses => resource [results]
> resource => resource_type resource_name
>   resource_type => INT8
>   resource_name => STRING
> results => action acl error_code
>   action => INT8
>   acl => acl_principle acl_permission_type acl_host acl_operation
> acl_principle => STRING
> acl_permission_type => INT8
> acl_host => STRING
> acl_operation => INT8
>   error_code => INT16
>
>
>
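
For readers following along, here is a plain-Java sketch of the data model
implied by the response schema above. The field names follow the schema, but
the classes themselves are purely illustrative, not the actual protocol code:

import java.util.List;

// Illustrative data model mirroring the AlterAcls response schema quoted above.
// Integer field widths follow the schema (INT8/INT16).
public class AlterAclsResponseSketch {

    public static final class Acl {
        public final String principal;     // acl_principle => STRING
        public final byte permissionType;  // acl_permission_type => INT8
        public final String host;          // acl_host => STRING
        public final byte operation;       // acl_operation => INT8

        public Acl(String principal, byte permissionType, String host, byte operation) {
            this.principal = principal;
            this.permissionType = permissionType;
            this.host = host;
            this.operation = operation;
        }
    }

    public static final class Result {
        public final byte action;          // action => INT8 (e.g. add or delete)
        public final Acl acl;
        public final short errorCode;      // error_code => INT16, reported per ACL

        public Result(byte action, Acl acl, short errorCode) {
            this.action = action;
            this.acl = acl;
            this.errorCode = errorCode;
        }
    }

    public static final class ResourceResponse {
        public final byte resourceType;    // resource_type => INT8
        public final String resourceName;  // resource_name => STRING
        public final List<Result> results; // one result per requested action

        public ResourceResponse(byte resourceType, String resourceName, List<Result> results) {
            this.resourceType = resourceType;
            this.resourceName = resourceName;
            this.results = results;
        }
    }
}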

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-2946) DeleteTopic - protocol and server side implementation

2016-07-12 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-2946:
---
Status: Patch Available  (was: In Progress)

> DeleteTopic - protocol and server side implementation
> -
>
> Key: KAFKA-2946
> URL: https://issues.apache.org/jira/browse/KAFKA-2946
> Project: Kafka
>  Issue Type: Sub-task
>        Reporter: Grant Henke
>    Assignee: Grant Henke
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Client Side Auto Topic Creation

2016-07-11 Thread Grant Henke
Thanks for all the feedback guys.

We could change the existing behavior if it's bad for most of the users. In
> the case of auto topic creation in the producer, it seems that it's at
> least convenient in a testing environment. So, I am not sure if that
> behavior is universally bad.


I agree that testing has a unique set of needs and tooling. Some things
that are critical at scale can be automated away or ignored in a test
environment. However, those same things are critical in a real environment,
and allowing users to ignore them is often a trap they don't know they are
walking into.

I have often thought a public testing package for Kafka would be really
useful for third party developers and integrations to leverage. I think a
test package like that would be a great place for "unsafe but convenient"
functions like auto create.
Given all the feedback I should be able to write up a straw-man KIP.

Thank you,
Grant



On Mon, Jul 11, 2016 at 3:22 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> I'd prefer to have the auto-creation on producer to false by default.
>
> Also, I would suggest not have a broker-side default configs for created
> topics, but rather on the admin-client side, this way users may be more
> aware of the default config values.
>
>
> Guozhang
>
>
> On Fri, Jul 8, 2016 at 6:06 AM, Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Hi Jun,
> >
> > I agree that it's closer to the existing behaviour, which some people may
> > be used to by now. However, I am not sure that it won't surprise people.
> As
> > Grant said, auto-topic creation is a common source of confusion and it
> > interacts badly with topic deletion.
> >
> > If we need to provide auto-topic creation in the client as a migration
> path
> > for people who rely on it and so that we can remove the server based one
> > (after a suitable deprecation period), then can we at least have it false
> > by default? This way it's more likely that people who enable it would be
> > aware of the pitfalls and it would reduce the number of confused users.
> >
> > Ismael
> >
> > On Thu, Jul 7, 2016 at 9:47 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > It seems that it makes sense for the writer to trigger auto topic
> > creation,
> > > but not the reader. So, my preference is Jay's option #1: add a new
> > > configuration to enable topic creation on the producer side and
> defaults
> > to
> > > true. If the topic doesn't exist, the producer will send a
> > > createTopicRequest and pick up the broker side defaults for #partitions
> > and
> > > replication factor. This matches the current behavior and won't
> surprise
> > > people. People who want to enforce manual topic creation can disable
> auto
> > > topic creation on the producer.
> > >
> > > On the consumer side, throwing an exception to the client when a topic
> > > doesn't exist probably works for most cases. I am wondering if there
> is a
> > > case where a user really wants to start the consumer before the topic
> is
> > > created.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Fri, Jul 1, 2016 at 4:09 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I think there are a few things being discussed and it would be good
> to
> > > make
> > > > that explicit:
> > > >
> > > > 1. If and how we expose auto-topic creation in the client (under the
> > > > assumption that the server auto-topic creation will be deprecated and
> > > > eventually removed)
> > > > 2. The ability to create topics with the cluster defaults for
> > replication
> > > > factor and partition counts
> > > > 3. Support for topic "specs"
> > > > 4. The fact that some exceptions are retriable in some cases, but not
> > > > others
> > > >
> > > > My thoughts on each:
> > > >
> > > > 1. I prefer the approach where we throw an exception and let the
> > clients
> > > > create the topic via `AdminClient` if that's what they need.
> > > > 2. Like Grant, I'm unsure that will generally be used in a positive
> > way.
> > > > However, if this is what we need to be able to deprecate server
> > > auto-topic
> > > > creation, the benefits outweigh the costs in my opinion.
> > > > 3. Something like this would be good to have and could potentially
> > > provide
> > &

[jira] [Created] (KAFKA-3934) kafka-server-start.sh enables GC by default with no way to disable

2016-07-07 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-3934:
--

 Summary: kafka-server-start.sh enables GC by default with no way 
to disable
 Key: KAFKA-3934
 URL: https://issues.apache.org/jira/browse/KAFKA-3934
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.8.2.0
Reporter: Grant Henke
Assignee: Grant Henke


In KAFKA-1127 the following line was added to kafka-server-start.sh:

{noformat}
EXTRA_ARGS="-name kafkaServer -loggc"
{noformat}

This prevents gc logging from being disabled without some unusual environment 
variable workarounds. 

I suggest EXTRA_ARGS is made overridable like below: 

{noformat}
if [ "x$EXTRA_ARGS" = "x" ]; then
export EXTRA_ARGS="-name kafkaServer -loggc"
fi
{noformat}

*Note:* I am also not sure I understand why the existing code uses the "x" 
thing when checking the variable instead of the following:

{noformat}
export EXTRA_ARGS=${EXTRA_ARGS-'-name kafkaServer -loggc'}
{noformat}

This lets the variable be overridden to "" without taking the default. 

*Workaround:* As a workaround the user should be able to set $KAFKA_GC_LOG_OPTS 
to fit their needs, since kafka-run-class.sh will effectively ignore the -loggc 
parameter when that is set. 

{noformat}
-loggc)
  if [ -z "$KAFKA_GC_LOG_OPTS" ]; then
GC_LOG_ENABLED="true"
  fi
  shift
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Client Side Auto Topic Creation

2016-06-29 Thread Grant Henke
may break if new partitions are
> added.
> > The issue is that there is no standard mechanism or convention to
> > communicate application requirements so that admins and application teams
> > can verify that they continue to be met over time.
> >
> > Imagine for a second that Kafka allowed arbitrary tags to be associated
> to
> > topics.  An application could now define a specification for it's
> > interaction with Kafka including topic names, min replication factors,
> > fault tolerance settings (replication factors, min.in.sync.replicas,
> > producer acks), compacted yes/no, topic retention settings, can
> add/remove
> > partitions, partition key, and data format.  Some of these requirements
> map
> > onto topics configs and some (like acks=all) are producer settings and
> some
> > (like partition key and data format) could be organizational conventions
> > stored as tags (format:avro).
> >
> > For organizations where only SREs/admins can create/modify topics, this
> > spec allows them to do their job while being sure they're not breaking
> the
> > application.  The application can verify on startup that it's
> requirements
> > are satisfied and fail early if not.  If the application has permissions
> to
> > create it's own topics then the spec is a declarative format for doing
> that
> > require and will not require the same topic creation boilerplate code to
> be
> > duplicated in every application.
> >
> > If people like this approach, perhaps we could define a topic spec (if
> all
> > fields besides topic name are empty it use "cluster defaults").  Then the
> > AdminClient would have an idempotent create method that takes a spec and
> > verifies that the spec is already met, tries to create topics to meet the
> > spec, or fails saying it cannot be met.  Perhaps the producer and
> consumer
> > APIs would only have a verify() method which checks if the spec is
> > satisfied.
> >
> > Cheers,
> >
> > Roger
> >
> > On Wed, Jun 29, 2016 at 8:50 AM, Grant Henke <ghe...@cloudera.com>
> wrote:
> >
> > > Thanks for the discussion, below are some thoughts and responses.
> > >
> > > One of the problems that we currently have with
> > > > the clients is that we retry silently on unknown topics under the
> > > > expectation that they will eventually be created (automatically or
> > not).
> > > > This makes it difficult to detect misconfiguration without looking
> for
> > > > warnings in the logs. This problem is compounded if the client isn't
> > > > authorized to the topic since then we don't actually know if the
> topic
> > > > exists or not and whether it is reasonable to keep retrying.
> > >
> > >
> > > Yeah this is a problem that's difficult and opaque to the user. I think
> > any
> > > of the proposed solutions would help solve this issue. Since the create
> > > would be done at the metadata request phase, instead of in the produce
> > > response handling. And if the create fails, the user would receive a
> > much
> > > more clear authorization error.
> > >
> > > The current auto creation of topic by the broker appear to be the only
> > > > reason an unknown topic error is retriable
> > > > which leads to bugs (like
> > > https://issues.apache.org/jira/browse/KAFKA-3727
> > > > ) where the consumer hangs forever (or until woken up) and only debug
> > > > tracing shows what's going on.
> > > >
> > >
> > > I agree this is related, but should be solvable even with retriable
> > > exceptions. I think UnknownTopicOrPartitionException needs to remain
> > > generally retriable because it could occur due to outdated metadata and
> > not
> > > because a topic needs to be created. In the case of message production
> or
> > > consumption it could be explicitly handled differently in the client.
> > >
> > > Do we clearly define the expected behavior of subscribe and assign in
> the
> > > case of a missing topic? I can see reasons to fail early (partition
> will
> > > never exist, typo in topic name) and reasons to keep returning empty
> > record
> > > sets until the topic exists (consumer with a preconfigured list of
> topics
> > > that may or may not exist). Though I think failing and insisting topics
> > > exist is the most predictable. Especially since the Admin API will make
> > > creating topics easier.
> > >
> > >
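
As a purely hypothetical sketch of the "topic spec" idea quoted above — none
of these types exist in Kafka, they only illustrate the shape of the proposal
(declared requirements plus a verify-on-startup check):

import java.util.Map;
import java.util.Optional;

// Hypothetical: a declarative spec an application could verify on startup
// and fail fast if its requirements are not met.
public final class TopicSpec {
    public final String name;
    public final Optional<Integer> minReplicationFactor;  // empty => cluster default
    public final Optional<Integer> partitions;            // empty => cluster default
    public final Map<String, String> requiredConfigs;     // e.g. min.insync.replicas, cleanup.policy
    public final Map<String, String> tags;                // organizational conventions, e.g. format:avro

    public TopicSpec(String name,
                     Optional<Integer> minReplicationFactor,
                     Optional<Integer> partitions,
                     Map<String, String> requiredConfigs,
                     Map<String, String> tags) {
        this.name = name;
        this.minReplicationFactor = minReplicationFactor;
        this.partitions = partitions;
        this.requiredConfigs = requiredConfigs;
        this.tags = tags;
    }

    // actualReplication/actualConfigs would come from topic metadata fetched by
    // whatever admin client is available; the spec only states the requirements.
    public boolean isSatisfiedBy(int actualReplication, Map<String, String> actualConfigs) {
        if (minReplicationFactor.isPresent() && actualReplication < minReplicationFactor.get())
            return false;
        for (Map.Entry<String, String> required : requiredConfigs.entrySet()) {
            if (!required.getValue().equals(actualConfigs.get(required.getKey())))
                return false;
        }
        return true;
    }
}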

Re: [DISCUSS] Client Side Auto Topic Creation

2016-06-29 Thread Grant Henke
I kind of feel once you start adding AdminClient methods to the producer
> and consumer it's not really clear where to stop--e.g. if I can create I
> should be able to delete, list, etc.


I agree this gets weird and could lead to duplicate client code and
inconsistent behavior across clients. The one thing I don't like about
requiring a separate client is it maintains all its own connections and
metadata. Perhaps sometime down the road if we see a lot of mixed usage we
could break out the core cluster connection code into a KafkaConnection
class and instantiate clients with that. That way clients could share the
same KafkaConnection.

Thanks,
Grant


On Wed, Jun 29, 2016 at 9:29 AM, Jay Kreps <j...@confluent.io> wrote:

> For what it's worth the use case for auto-creation isn't using a dynamic
> set of topics, but rather letting apps flow through different
> dev/staging/prod/integration_testing/unit_testing environments without
> having the app configure appropriate replication/partitioning stuff in each
> environment and having complex logic to check if the topic is there.
> Basically if you leave this up to individual apps you get kind of a mess,
> it's better to have cluster defaults that are reasonable and controlled by
> an admin and then pre-provision anything that is weird (super big, unusual
> perms, whatever). Usually in the pre-prod environments you don't really
> care about the settings at all, and in prod you can pre-provision.
>
> This raises an important point about how we handle defaults, which I don't
> think we talked about. I do think it is really important that we allow a
> way to create topics with the "cluster defaults". I know this is possible
> for configs since if you omit them they inherit default values, but I think
> we should be able to do it with replication factor and partition count too.
> I think the Java API should expose this and maybe even encourage it.
>
> I don't have a super strong opinion on how this is exposed, though I kind
> of prefer one of two options:
> 1. Keep the approach we have now with a config option to allow auto create,
> but using this option just gives you a plain vanilla topic with no custom
> configs, for anything custom you need to use AdminClient "manually"
> 2. Just throw an exception and let you use AdminClient. This may be a bit
> of a transition for people relying on the current behavior.
>
> I kind of feel once you start adding AdminClient methods to the producer
> and consumer it's not really clear where to stop--e.g. if I can create I
> should be able to delete, list, etc.
>
> -Jay
>
> On Tue, Jun 28, 2016 at 9:26 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > With the KIP-4 create topic schema voted and passed and a PR available
> > upstream. I wanted to discuss moving the auto topic creation from the
> > broker side to the client side (KAFKA-2410
> > <https://issues.apache.org/jira/browse/KAFKA-2410>).
> >
> > This change has many benefits
> >
> >- Remove the need for failed messages until a topic is created
> >- Client can define the auto create parameters instead of a global
> >cluster setting
> >- Errors can be communicated back to the client more clearly
> >
> > Overall auto create is not my favorite feature, since topic creation is a
> > highly critical piece for Kafka, and with authorization added it becomes
> > even more involved. When creating a topic a user needs:
> >
> >- The access to create topics
> >- To set the correct partition count and replication factor for their
> >use case
> >- To set who has access to the topic
> >- Knowledge of how a new topic may impact regex consumers or
> mirrormaker
> >
> > Often I find use cases that look like they need auto topic creation, can
> > often be handled with a few pre made topics. That said, we still should
> > support the feature for the cases that need it (mirrormaker, streams).
> >
> > The question is how we should expose auto topic creation in the client. A
> > few options are:
> >
> >- Add configs like the broker configs today, and let the client
> >automatically create the topics if enabled
> >   - Both producer and consumer?
> >- Throw an error to the user and let them use a separate AdminClient
> >(KIP-4) api to create the topic
> >- Throw an error to the user and add a create api to the producer so
> >they can easily handle by creating a topic
> >
> > I am leaning towards the last 2 options but wanted to get some others
> > thoughts on the matter. Especially if you have use cases that use auto
> > topic creation today.
> >
> > Thanks,
> > Grant
> >
> > --
> > Grant Henke
> > Software Engineer | Cloudera
> > gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[DISCUSS] Client Side Auto Topic Creation

2016-06-28 Thread Grant Henke
With the KIP-4 create topic schema voted and passed and a PR available
upstream. I wanted to discuss moving the auto topic creation from the
broker side to the client side (KAFKA-2410
<https://issues.apache.org/jira/browse/KAFKA-2410>).

This change has many benefits

   - Remove the need for failed messages until a topic is created
   - Client can define the auto create parameters instead of a global
   cluster setting
   - Errors can be communicated back to the client more clearly

Overall auto create is not my favorite feature, since topic creation is a
highly critical piece for Kafka, and with authorization added it becomes
even more involved. When creating a topic a user needs:

   - The access to create topics
   - To set the correct partition count and replication factor for their
   use case
   - To set who has access to the topic
   - Knowledge of how a new topic may impact regex consumers or mirrormaker

Often I find that use cases which look like they need auto topic creation
can be handled with a few pre-made topics. That said, we should still
support the feature for the cases that need it (mirrormaker, streams).

The question is how we should expose auto topic creation in the client. A
few options are:

   - Add configs like the broker configs today, and let the client
   automatically create the topics if enabled
  - Both producer and consumer?
   - Throw an error to the user and let them use a separate AdminClient
   (KIP-4) api to create the topic
   - Throw an error to the user and add a create api to the producer so
   they can easily handle by creating a topic

I am leaning towards the last 2 options but wanted to get some others
thoughts on the matter. Especially if you have use cases that use auto
topic creation today.
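
To make the last two options more concrete, here is a rough sketch of what a
"throw an error and let the application create the topic" flow could look like
from the application's side. TopicAdmin, topicExists and createTopic are
placeholder names for a KIP-4 style admin API, not existing Kafka client
classes:

// Illustrative only: the application ensures the topic exists up front
// instead of relying on broker-side auto creation and silent retries.
public class EnsureTopicSketch {

    interface TopicAdmin {
        boolean topicExists(String topic);
        // Null partitions/replicationFactor would mean "use the cluster defaults".
        void createTopic(String topic, Integer partitions, Short replicationFactor);
    }

    // Called before producing; creation errors (e.g. authorization failures)
    // surface here clearly rather than as retries buried inside the producer.
    static void ensureTopic(TopicAdmin admin, String topic,
                            Integer partitions, Short replicationFactor) {
        if (!admin.topicExists(topic)) {
            admin.createTopic(topic, partitions, replicationFactor);
        }
    }
}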

Thanks,
Grant

-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-4 Delete Topics Schema

2016-06-28 Thread Grant Henke
Thanks to all who voted. The KIP-4 Delete Topics changes passed with +4
(binding), and +3 (non-binding).

There is a branch based off the create topics PR available here:
https://github.com/granthenke/kafka/tree/delete-wire-new
I will open a PR once the create topics patch is in:
https://github.com/apache/kafka/pull/1489

On Fri, Jun 24, 2016 at 5:43 PM, Gwen Shapira <g...@confluent.io> wrote:

> +1
>
> On Thu, Jun 23, 2016 at 8:32 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > I would like to initiate the voting process for the "KIP-4 Delete Topics
> > Schema changes". This is not a vote for all of KIP-4, but specifically
> for
> > the delete topics changes. I have included the exact changes below for
> > clarity:
> >>
> >> Delete Topics Request (KAFKA-2946
> >> <https://issues.apache.org/jira/browse/KAFKA-2946>)
> >>
> >> DeleteTopics Request (Version: 0) => [topics] timeout
> >>   topics => STRING
> >>   timeout => INT32
> >>
> >> DeleteTopicsRequest is a batch request to initiate topic deletion.
> >>
> >> Request semantics:
> >>
> >>1. Must be sent to the controller broker
> >>2. If there are multiple instructions for the same topic in one
> >>request the extra request will be ignored
> >>   - This is because the list of topics is modeled server side as a
> set
> >>   - Multiple deletes results in the same end goal, so handling this
> >>   error for the user should be okay
> >>3. When requesting to delete a topic that does not exist, an
> >>InvalidTopic error will be returned for that topic.
> >>4. When requesting to delete a topic that is already marked for
> >>deletion, the request will wait up to the timeout until the delete is
> >>"complete" and return as usual.
> >>   - This is to avoid errors due to concurrent delete requests. The
> >>   end result is the same, the topic is deleted.
> >>5. The principal must be authorized to the "Delete" Operation on the
> >>"Topic" resource to delete the topic.
> >>   - Unauthorized requests will receive a TopicAuthorizationException
> >>   if they are authorized to the "Describe" Operation on the "Topic"
> resource
> >>   - Otherwise they will receive an InvalidTopicException as if the
> >>   topic does not exist.
> >>   6. Setting a timeout > 0 will allow the request to block until the
> >>delete is "complete" on the controller node.
> >>   - Complete means the local topic metadata cache no longer contains
> >>   the topic
> >>  - The topic metadata is updated when the controller sends out
> >>  update metadata requests to the brokers
> >>   - If a timeout error occurs, the topic could still be deleted
> >>   successfully at a later time. Its up to the client to query for
> the state
> >>   at that point.
> >>7. Setting a timeout <= 0 will validate arguments and trigger the
> >>delete topics and return immediately.
> >>   - This is essentially the fully asynchronous mode we have in the
> >>   Zookeeper tools today.
> >>   - The error code in the response will either contain an argument
> >>   validation exception or a timeout exception. If you receive a
> timeout
> >>   exception, because you asked for 0 timeout, you can assume the
> message was
> >>   valid and the topic deletion was triggered.
> >>8. The request is not transactional.
> >>   1. If an error occurs on one topic, the others could still be
> >>   deleted.
> >>   2. Errors are reported independently.
> >>
> >> QA:
> >>
> >>- Why is DeleteTopicsRequest a batch request?
> >>   - Scenarios where tools or admins want to delete many topics
> should
> >>   be able to with fewer requests
> >>   - Example: Removing all cluster topics
> >>- What happens if some topics error immediately? Will it
> >>return immediately?
> >>   - The request will block until all topics have either been
> deleted,
> >>   errors, or the timeout has been hit
> >>   - There is no "short circuiting" where 1 error stops the other
> >>   topics from being deleted
> >>- Why have a timeout at all? Deletes could take a while?
> >>   - True some deletes may tak

Re: [DISCUSS] KAFKA-3761: Controller has RunningAsBroker instead of RunningAsController state

2016-06-23 Thread Grant Henke
+1. The Metadata response now includes the controller broker as well,
leveraging KafkaController.isActive.
Clients can use that to identify the controller.
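
For illustration, a sketch of how a client could use the controller id carried
in the metadata response to route controller-only requests. The
MetadataSnapshot and Broker types here are placeholders, not the actual client
classes:

import java.util.List;
import java.util.Optional;

// Illustrative sketch: find the controller from metadata rather than from a
// broker "RunningAsController" state.
public class ControllerLookupSketch {

    static final class Broker {
        final int id;
        final String host;
        final int port;
        Broker(int id, String host, int port) { this.id = id; this.host = host; this.port = port; }
    }

    static final class MetadataSnapshot {
        final List<Broker> brokers;
        final int controllerId;  // carried in the metadata response
        MetadataSnapshot(List<Broker> brokers, int controllerId) {
            this.brokers = brokers;
            this.controllerId = controllerId;
        }
    }

    // Controller-only requests (e.g. KIP-4 create/delete topic) would be sent
    // to whichever broker currently matches the controller id.
    static Optional<Broker> findController(MetadataSnapshot metadata) {
        return metadata.brokers.stream()
                .filter(b -> b.id == metadata.controllerId)
                .findFirst();
    }
}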

On Thu, Jun 23, 2016 at 7:17 PM, Gwen Shapira <g...@confluent.io> wrote:

> +1 from me too
>
> On Thu, Jun 23, 2016 at 3:06 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > +1 from me.
> >
> > Ismael
> >
> > On Thu, Jun 23, 2016 at 11:57 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> Does anyone have an issue with removing the broker state called
> >> "RunningAsController"?
> >>
> >> The reasons to remove it are:
> >> 1. It's currently broken.  The purpose of the JIRA
> >> <https://issues.apache.org/jira/browse/KAFKA-3761> was to report that
> the
> >> RunningAsController state gets overwritten back to "RunningAsBroker".
> >> 2. It's not a useful state.
> >>   a. If clients want to use this metric to know whether a broker is
> ready
> >> to receive requests or not, they do not care whether or not the broker
> is
> >> the controller
> >>   b. there is already a separate boolean property,
> KafkaController.isActive
> >> which contains this information.
> >>
> >> Thanks,
> >>
> >> Roger
> >>
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[VOTE] KIP-4 Delete Topics Schema

2016-06-23 Thread Grant Henke
play/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>below
>
> Delete Topics Response
>
> DeleteTopics Response (Version: 0) => [topic_error_codes]
>   topic_error_codes => topic error_code
> topic => STRING
> error_code => INT16
>
> DeleteTopicsResponse contains a map between topic and topic creation
> result error code (see New Protocol Errors
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-NewProtocolErrors>
> ).
>
> Response semantics:
>
>1. When a request hits the timeout, the topics that are not "complete"
>will have the TimeoutException error code.
>   - The topics that did complete successfully with have no error.
>
>
The KIP is available here for reference (linked to the Create Topics schema
section):
*https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DeleteTopicsRequest(KAFKA-2946)
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-DeleteTopicsRequest(KAFKA-2946)>*

A sample patch is on github:
https://github.com/granthenke/kafka/tree/delete-wire-new
Note: This branch and patch is based on the CreateTopic request/response
PR. I will open a PR once that patch is complete.

Here is a link to the past discussion on the mailing list:

*http://search-hadoop.com/m/uyzND1fokOBn6uNd2=+DISCUSS+KIP+4+Delete+Topic+Schema
<http://search-hadoop.com/m/uyzND1fokOBn6uNd2=+DISCUSS+KIP+4+Delete+Topic+Schema>*

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-4 Create Topics Schema

2016-06-23 Thread Grant Henke
Thanks to all who voted. The KIP-4 Create Topics changes passed with +4
(binding), and +4 (non-binding).

There is a patch available for review here:
https://github.com/apache/kafka/pull/1489

On Tue, Jun 21, 2016 at 1:11 AM, Manikumar Reddy <manikumar.re...@gmail.com>
wrote:

> +1 (non-binding)
>
> On Tue, Jun 21, 2016 at 9:16 AM, Ewen Cheslack-Postava <e...@confluent.io>
> wrote:
>
> > +1 (binding) and thanks for the work on this Grant!
> >
> > -Ewen
> >
> > On Mon, Jun 20, 2016 at 12:18 PM, Gwen Shapira <g...@confluent.io>
> wrote:
> >
> > > +1 (binding)
> > >
> > > On Mon, Jun 20, 2016 at 12:13 PM, Tom Crayford <tcrayf...@heroku.com>
> > > wrote:
> > > > +1 (non-binding)
> > > >
> > > > On Mon, Jun 20, 2016 at 8:07 PM, Harsha <ka...@harsha.io> wrote:
> > > >
> > > >> +1 (binding)
> > > >> -Harsha
> > > >>
> > > >> On Mon, Jun 20, 2016, at 11:33 AM, Ismael Juma wrote:
> > > >> > +1 (binding)
> > > >> >
> > > >> > On Mon, Jun 20, 2016 at 8:27 PM, Dana Powers <
> dana.pow...@gmail.com
> > >
> > > >> > wrote:
> > > >> >
> > > >> > > +1 -- thanks for the update
> > > >> > >
> > > >> > > On Mon, Jun 20, 2016 at 10:49 AM, Grant Henke <
> > ghe...@cloudera.com>
> > > >> wrote:
> > > >> > > > I have update the patch and wiki based on the feedback in the
> > > >> discussion
> > > >> > > > thread. The only change is that instead of logging and
> > > disconnecting
> > > >> in
> > > >> > > the
> > > >> > > > case of invalid messages (duplicate topics or both arguments)
> we
> > > now
> > > >> > > return
> > > >> > > > and InvalidRequest error back to the client for that topic.
> > > >> > > >
> > > >> > > > I would like to restart the vote now including that change. If
> > you
> > > >> have
> > > >> > > > already voted, please revote in this thread.
> > > >> > > >
> > > >> > > > Thank you,
> > > >> > > > Grant
> > > >> > > >
> > > >> > > > On Sun, Jun 19, 2016 at 8:57 PM, Ewen Cheslack-Postava <
> > > >> > > e...@confluent.io>
> > > >> > > > wrote:
> > > >> > > >
> > > >> > > >> Don't necessarily want to add noise here, but I'm -1 based on
> > the
> > > >> > > >> disconnect part. See discussion in other thread. (I'm +1
> > > otherwise,
> > > >> and
> > > >> > > >> happy to have my vote applied assuming we clean up that one
> > > issue.)
> > > >> > > >>
> > > >> > > >> -Ewen
> > > >> > > >>
> > > >> > > >> On Thu, Jun 16, 2016 at 6:05 PM, Harsha <ka...@harsha.io>
> > wrote:
> > > >> > > >>
> > > >> > > >> > +1 (binding)
> > > >> > > >> > Thanks,
> > > >> > > >> > Harsha
> > > >> > > >> >
> > > >> > > >> > On Thu, Jun 16, 2016, at 04:15 PM, Guozhang Wang wrote:
> > > >> > > >> > > +1.
> > > >> > > >> > >
> > > >> > > >> > > On Thu, Jun 16, 2016 at 3:47 PM, Ismael Juma <
> > > ism...@juma.me.uk
> > > >> >
> > > >> > > >> wrote:
> > > >> > > >> > >
> > > >> > > >> > > > +1 (binding)
> > > >> > > >> > > >
> > > >> > > >> > > > On Thu, Jun 16, 2016 at 11:50 PM, Grant Henke <
> > > >> > > ghe...@cloudera.com>
> > > >> > > >> > wrote:
> > > >> > > >> > > >
> > > >> > > >> > > > > I would like to initiate the voting process for the
> > > "KIP-4
> > > >> > > Create
> > > >> > > >> > Topics
> > > >> > > >> > > > &g

Re: [DISCUSS] KIP-4 Delete Topic Schema

2016-06-23 Thread Grant Henke
Hi Guozhang,

Thanks for the review!

In the timeout <= 0 case, if the client should always ignore and treat
> the timeout
> error code as "OK", should we just return no error code in this case?


The behavior was documented in the wiki, but the question was never brought
up for create or delete. I treat a 0ms timeout the same as a 1ms timeout, and
that generally means that all valid topics will get a timeout exception. I
am not sure that special behavior for <= 0 is more intuitive, and I am not
sure the client would always want to ignore the exception; it may want to
be relayed to the user. Regardless, handling on the client side should be
fairly straightforward either way.

I could be convinced otherwise, but I am leaning towards leaving it as is.
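
For what it's worth, client-side handling of the <= 0 case could be as simple
as the sketch below. The error-code constants are placeholders, not the real
Errors values:

import java.util.Map;

// Illustrative sketch of interpreting per-topic error codes when a delete was
// requested with timeout <= 0: a per-topic timeout simply means "the request
// was valid and deletion was triggered", and the caller can re-check later.
public class DeleteResultSketch {

    static final short NONE = 0;
    static final short TIMEOUT = 7;   // placeholder value
    // any other code => argument validation or other failure

    static boolean deletionTriggered(short errorCode, int requestTimeoutMs) {
        if (errorCode == NONE)
            return true;
        return requestTimeoutMs <= 0 && errorCode == TIMEOUT;
    }

    static void report(Map<String, Short> perTopicErrors, int requestTimeoutMs) {
        perTopicErrors.forEach((topic, code) -> {
            if (deletionTriggered(code, requestTimeoutMs))
                System.out.println(topic + ": deletion triggered");
            else
                System.out.println(topic + ": failed with error code " + code);
        });
    }
}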

Thanks,
Grant




On Tue, Jun 21, 2016 at 6:08 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> I see we have the similar setting for CreateTopic request timeout <= 0 as
> well, so maybe it has been discussed and I simply overlooked.. otherwise my
> question is for both of these cases.
>
> Guozhang
>
> On Tue, Jun 21, 2016 at 4:07 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Thanks Grant, looks good to me overall. One minor comment below:
> >
> > >   - The error code in the response will either contain an argument
> > >   validation exception or a timeout exception. If you receive a
> > timeout
> > >   exception, because you asked for 0 timeout, you can assume the
> > message was
> > >   valid and the topic deletion was triggered.
> >
> > In the timeout <= 0 case, if the client should always ignore and treat
> the timeout
> > error code as "OK", should we just return no error code in this case?
> >
> >
> > Guozhang
> >
> >
> > On Tue, Jun 21, 2016 at 8:17 AM, Grant Henke <ghe...@cloudera.com>
> wrote:
> >
> >> Hi Ismael,
> >>
> >> Thanks for the review. See my responses below.
> >>
> >> One potential issue is that the number of topics in the response won't
> >> > match the number of topics in the request. Worth checking what others
> >> think
> >> > of this one.
> >>
> >>
> >> Based on the choice of how we handled duplicate topics in the create
> >> topics
> >> protocol, I took the same approach here. At one point create topics
> would
> >> disconnect because I could't return an error per topic request. In the
> end
> >> the preference was to return and error code even though the cardinality
> >> would be different.
> >>
> >> 4. When requesting to delete a topic that is already marked for
> >> > > >deletion, the request will wait for the wait for the timeout
> and
> >> > > return as
> >> > > >usual
> >> >
> >> > Do you mean that it will wait up to the timeout until the delete is
> >> > "complete" as per the definition in `6`? Or will it wait
> unconditionally
> >> > until the timeout expires? It would be good to make that clear.
> >> >
> >>
> >> That's exactly what I meant. I updated the wiki.
> >>
> >> This could leak topic name information (as per KAFKA-3396, which was
> filed
> >> > by you). We would probably want to return `InvalidTopic` for the case
> >> where
> >> > the user doesn't have a valid `DESCRIBE TOPIC` ACL, right?
> >>
> >>
> >> Good point. I will update the wiki and patch.
> >>
> >> Unauthorized requests will receive a TopicAuthorizationException if they
> >> are authorized to the "Describe" Operation on the "Topic" resource.
> >> Otherwise they will receive an InvalidTopicException as if the topic
> does
> >> not exist.
> >>
> >> I was wondering (and this applies to the create topic as well), is there
> >> > any value in a flag that says whether the timeout expired or not?
> >>
> >>
> >> In both the create and delete response we return a TimeoutException
> error
> >> code for the topics that did not "complete" in time. This allows the
> >> client
> >> to know which topics actions completed and which timed out. I updated
> the
> >> wiki to explicitly call this out in the response section.
> >>
> >> Thanks,
> >> Grant
> >>
> >> On Tue, Jun 21, 2016 at 5:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:
> >>
> >> > Thanks Grant. A few comments inline.
> >> >

Re: [DISCUSS] KIP-4 Delete Topic Schema

2016-06-21 Thread Grant Henke
Hi Ismael,

Thanks for the review. See my responses below.

One potential issue is that the number of topics in the response won't
> match the number of topics in the request. Worth checking what others think
> of this one.


Based on the choice of how we handled duplicate topics in the create topics
protocol, I took the same approach here. At one point create topics would
disconnect because I couldn't return an error per topic request. In the end
the preference was to return an error code even though the cardinality
would be different.

4. When requesting to delete a topic that is already marked for
> > >deletion, the request will wait for the wait for the timeout and
> > return as
> > >usual
>
> Do you mean that it will wait up to the timeout until the delete is
> "complete" as per the definition in `6`? Or will it wait unconditionally
> until the timeout expires? It would be good to make that clear.
>

That's exactly what I meant. I updated the wiki.

This could leak topic name information (as per KAFKA-3396, which was filed
> by you). We would probably want to return `InvalidTopic` for the case where
> the user doesn't have a valid `DESCRIBE TOPIC` ACL, right?


Good point. I will update the wiki and patch.

Unauthorized requests will receive a TopicAuthorizationException if they
are authorized to the "Describe" Operation on the "Topic" resource.
Otherwise they will receive an InvalidTopicException as if the topic does
not exist.

I was wondering (and this applies to the create topic as well), is there
> any value in a flag that says whether the timeout expired or not?


In both the create and delete response we return a TimeoutException error
code for the topics that did not "complete" in time. This allows the client
to know which topic actions completed and which timed out. I updated the
wiki to explicitly call this out in the response section.

Thanks,
Grant

On Tue, Jun 21, 2016 at 5:44 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks Grant. A few comments inline.
>
> On Mon, Jun 20, 2016 at 9:09 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > >2. If there are multiple instructions for the same topic in one
> > >request the extra request will be ignored
> > >   - This is because the list of topics is modeled server side as a
> > set
> > >   - Multiple deletes results in the same end goal, so handling this
> > >   error for the user should be okay
> >
>
> One potential issue is that the number of topics in the response won't
> match the number of topics in the request. Worth checking what others think
> of this one.
>
> 4. When requesting to delete a topic that is already marked for
> > >deletion, the request will wait for the wait for the timeout and
> > return as
> > >usual
>
>
> Do you mean that it will wait up to the timeout until the delete is
> "complete" as per the definition in `6`? Or will it wait unconditionally
> until the timeout expires? It would be good to make that clear.
>
>
> > >5. The principal must be authorized to the "Delete" Operation on the
> >
> >"Topic" resource to delete the topic.
> > >   - Unauthorized requests will receive a
> TopicAuthorizationException
> >
>
> This could leak topic name information (as per KAFKA-3396, which was filed
> by you). We would probably want to return `InvalidTopic` for the case where
> the user doesn't have a valid `DESCRIBE TOPIC` ACL, right?
>
>
> > >- Why have a timeout at all? Deletes could take a while?
> >
>
> I was wondering (and this applies to the create topic as well), is there
> any value in a flag that says whether the timeout expired or not?
>
> Thanks,
> Ismael
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[DISCUSS] KIP-4 Delete Topic Schema

2016-06-20 Thread Grant Henke
-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>below
>
> Delete Topics Response
>
>
>
> DeleteTopics Response (Version: 0) => [topic_error_codes]
>   topic_error_codes => topic error_code
> topic => STRING
>     error_code => INT16
>
> DeleteTopicsResponse is similar to CreateTopicsResponse.
>

A sample patch is on github (
https://github.com/granthenke/kafka/tree/delete-wire-new) though it could
change drastically based on the feedback here.
Note: This branch and patch is based on the CreateTopic request/response
PR. I will open a PR once that patch is complete.

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-4 Create Topics Schema

2016-06-20 Thread Grant Henke
I have updated the patch and wiki based on the feedback in the discussion
thread. The only change is that instead of logging and disconnecting in the
case of invalid messages (duplicate topics or both arguments) we now return
an InvalidRequest error back to the client for that topic.

I would like to restart the vote now including that change. If you have
already voted, please revote in this thread.

Thank you,
Grant

On Sun, Jun 19, 2016 at 8:57 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> Don't necessarily want to add noise here, but I'm -1 based on the
> disconnect part. See discussion in other thread. (I'm +1 otherwise, and
> happy to have my vote applied assuming we clean up that one issue.)
>
> -Ewen
>
> On Thu, Jun 16, 2016 at 6:05 PM, Harsha <ka...@harsha.io> wrote:
>
> > +1 (binding)
> > Thanks,
> > Harsha
> >
> > On Thu, Jun 16, 2016, at 04:15 PM, Guozhang Wang wrote:
> > > +1.
> > >
> > > On Thu, Jun 16, 2016 at 3:47 PM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Thu, Jun 16, 2016 at 11:50 PM, Grant Henke <ghe...@cloudera.com>
> > wrote:
> > > >
> > > > > I would like to initiate the voting process for the "KIP-4 Create
> > Topics
> > > > > Schema changes". This is not a vote for all of KIP-4, but
> > specifically
> > > > for
> > > > > the create topics changes. I have included the exact changes below
> > for
> > > > > clarity:
> > > > > >
> > > > > > Create Topics Request (KAFKA-2945
> > > > > > <https://issues.apache.org/jira/browse/KAFKA-2945>)
> > > > > >
> > > > > > CreateTopics Request (Version: 0) => [create_topic_requests]
> > timeout
> > > > > >   create_topic_requests => topic num_partitions
> replication_factor
> > > > > [replica_assignment] [configs]
> > > > > > topic => STRING
> > > > > > num_partitions => INT32
> > > > > > replication_factor => INT16
> > > > > > replica_assignment => partition_id [replicas]
> > > > > >   partition_id => INT32
> > > > > >   replicas => INT32
> > > > > > configs => config_key config_value
> > > > > >   config_key => STRING
> > > > > >   config_value => STRING
> > > > > >   timeout => INT32
> > > > > >
> > > > > > CreateTopicsRequest is a batch request to initiate topic creation
> > with
> > > > > > either predefined or automatic replica assignment and optionally
> > topic
> > > > > > configuration.
> > > > > >
> > > > > > Request semantics:
> > > > > >
> > > > > >1. Must be sent to the controller broker
> > > > > >2. If there are multiple instructions for the same topic in
> one
> > > > > >request an InvalidRequestException will be logged on the
> broker
> > and
> > > > > the
> > > > > >client will be disconnected.
> > > > > >   - This is because the list of topics is modeled server side
> > as a
> > > > > >   map with TopicName as the key
> > > > > >3. The principal must be authorized to the "Create" Operation
> > on the
> > > > > >"Cluster" resource to create topics.
> > > > > >   - Unauthorized requests will receive a
> > > > > ClusterAuthorizationException
> > > > > >4.
> > > > > >
> > > > > >Only one from ReplicaAssignment or (num_partitions +
> > > > > replication_factor
> > > > > >), can be defined in one instruction.
> > > > > >- If both parameters are specified an InvalidRequestException
> > will
> > > > be
> > > > > >   logged on the broker and the client will be disconnected.
> > > > > >   - In the case ReplicaAssignment is defined number of
> > partitions
> > > > and
> > > > > >   replicas will be calculated from the supplied
> > replica_assignment.
> > > > > >   - In the case of defined (num_partitions +
> > replication_factor)
> > > > > >   replica assignm

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-19 Thread Grant Henke
Apologies for the delay in response here.

It will take a bit of bookkeeping inside the request object to track this
error and then handle it in KafkaApis, but it is possible. I am happy to
take that preferred approach. I will update the wiki & patch to handle this
scenario and re-initiate the vote tomorrow.

Thanks,
Grant

On Sun, Jun 19, 2016 at 8:59 PM, Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I'm on the same page as Jun & Dana wrt disconnecting. Closing a connection
> should really be a last resort because we can no longer trust correct
> behavior in this session. In this case, we detect a bad request, but
> there's no reason to believe it will affect subsequent requests. There are
> dependencies to be sure, and if the client doesn't check errors, they may
> try to then write to topics that don't exist or something along those
> lines, but those requests can also be failed without killing the underlying
> TCP connection.
>
> -Ewen
>
> On Fri, Jun 17, 2016 at 1:46 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Grant,
> >
> > I think Dana has a valid point. Currently, we throw an
> > InvalidRequestException and close the connection only when the broker
> can't
> > deserialize the bytes into a request. In this case, the deserialization
> is
> > fine. It just that there are some additional constraints that can't be
> > specified at the protocol level. We can potentially just remember the
> > topics that violated those constraints in the request and handle them
> > accordingly with the right error code w/o disconnecting.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jun 17, 2016 at 8:40 AM, Dana Powers <dana.pow...@gmail.com>
> > wrote:
> >
> > > I'm unconvinced (crazy, right?). Comments below:
> > >
> > > On Fri, Jun 17, 2016 at 7:27 AM, Grant Henke <ghe...@cloudera.com>
> > wrote:
> > > > Hi Dana,
> > > >
> > > > You mentioned one of the reasons I error and disconnect. Because I
> > can't
> > > > return an error for every request so the cardinality between request
> > and
> > > > response would be different. Beyond that though, I am handling this
> > > > protocol rule/parsing error the same way all other messages do.
> > >
> > > But you can return an error for every topic, and isn't that the level
> > > of error required here?
> > >
> > > > CreateTopic Response (Version: 0) => [topic_error_codes]
> > > >   topic_error_codes => topic error_code
> > > >     topic => STRING
> > > >     error_code => INT16
> > >
> > > If I submit duplicate requests for a topic, it's an error isolated to
> > > that topic. If I mess up the partition / replication / etc semantics
> > > for a topic, that's an error isolated to that topic. Is there a
> > > cardinality problem at this level?
> > >
> > >
> > > >
> > > > Parsing is handled in the RequestChannel and any exception that
> occurs
> > > > during this phase is caught, converted into an
> InvalidRequestException
> > > and
> > > > results in a disconnect:
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L92-L95
> > > >
> > > > Since this is an error that could only occur (and would always occur)
> > due
> > > > to incorrect client implementations, and not because of any cluster
> > state
> > > > or unusual situation, I felt this behavior was okay and made sense.
> For
> > > > client developers the broker logging should make it obvious what the
> > > issue
> > > > is. My patch also clearly documents the protocol rules in the
> Protocol
> > > > definition.
> > >
> > > Documentation is great and definitely a must. But requiring client
> > > developers to dig through server logs is not ideal. Client developers
> > > don't always have direct access to those logs. And even if they do,
> > > the brokers may have other traffic, which makes it difficult to track
> > > down the exact point in the logs where the error occurred.
> > >
> > > As discussed above, I don't think you need to or should model this as
> > > a request-level parsing error. It may be easier for the current broker
> > > implementation to do that and just crash the connection, but I don't
> > > think it makes that much sense from a raw api perspective.
> > >
> > > > In 

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-17 Thread Grant Henke
Hi Dana,

You mentioned one of the reasons I error and disconnect: because I can't
return an error for every request, the cardinality between request and
response would be different. Beyond that though, I am handling this
protocol rule/parsing error the same way all other messages are handled.

Parsing is handled in the RequestChannel and any exception that occurs
during this phase is caught, converted into an InvalidRequestException and
results in a disconnect:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L92-L95

Since this is an error that could only occur (and would always occur) due
to incorrect client implementations, and not because of any cluster state
or unusual situation, I felt this behavior was okay and made sense. For
client developers the broker logging should make it obvious what the issue
is. My patch also clearly documents the protocol rules in the Protocol
definition.

In the future having a response header with an error code (and optimally
error message) for every response would help solve this problem (for all
message types).
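
For illustration, here is a self-contained sketch of the per-topic error
approach discussed in the quoted message below (returning an
InvalidRequest-style code for the offending topic instead of disconnecting).
The class name is invented and the error-code value is assumed:

import java.util.*;

public class DuplicateTopicHandling {
    static final short INVALID_REQUEST = 42; // assumed code for a generic invalid-request error

    // Keep the connection open and report the problem only for the offending topics.
    static Map<String, Short> perTopicErrors(List<String> requestedTopics) {
        Map<String, Short> errors = new HashMap<>();
        Set<String> seen = new HashSet<>();
        for (String topic : requestedTopics) {
            if (!seen.add(topic)) {
                errors.put(topic, INVALID_REQUEST); // duplicate entry -> error for that topic only
            }
        }
        return errors; // the remaining topics proceed and get their own result codes
    }

    public static void main(String[] args) {
        System.out.println(perTopicErrors(Arrays.asList("a", "b", "a"))); // prints {a=42}
    }
}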

Thanks,
Grant


On Fri, Jun 17, 2016 at 12:04 AM, Dana Powers <dana.pow...@gmail.com> wrote:

> Why disconnect the client on a InvalidRequestException? The 2 errors
> you are catching are both topic-level: (1) multiple requests for the
> same topic, and (2) ReplicaAssignment and num_partitions /
> replication_factor both set. Wouldn't it be better to just error the
> offending create_topic_request, not the entire connection? The
> CreateTopicsResponse returns a map of topics to error codes. You could
> just return the topic that caused the error and an
> InvalidRequestException error code.
>
> -Dana
>
> On Thu, Jun 16, 2016 at 8:37 AM, Grant Henke <ghe...@cloudera.com> wrote:
> > I have updated the wiki and pull request based on the feedback. If there
> > are no objections I will start a vote at the end of the day.
> >
> > Details for this implementation can be read here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >
> > The updated pull request can be found here (feel free to review):
> > https://github.com/apache/kafka/pull/1489
> >
> > Below is the exact content for clarity:
> >
> >> Create Topics Request (KAFKA-2945
> >> <https://issues.apache.org/jira/browse/KAFKA-2945>)
> >>
> >>
> >>
> >> CreateTopics Request (Version: 0) => [create_topic_requests] timeout
> >>   create_topic_requests => topic num_partitions replication_factor
> [replica_assignment] [configs]
> >> topic => STRING
> >> num_partitions => INT32
> >> replication_factor => INT16
> >> replica_assignment => partition_id [replicas]
> >>   partition_id => INT32
> >>   replicas => INT32
> >> configs => config_key config_value
> >>   config_key => STRING
> >>   config_value => STRING
> >>   timeout => INT32
> >>
> >> CreateTopicsRequest is a batch request to initiate topic creation with
> >> either predefined or automatic replica assignment and optionally topic
> >> configuration.
> >>
> >> Request semantics:
> >>
> >>1. Must be sent to the controller broker
> >>2. If there are multiple instructions for the same topic in one
> >>request an InvalidRequestException will be logged on the broker and
> the
> >>client will be disconnected.
> >>   - This is because the list of topics is modeled server side as a
> >>   map with TopicName as the key
> >>3. The principal must be authorized to the "Create" Operation on the
> >>"Cluster" resource to create topics.
> >>   - Unauthorized requests will receive a
> ClusterAuthorizationException
> >>4.
> >>
> >>Only one from ReplicaAssignment or (num_partitions +
> replication_factor
> >>), can be defined in one instruction.
> >>- If both parameters are specified an InvalidRequestException will be
> >>   logged on the broker and the client will be disconnected.
> >>   - In the case ReplicaAssignment is defined number of partitions
> and
> >>   replicas will be calculated from the supplied replica_assignment.
> >>   - In the case of defined (num_partitions + replication_factor)
> >>   replica assignment will be automatically generated by the server.
> >>   - One or the other must be d

Re: [VOTE] KIP-62: Allow consumer to send heartbeats from a background thread

2016-06-16 Thread Grant Henke
+1

On Thu, Jun 16, 2016 at 8:50 PM, tao xiao <xiaotao...@gmail.com> wrote:

> +1
>
> On Fri, 17 Jun 2016 at 09:03 Harsha <ka...@harsha.io> wrote:
>
> > +1 (binding)
> > Thanks,
> > Harsha
> >
> > On Thu, Jun 16, 2016, at 05:46 PM, Henry Cai wrote:
> > > +1
> > >
> > > On Thu, Jun 16, 2016 at 3:46 PM, Ismael Juma <ism...@juma.me.uk>
> wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Fri, Jun 17, 2016 at 12:44 AM, Guozhang Wang <wangg...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1.
> > > > >
> > > > > On Thu, Jun 16, 2016 at 11:44 AM, Jason Gustafson <
> > ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I'd like to open the vote for KIP-62. This proposal attempts to
> > address
> > > > > one
> > > > > > of the recurring usability problems that users of the new
> consumer
> > have
> > > > > > faced with as little impact as possible. You can read the full
> > details
> > > > > > here:
> > > > > >
> > > > > >
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> > > > > > .
> > > > > >
> > > > > > After some discussion on this list, I think we were in agreement
> > that
> > > > > this
> > > > > > change addresses a major part of the problem and we've left the
> > door
> > > > open
> > > > > > for further improvements, such as adding a heartbeat() API or a
> > > > > separately
> > > > > > configured rebalance timeout. Thanks in advance to everyone who
> > helped
> > > > > > review the proposal.
> > > > > >
> > > > > > -Jason
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[VOTE] KIP-4 Create Topics Schema

2016-06-16 Thread Grant Henke
zedadministrativeoperations-cluster-consistent-blocking>
>below
>- Why require the request to go to the controller?
>   - The controller is responsible for the cluster metadata and its
>   propagation
>   - See Request Forwarding
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>below
>
> Create Topics Response
>
>
>
> CreateTopics Response (Version: 0) => [topic_error_codes]
>   topic_error_codes => topic error_code
>     topic => STRING
>     error_code => INT16
>
> CreateTopicsResponse contains a map between topic and topic creation
> result error code (see New Protocol Errors
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-NewProtocolErrors>
> ).
>

The KIP is available here for reference (linked to the Create Topics schema
section):
*https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945)
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicsRequest(KAFKA-2945)>*

A pull request is available implementing the proposed changes here:
https://github.com/apache/kafka/pull/1489

Here is a link to the past discussion on the mailing list:
*http://search-hadoop.com/m/uyzND1rfG6v1oixmZ=+DISCUSS+KIP+4+Create+Topic+Schema
<http://search-hadoop.com/m/uyzND1rfG6v1oixmZ=+DISCUSS+KIP+4+Create+Topic+Schema>*

Thank you,
Grant
-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] Java 8 as a minimum requirement

2016-06-16 Thread Grant Henke
> > > > > most phones will use that version sadly). This reduces (but does
> not
> > > > > eliminate) the chance that we would be the first project that would
> > > > cause a
> > > > > user to consider a Java upgrade.
> > > > >
> > > > > The main argument for not making the change is that a reasonable
> > number
> > > > of
> > > > > users may still be using Java 7 by the time Kafka 0.10.1.0 is
> > released.
> > > > > More specifically, we care about the subset who would be able to
> > > upgrade
> > > > to
> > > > > Kafka 0.10.1.0, but would not be able to upgrade the Java version.
> It
> > > > would
> > > > > be great if we could quantify this in some way.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Ismael
> > > > >
> > > > > [1] https://java.com/en/download/faq/java_7.xml
> > > > > [2]
> > https://blogs.oracle.com/thejavatutorials/entry/jdk_8_is_released
> > > > > [3] http://openjdk.java.net/projects/jdk9/
> > > > > [4] https://github.com/apache/cassandra/blob/trunk/README.asc
> > > > > [5]
> > > https://lucene.apache.org/#highlights-of-this-lucene-release-include
> > > > > [6] http://akka.io/news/2015/09/30/akka-2.4.0-released.html
> > > > > [7] https://issues.apache.org/jira/browse/HADOOP-11858
> > > > > [8] https://webtide.com/jetty-9-3-features/
> > > > > [9] http://markmail.org/message/l7s276y3xkga2eqf
> > > > > [10]
> > > > >
> > > > >
> > > >
> > >
> >
> https://intellij-support.jetbrains.com/hc/en-us/articles/206544879-Selecting-the-JDK-version-the-IDE-will-run-under
> > > > > [11] http://markmail.org/message/l7s276y3xkga2eqf
> > > > >
> > > >
> > >
> >
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-16 Thread Grant Henke
ing" instead of fully async or fully
>consistent?
>   - See Cluster Consistent Blocking
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-cluster-consistent-blocking>
>below
>- Why require the request to go to the controller?
>   - The controller is responsible for the cluster metadata and
>   its propagation
>   - See Request Forwarding
>   
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-request>
>below
>
> Create Topics Response
>
>
>
> CreateTopics Response (Version: 0) => [topic_error_codes]
>   topic_error_codes => topic error_code
>     topic => STRING
>     error_code => INT16
>
> CreateTopicsResponse contains a map between topic and topic creation
> result error code (see New Protocol Errors
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-NewProtocolErrors>
> ).
>
>
Thank you,
Grant


On Wed, Jun 15, 2016 at 4:11 PM, Grant Henke <ghe...@cloudera.com> wrote:

> Turns out we already have an InvalidRequestException:
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L75-L98
>
> We just don't map it in Errors.java so it results in an UNKNOWN error when
> sent back to the client.
>
> I will migrate the InvalidRequestException to the client package, add it
> to Errors and use that to signify any protocol parsing/rule errors.
>
>
>
> On Wed, Jun 15, 2016 at 2:55 PM, Dana Powers <dana.pow...@gmail.com>
> wrote:
>
>> On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma <ism...@juma.me.uk> wrote:
>> > Hi Grant,
>> >
>> > Comments below.
>> >
>> > On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke <ghe...@cloudera.com>
>> wrote:
>> >>
>> >> The one thing I want to avoid is too many super specific error codes. I
>> am
>> >> not sure how much of a problem it really is but in the case of wire
>> >> protocol errors like multiple instances of the same topic, do you have
>> any
>> >> thoughts on the error? Should we make a generic InvalidRequest error
>> and
>> >> log the detailed message on the broker for client authors to debug?
>> >>
>> >
>> > That is a good question. It would be good to get input from client
>> > developers like Dana on this.
>>
>> I think generic error codes are fine if the wire protocol requirements
>> are documented [i.e., no duplicate topics and partitions/replicas are
>> either/or not both]. If I get a broker error at the protocol level
>> that I don't understand, the first place I look is the protocol docs.
>> It may cause a few more emails to the mailing lists asking for
>> clarification, but I think those will be easier to triage than
>> confused emails like "I said create topic with 10 partitions, but I
>> only got 5???"
>>
>> -Dana
>>
>
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Grant Henke
Turns out we already have an InvalidRequestException:
https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/network/RequestChannel.scala#L75-L98

We just don't map it in Errors.java so it results in an UNKNOWN error when
sent back to the client.

I will migrate the InvalidRequestException to the client package, add it to
Errors and use that to signify any protocol parsing/rule errors.
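
Schematically (this is not the actual Errors.java definition, just an
illustration of the mapping being described), pairing a numeric code with an
enum constant is what lets the broker serialize a specific code instead of
falling back to UNKNOWN:

// Illustrative only: an Errors-style mapping from code to constant.
public enum IllustrativeErrors {
    UNKNOWN((short) -1),
    NONE((short) 0),
    INVALID_REQUEST((short) 42); // assumed code; the real value would be assigned in Errors.java

    private final short code;

    IllustrativeErrors(short code) { this.code = code; }

    public short code() { return code; }

    public static IllustrativeErrors forCode(short code) {
        for (IllustrativeErrors e : values()) {
            if (e.code == code) return e;
        }
        return UNKNOWN; // unmapped exceptions surface as UNKNOWN, which is the problem above
    }
}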



On Wed, Jun 15, 2016 at 2:55 PM, Dana Powers <dana.pow...@gmail.com> wrote:

> On Wed, Jun 15, 2016 at 12:19 PM, Ismael Juma <ism...@juma.me.uk> wrote:
> > Hi Grant,
> >
> > Comments below.
> >
> > On Wed, Jun 15, 2016 at 6:52 PM, Grant Henke <ghe...@cloudera.com>
> wrote:
> >>
> >> The one thing I want to avoid is too many super specific error codes. I
> am
> >> not sure how much of a problem it really is but in the case of wire
> >> protocol errors like multiple instances of the same topic, do you have
> any
> >> thoughts on the error? Should we make a generic InvalidRequest error and
> >> log the detailed message on the broker for client authors to debug?
> >>
> >
> > That is a good question. It would be good to get input from client
> > developers like Dana on this.
>
> I think generic error codes are fine if the wire protocol requirements
> are documented [i.e., no duplicate topics and partitions/replicas are
> either/or not both]. If I get a broker error at the protocol level
> that I don't understand, the first place I look is the protocol docs.
> It may cause a few more emails to the mailing lists asking for
> clarification, but I think those will be easier to triage than
> confused emails like "I said create topic with 10 partitions, but I
> only got 5???"
>
> -Dana
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Commented] (KAFKA-3818) Change Mirror Maker default assignment strategy to round robin

2016-06-15 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332524#comment-15332524
 ] 

Grant Henke commented on KAFKA-3818:


A few older threads mention that it's possible to get clumping (due to the hash 
on the RoundRobinAssignor). Does that problem still exist? Is that something we 
should fix before changing the default?

This thread discusses it recently: 
http://search-hadoop.com/m/uyzND135BcA1lXiM=Re+DISCUSS+KIP+49+Fair+Partition+Assignment+Strategy
{quote}
 - WRT roundrobin we later realized a significant flaw in the way we lay
   out partitions: we originally wanted to randomize the partition layout to
   reduce the likelihood of most partitions of the same topic from ending up
   on a given consumer which is important if you have a few very large topics.
   Unfortunately we used hashCode - which does a splendid job of clumping
   partitions from the same topic together :( We can probably just "fix" that
   in the new consumer's roundrobin assignor.
{quote}

And this older jira looks to describe the issue [~jjkoshy] is referring to: 
KAFKA-2019

[~jjkoshy] do you have any thoughts?
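
For illustration, here is a toy standalone example of the clumping effect
described in the quoted thread above: ordering topic-partitions by a hash
that combines the topic's hashCode with the partition number keeps
partitions of the same topic adjacent rather than spreading them out. This
only mimics the behaviour being discussed; it is not the assignor code:

import java.util.*;

public class HashClumpingDemo {
    public static void main(String[] args) {
        List<String[]> tps = new ArrayList<>();
        for (String topic : Arrays.asList("large-topic-a", "large-topic-b")) {
            for (int p = 0; p < 4; p++) {
                tps.add(new String[]{topic, Integer.toString(p)});
            }
        }
        // Order by a typical combined hashCode (31 * topicHash + partition).
        tps.sort(Comparator.comparingInt(
                (String[] tp) -> 31 * tp[0].hashCode() + Integer.parseInt(tp[1])));
        // Partitions of the same topic end up next to each other in the ordering
        // instead of being randomized across it.
        for (String[] tp : tps) {
            System.out.println(tp[0] + "-" + tp[1]);
        }
    }
}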


> Change Mirror Maker default assignment strategy to round robin
> --
>
> Key: KAFKA-3818
> URL: https://issues.apache.org/jira/browse/KAFKA-3818
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Vahid Hashemian
>
> It might make more sense to use round robin assignment by default for MM 
> since it gives a better balance between the instances, in particular when the 
> number of MM instances exceeds the typical number of partitions per topic. 
> There doesn't seem to be any need to keep range assignment since 
> copartitioning is not an issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-15 Thread Grant Henke
Thanks for the feedback Ewen & Ismael.

Fine about keeping replication_factor as INT32 (you mentioned 32k
> partitions in your reply, but I was talking about replication_factor which
> doesn't seem feasible to ever be that large. I didn't mention
> num_partitions for the reason you mentioned).


My apologies, I did misread that. We can use an INT16 for replication_factor.

Ewen's reply sums up my thoughts on the error handling points. It doesn't
> seem ideal to justify the wire protocol behaviour based on the Java
> implementation. If we need map-like semantics in the protocol, then maybe
> we need a `Map` type to complement `Array`? Otherwise, I still think we
> should consider throwing the appropriate errors instead of silently picking
> a behaviour. It would be good to know what others think.
>

Yeah, I am convinced. Better to be more strict in this case. I will update
the wiki and PR to mention that these scenarios will error.

The one thing I want to avoid is too many super specific error codes. I am
not sure how much of a problem it really is but in the case of wire
protocol errors like multiple instances of the same topic, do you have any
thoughts on the error? Should we make a generic InvalidRequest error and
log the detailed message on the broker for client authors to debug?

With regards to the ACLs, I think your proposal makes sense given where we
> are (if starting from scratch, I think I would have a `CREATE` operation on
> the `TOPIC` resource). And we can explore that in more detail when handling
> the Update ACLs request type. For the `Create Topics` request, it is
> following the same approach as auto-created topics aside from the error
> code (ClusterAuthorizationException instead
> of TopicAuthorizationException), which looks reasonable.
>

When looking at changing the patch, it looks like changing from CREATE
to CREATE_TOPIC might pose some compatibility concerns. Is it alright if we
leave it as CREATE for now and revisit after KIP-4? It should not collide with
the ACLs permission since we have control over that because it's new.

Finally, on the timeout point, do we use negative timeouts to mean 0
> elsewhere in the protocol? In the code, negative timeouts are typically
> disallowed or they mean an infinite timeout (we have moved from the latter
> to the former in some of the Java networking code in recent releases).
>

The produce request timeout is very similar to this timeout. There is no
bounds validation on -1; anything less than 0 is essentially treated as 0. We
could also validate the timeout and return an InvalidRequest (or whatever
error is discussed above) in this case if you prefer.
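
To make the two options concrete, here is a trivial sketch (invented names)
of clamping a negative timeout versus rejecting it:

public class TimeoutHandling {
    // Produce-style behaviour today: anything below 0 simply acts like 0.
    static int clamp(int timeoutMs) {
        return Math.max(0, timeoutMs);
    }

    // Stricter alternative: reject the value, surfacing an InvalidRequest-style error.
    static int validate(int timeoutMs) {
        if (timeoutMs < 0) {
            throw new IllegalArgumentException("timeout must be >= 0, got " + timeoutMs);
        }
        return timeoutMs;
    }
}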

Thanks,
Grant

On Wed, Jun 15, 2016 at 6:42 AM, Ismael Juma <ism...@juma.me.uk> wrote:

> Thanks for the responses Grant.
>
> Fine about keeping replication_factor as INT32 (you mentioned 32k
> partitions in your reply, but I was talking about replication_factor which
> doesn't seem feasible to ever be that large. I didn't mention
> num_partitions for the reason you mentioned).
>
> Ewen's reply sums up my thoughts on the error handling points. It doesn't
> seem ideal to justify the wire protocol behaviour based on the Java
> implementation. If we need map-like semantics in the protocol, then maybe
> we need a `Map` type to complement `Array`? Otherwise, I still think we
> should consider throwing the appropriate errors instead of silently picking
> a behaviour. It would be good to know what others think.
>
> With regards to the ACLs, I think your proposal makes sense given where we
> are (if starting from scratch, I think I would have a `CREATE` operation on
> the `TOPIC` resource). And we can explore that in more detail when handling
> the Update ACLs request type. For the `Create Topics` request, it is
> following the same approach as auto-created topics aside from the error
> code (ClusterAuthorizationException instead
> of TopicAuthorizationException), which looks reasonable.
>
> Finally, on the timeout point, do we use negative timeouts to mean 0
> elsewhere in the protocol? In the code, negative timeouts are typically
> disallowed or they mean an infinite timeout (we have moved from the latter
> to the former in some of the Java networking code in recent releases).


> Ismael
>
> On Tue, Jun 14, 2016 at 11:51 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > Thanks for the review Ismael.
> >
> > `partition_count` or  `num_partitions` seems clearer to me.
> >
> >
> > Agreed, I updated the wiki and patch to use num_partitions.
> >
> > I wondered if this should be `INT16`. Maybe not worth it as it won't make
> > > much of a difference in terms of the request size though.
> > >
> >
> > Since Integer is used throughout for these values I think we should kee

[jira] [Updated] (KAFKA-3691) Confusing logging during metadata update timeout

2016-06-15 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3691:
---
Fix Version/s: 0.10.0.1

> Confusing logging during metadata update timeout
> 
>
> Key: KAFKA-3691
> URL: https://issues.apache.org/jira/browse/KAFKA-3691
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When the KafkaProducer calls waitOnMetadata it will loop decrementing the 
> remainingWaitMs until it either receives the request metadata or runs out of 
> time. Inside the loop Metadata.awaitUpdate is called with the value in 
> remainingWaitMs. Inside Metadata.awaitUpdate a timeout exception could be 
> thrown using the remainingWaitMs which results in messages like:
> {noformat}
> org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
> after 3 ms.
> {noformat}
> Perhaps we should catch the exception and log the real maxWaitMs or change 
> the language to make the exception more clear. 
> Note: I still need to investigate further to be sure exactly when this 
> happens, but wanted to log the jira to make sure this is not forgotten. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3691) Confusing logging during metadata update timeout

2016-06-15 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3691:
---
Fix Version/s: 0.10.1.0

> Confusing logging during metadata update timeout
> 
>
> Key: KAFKA-3691
> URL: https://issues.apache.org/jira/browse/KAFKA-3691
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When the KafkaProducer calls waitOnMetadata it will loop decrementing the 
> remainingWaitMs until it either receives the request metadata or runs out of 
> time. Inside the loop Metadata.awaitUpdate is called with the value in 
> remainingWaitMs. Inside Metadata.awaitUpdate a timeout exception could be 
> thrown using the remainingWaitMs which results in messages like:
> {noformat}
> org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
> after 3 ms.
> {noformat}
> Perhaps we should catch the exception and log the real maxWaitMs or change 
> the language to make the exception more clear. 
> Note: I still need to investigate further to be sure exactly when this 
> happens, but wanted to log the jira to make sure this is not forgotten. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3691) Confusing logging during metadata update timeout

2016-06-15 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3691:
---
Affects Version/s: 0.10.0.0

> Confusing logging during metadata update timeout
> 
>
> Key: KAFKA-3691
> URL: https://issues.apache.org/jira/browse/KAFKA-3691
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When the KafkaProducer calls waitOnMetadata it will loop decrementing the 
> remainingWaitMs until it either receives the request metadata or runs out of 
> time. Inside the loop Metadata.awaitUpdate is called with the value in 
> remainingWaitMs. Inside Metadata.awaitUpdate a timeout exception could be 
> thrown using the remainingWaitMs which results in messages like:
> {noformat}
> org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
> after 3 ms.
> {noformat}
> Perhaps we should catch the exception and log the real maxWaitMs or change 
> the language to make the exception more clear. 
> Note: I still need to investigate further to be sure exactly when this 
> happens, but wanted to log the jira to make sure this is not forgotten. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3691) Confusing logging during metadata update timeout

2016-06-15 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3691:
---
Status: Patch Available  (was: Open)

> Confusing logging during metadata update timeout
> 
>
> Key: KAFKA-3691
> URL: https://issues.apache.org/jira/browse/KAFKA-3691
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>    Assignee: Grant Henke
> Fix For: 0.10.1.0, 0.10.0.1
>
>
> When the KafkaProducer calls waitOnMetadata it will loop decrementing the 
> remainingWaitMs until it either receives the request metadata or runs out of 
> time. Inside the loop Metadata.awaitUpdate is called with the value in 
> remainingWaitMs. Inside Metadata.awaitUpdate a timeout exception could be 
> thrown using the remainingWaitMs which results in messages like:
> {noformat}
> org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
> after 3 ms.
> {noformat}
> Perhaps we should catch the exception and log the real maxWaitMs or change 
> the language to make the exception more clear. 
> Note: I still need to investigate further to be sure exactly when this 
> happens, but wanted to log the jira to make sure this is not forgotten. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-14 Thread Grant Henke
ki)

> >- Why is CreateTopicRequest a batch request?
> >
>
> Should it be `CreateTopicsRequest` then?
>

Sure, I will update that in the patch and wiki.

P.S. I fixed a couple of typos I spotted on the wiki page, I hope that's OK.
>

Absolutely. Feel free to improve the wiki anytime.

Thanks,
Grant

On Tue, Jun 14, 2016 at 3:09 PM, Ismael Juma <ism...@juma.me.uk> wrote:

> Hi Grant,
>
> Thanks for the proposal. A few comments and questions below.
>
> On Fri, Jun 10, 2016 at 6:21 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > > CreateTopic Request (Version: 0) => [create_topic_requests] timeout
> > >   create_topic_requests => topic partitions replication_factor
> > [replica_assignment] [configs]
> > > topic => STRING
> > > partitions => INT32
> >
>
> `partition_count` or  `num_partitions` seems clearer to me.
>
> > replication_factor => INT32
> >
>
> I wondered if this should be `INT16`. Maybe not worth it as it won't make
> much of a difference in terms of the request size though.
>
> >2. Multiple instructions for the same topic in one request will be
> > >silently ignored, only the last from the list will be executed.
> > >   - This is because the list of topics is modeled server side as a
> > >   map with TopicName as the key
> >
>
> Silently ignoring what is likely a user error makes me uncomfortable
> generally. Is this really the best option?
>
>
> > >3. The principal must be authorized to the "Create" Operation on the
> > >"Cluster" resource to create topics.
> > >   - Unauthorized requests will receive a
> > ClusterAuthorizationException
> >
>
> Now that we are starting to use the `Create` operation, are we sure that
> the right model doesn't involve specifying the resource type? It seems to
> me that a `Create Topics` permission would make more sense as that would
> allow someone to be given `Create Topics` permission, but not `Create ACLs`
> for example. Was this discussed and discarded already?
>
>
> > >4.
> > >
> > >Only one from ReplicaAssignment or (Partitions + ReplicationFactor),
> > can
> > >be defined in one instruction. If both parameters are specified -
> > >ReplicaAssignment takes precedence.
> >
>
> This is similar to `2`, do we want to silently ignore data from the user or
> fail-fast?
>
>
> > >- In the case ReplicaAssignment is defined number of partitions and
> > >   replicas will be calculated from the supplied ReplicaAssignment.
> > >   - In the case of defined (Partitions + ReplicationFactor) replica
> > >   assignment will be automatically generated by the server.
> > >   - One or the other must be defined. The existing broker side auto
> > >   create defaults will not be used
> > >   (default.replication.factor, num.partitions). The client
> > implementation can
> > >   have defaults for these options when generating the messages.
> > >5. Setting a timeout > 0 will allow the request to block until the
> > >topic metadata is "complete" on the controller node.
> >
>
> What happens if timeout < 0?
>
>
> > >- Why is CreateTopicRequest a batch request?
> >
>
> Should it be `CreateTopicsRequest` then?
>
> Thanks,
> Ismael
>
> P.S. I fixed a couple of typos I spotted on the wiki page, I hope that's
> OK.
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-14 Thread Grant Henke
Hi Guozhang,

Thanks for the review.

I listed the possible CreateTopic Response error codes in my patch. Below
are the relevant error codes and how I think the admin client or producer
would handle them. This can be vetted more thoroughly in the code reviews:

   - INVALID_TOPIC_EXCEPTION(17)
      - Producer: Pass to the user to handle
      - Admin: Pass to the user to handle
   - CLUSTER_AUTHORIZATION_FAILED(31)
      - Producer: Pass to the user to handle
      - Admin: Pass to the user to handle
   - TOPIC_ALREADY_EXISTS(32)
      - Producer: Silently ignore (since this is auto create)
      - Admin: Pass to the user to handle
   - INVALID_PARTITIONS(38)
      - Producer: Pass to the user to handle (shouldn't really occur)
      - Admin: Pass to the user to handle
   - INVALID_REPLICATION_FACTOR(39)
      - Producer: Pass to the user to handle
         - This could be avoided or made retryable by adjusting down from the
           default to the number of live brokers. We should discuss this during
           the client auto create KIP.
      - Admin: Pass to the user to handle
   - INVALID_REPLICA_ASSIGNMENT(40)
      - Producer: Pass to the user to handle (shouldn't really occur)
      - Admin: Pass to the user to handle
   - INVALID_CONFIG(41)
      - Producer: Pass to the user to handle
      - Admin: Pass to the user to handle
   - NOT_CONTROLLER(42)
      - Producer: Refresh metadata and retry until timeout
      - Admin: Refresh metadata and retry until timeout

Reviewing those did make me think we may want something like an
"auto.create.topics.config" for producers. Just noting it here for the
client auto create KIP.
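
For illustration, here is a compact, self-contained sketch of how a client's
create-topic path might branch on those codes once a response is decoded
(the class is invented and only a few of the codes above are shown):

public class CreateTopicErrorHandling {
    // Error codes taken from the list above; 0 is treated as "no error" here.
    static final short TOPIC_ALREADY_EXISTS = 32;
    static final short NOT_CONTROLLER = 42;

    // Returns true if the caller should refresh metadata and retry the create.
    static boolean handle(String topic, short errorCode) {
        if (errorCode == 0 || errorCode == TOPIC_ALREADY_EXISTS) {
            return false; // created, or it already existed (fine for auto create)
        }
        if (errorCode == NOT_CONTROLLER) {
            return true;  // stale controller: refresh metadata and retry until the timeout
        }
        // INVALID_PARTITIONS, INVALID_REPLICATION_FACTOR, INVALID_CONFIG, auth failures, ...
        throw new IllegalStateException("Create of " + topic + " failed with error code " + errorCode);
    }
}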

Thanks,
Grant



On Tue, Jun 14, 2016 at 1:37 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> Thanks Grant, the design proposal LGTM overall.
>
> One minor question about the error codes in CreateTopic Response, what are
> the possible values? I know this may be out of the scope of this KIP, but
> would also want to think about how producers should handle each one of them
> accordingly, especially if the create topic request is for a batch of
> topics, and different error codes are returned.
>
> Guozhang
>
>
>
> On Mon, Jun 13, 2016 at 6:54 PM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > Thanks for the review Jun.
> >
> > You probably want to make it clearer if timeout > 0, what waiting for
> topic
> > > metadata is "complete" means. In the first implementation, it really
> > means
> > > that the topic metadata is propagated to the controller's metadata
> cache.
> >
> >
> > I updated the wiki to be more descriptive. Below is the updated text:
> >
> > Setting a timeout > 0 will allow the request to block until the topic
> > > metadata is "complete" on the controller node.
> > >
> > >- Complete means the local topic metadata cache has been completely
> > >populated and all partitions have leaders
> > >   - The topic metadata is updated when the controller sends out
> > >   update metadata requests to the brokers
> > >- If a timeout error occurs, the topic could still be created
> > >successfully at a later time. It's up to the client to query for the
> > state
> > >at that point.
> > >
> > >
> > Thanks,
> > Grant
> >
> >
> > On Sun, Jun 12, 2016 at 4:14 PM, Jun Rao <j...@confluent.io> wrote:
> >
> > > Grant,
> > >
> > > Thanks for the proposal. It looks good to me.
> > >
> > > You probably want to make it clearer if timeout > 0, what waiting for
> > topic
> > > metadata is "complete" means. In the first implementation, it really
> > means
> > > that the topic metadata is propagated to the controller's metadata
> cache.
> > >
> > > Jun
> > >
> > > On Fri, Jun 10, 2016 at 9:21 AM, Grant Henke <ghe...@cloudera.com>
> > wrote:
> > >
> > > > Now that Kafka 0.10 has been released I would like to start work on
> the
> > > new
> > > > protocol messages and client implementation for KIP-4. In order to
> > break
> > > up
> > > > the discussion and feedback I would like to continue breaking up the
> > > > content in to smaller pieces.
> > > >
> > > > This discussion thread is for the CreateTopic request/response and
> > server
> > > > side implementation. Details for this implementation can be read
> here:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadmin

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-13 Thread Grant Henke
Thanks for the review Jun.

You probably want to make it clearer if timeout > 0, what waiting for topic
> metadata is "complete" means. In the first implementation, it really means
> that the topic metadata is propagated to the controller's metadata cache.


I updated the wiki to be more descriptive. Below is the updated text:

Setting a timeout > 0 will allow the request to block until the topic
> metadata is "complete" on the controller node.
>
>- Complete means the local topic metadata cache has been completely
>populated and all partitions have leaders
>   - The topic metadata is updated when the controller sends out
>   update metadata requests to the brokers
>- If a timeout error occurs, the topic could still be created
>successfully at a later time. It's up to the client to query for the state
>at that point.
>
>
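
As a rough sketch of the client-side polling that the quoted text leaves to
the user, something like the loop below could follow a timeout error. The
topicIsComplete predicate is a stand-in for whatever metadata query the
client uses (topic exists and all partitions have leaders); it is not an
existing API:

import java.util.function.Predicate;

public class TopicCreatePoller {
    // Poll until the topic looks "complete" or we give up.
    static boolean waitForTopic(String topic, Predicate<String> topicIsComplete,
                                long timeoutMs, long pollIntervalMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (topicIsComplete.test(topic)) {
                return true;
            }
            Thread.sleep(pollIntervalMs);
        }
        return false; // not visible yet; the create may still finish later
    }
}
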
Thanks,
Grant


On Sun, Jun 12, 2016 at 4:14 PM, Jun Rao <j...@confluent.io> wrote:

> Grant,
>
> Thanks for the proposal. It looks good to me.
>
> You probably want to make it clearer if timeout > 0, what waiting for topic
> metadata is "complete" means. In the first implementation, it really means
> that the topic metadata is propagated to the controller's metadata cache.
>
> Jun
>
> On Fri, Jun 10, 2016 at 9:21 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > Now that Kafka 0.10 has been released I would like to start work on the
> new
> > protocol messages and client implementation for KIP-4. In order to break
> up
> > the discussion and feedback I would like to continue breaking up the
> > content in to smaller pieces.
> >
> > This discussion thread is for the CreateTopic request/response and server
> > side implementation. Details for this implementation can be read here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >
> > I have included the exact content below for clarity:
> >
> > > Create Topic Request (KAFKA-2945
> > > <https://issues.apache.org/jira/browse/KAFKA-2945>)
> > >
> > >
> > > CreateTopic Request (Version: 0) => [create_topic_requests] timeout
> > >   create_topic_requests => topic partitions replication_factor
> > [replica_assignment] [configs]
> > > topic => STRING
> > > partitions => INT32
> > > replication_factor => INT32
> > > replica_assignment => partition_id [replicas]
> > >   partition_id => INT32
> > >   replicas => INT32
> > > configs => config_key config_value
> > >   config_key => STRING
> > >   config_value => STRING
> > >   timeout => INT32
> > >
> > > CreateTopicRequest is a batch request to initiate topic creation with
> > > either predefined or automatic replica assignment and optionally topic
> > > configuration.
> > >
> > > Request semantics:
> > >
> > >1. Must be sent to the controller broker
> > >2. Multiple instructions for the same topic in one request will be
> > >silently ignored, only the last from the list will be executed.
> > >   - This is because the list of topics is modeled server side as a
> > >   map with TopicName as the key
> > >3. The principal must be authorized to the "Create" Operation on the
> > >"Cluster" resource to create topics.
> > >   - Unauthorized requests will receive a
> > ClusterAuthorizationException
> > >4.
> > >
> > >Only one from ReplicaAssignment or (Partitions + ReplicationFactor),
> > can
> > >be defined in one instruction. If both parameters are specified -
> > >ReplicaAssignment takes precedence.
> > >- In the case ReplicaAssignment is defined number of partitions and
> > >   replicas will be calculated from the supplied ReplicaAssignment.
> > >   - In the case of defined (Partitions + ReplicationFactor) replica
> > >   assignment will be automatically generated by the server.
> > >   - One or the other must be defined. The existing broker side auto
> > >   create defaults will not be used
> > >   (default.replication.factor, num.partitions). The client
> > implementation can
> > >   have defaults for these options when generating the messages.
> > >5. Setting a timeout > 0 will allow the request to block until the
> &

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-13 Thread Grant Henke
Thanks for the review Gwen.

1. The replica assignment protocol takes [replicas], there is the
> implicit assumption that the first replica is the leader. This matches
> current behavior elsewhere, but lets document it explicitly.


I added this to the wiki and will update the protocol doc string in the
patch.
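
As a small worked example of that semantic, using the request's
replica_assignment field (values shown in a JSON-like layout for
readability): the entries below assign partition 0 to brokers 3, 1 and 2,
and because broker 3 is listed first it is the preferred leader for that
partition.

replica_assignment => [{partition_id: 0, replicas: [3, 1, 2]},
                       {partition_id: 1, replicas: [1, 2, 3]}]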

2. I like the timeout, but want to clarify why, since it may not be
> obvious to everyone:


I tried to describe why a timeout, even if not global, is useful in
the "Cluster
Consistent Blocking
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-clusterconsistentblocking>"
section. I have a Q&A that links to that section in the Create Topic section
(I fixed the broken link). Below is the relevant text from that section:

The intermediate changes in KIP-4 should allow an easy transition to
> "complete blocking" when the work can be done. This is supported by
> providing optional local blocking in the mean time. This local blocking
> only blocks until the local state on the controller is correct. We will
> still provide a polling mechanism for users that do not want to block at
> all. A polling mechanism is required in the optimal implementation too
> because users still need a way to check state after a timeout occurs
> because operations like "create topic" are not transactional. Local
> blocking has the added benefit of avoiding wasted poll requests to other
> brokers when it's impossible for the request to be completed. If the
> controller's state is not correct, then the other brokers can't be either.
> Clients who don't want to validate that the entire cluster state is correct can
> block on the controller and avoid polling altogether, with reasonable
> confidence that though they may get a retriable error on follow-up
> requests, the requested change was successful and the cluster will be
> accurate eventually.

Because we already add a timeout field to the request wire protocols,
> changing the behavior to block until the cluster is consistent in the
> future would not require a protocol change, though the version could be
> bumped to indicate a behavior change.


Thanks,
Grant


On Fri, Jun 10, 2016 at 12:34 PM, Gwen Shapira <g...@confluent.io> wrote:

> Thank you for the clear proposal, Grant!
>
> I like the request/response objects and the timeout semantics. Two
> comments:
>
> 1. The replica assignment protocol takes [replicas], there is the
> implicit assumption that the first replica is the leader. This matches
> current behavior elsewhere, but lets document it explicitly.
>
> 2. I like the timeout, but want to clarify why, since it may not be
> obvious to everyone:
> Currently, the response is sent when the controller has sent the
> "update metadata" request to the brokers involved with the new topic.
> It is a rather weak guarantee, but if clients decide to poll the
> brokers for updates, it does reduce the time spent polling.
> More importantly, this behavior is a net improvement on the current state
> (completely async and ZK dependent) and when we do have a way to get
> "ack" from replicas, we will be able to add the new behavior without
> changing the protocol (just the semantics of waiting).
>
> Gwen
>
> On Fri, Jun 10, 2016 at 7:21 PM, Grant Henke <ghe...@cloudera.com> wrote:
> > Now that Kafka 0.10 has been released I would like to start work on the
> new
> > protocol messages and client implementation for KIP-4. In order to break
> up
> > the discussion and feedback I would like to continue breaking up the
> > content in to smaller pieces.
> >
> > This discussion thread is for the CreateTopic request/response and server
> > side implementation. Details for this implementation can be read here:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-CreateTopicRequest
> >
> > I have included the exact content below for clarity:
> >
> >> Create Topic Request (KAFKA-2945
> >> <https://issues.apache.org/jira/browse/KAFKA-2945>)
> >>
> >>
> >> CreateTopic Request (Version: 0) => [create_topic_requests] timeout
> >>   create_topic_requests => topic partitions replication_factor
> [replica_assignment] [configs]
> >> topic => STRING
> >> partitions => INT32
> >> replication_factor => INT32
> >> replica_assignment => partition_id [replicas]
> >>   partition_id => INT32
> >>   replicas => INT32
> >> configs => config_key 

Re: [DISCUSS] KIP-4 Create Topic Schema

2016-06-13 Thread Grant Henke
Hi Jay,

Good point: one of the main benefits of the create topic api is removing the
server side auto create. The work is noted in the Follow Up Changes
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-FollowUpChangesfollow-up-changes>
section
of the KIP-4 wiki and tracked by KAFKA-2410
<https://issues.apache.org/jira/browse/KAFKA-2410>.

You are pretty much spot on with the plan. But there are some things that
would need to be discussed during that change that likely deserve their own
KIP. I will lay out some of my thoughts below. However, do you mind if we
defer the rest of the discussion until after the create topics patch is
done? (I am happy to drive that KIP as soon as this patch is in)

High level plan:

   1. We set auto.create.topics.enable=false by default on the server and
      deprecate it
   2. We add a few new producer configs
      1. auto.create.topics.enable (should document that privileges are
         required if using ACLs)
      2. auto.create.topics.partitions
      3. auto.create.topics.replicas
   3. The producer tracks the location of the controller which it now gets
      in the metadata response and includes this in its internal cluster
      metadata representation
   4. The producer uses the create topic api to make a request to
      the controller when it gets an error about a non-existent topic
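
To make step 2 concrete, this is roughly what the producer configuration
would look like under the plan. The three auto.create.* settings are only
proposed above; they are not existing producer configs:

import java.util.Properties;

public class ProposedAutoCreateConfigs {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        // Proposed in the plan above; these do not exist in the producer today.
        props.put("auto.create.topics.enable", "true");
        props.put("auto.create.topics.partitions", "8");
        props.put("auto.create.topics.replicas", "3");
        return props;
    }
}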

I mocked up a quick implementation based off of my create topics PR to vet
the basics. But there are still some open questions and things to test.

The strawman implementation does the following:

   - Updates the metadata with the controller information
   - Sets "topicsToBeCreated" in the metadata in NetworkClient.handleResponse
      - Topics are added when receiving an UNKNOWN_TOPIC_OR_PARTITION error
   - Sends CreateTopicRequests to the controller in NetworkClient.maybeUpdate
      - This effectively makes create topic part of the metadata updates
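
In code form, the strawman boils down to something like the sketch below.
Every name here is invented for illustration; it is not the actual
NetworkClient code:

import java.util.*;

public class AutoCreateStrawman {
    private final Set<String> topicsToBeCreated = new HashSet<>();

    // Called when a response reports UNKNOWN_TOPIC_OR_PARTITION for a topic.
    void onUnknownTopic(String topic) {
        topicsToBeCreated.add(topic);
    }

    // Called from the metadata-update path: if the controller is known, ask it to
    // create the pending topics with the client-side defaults.
    void maybeCreatePendingTopics(Optional<Integer> controllerId,
                                  int defaultPartitions, short defaultReplicas) {
        if (!controllerId.isPresent() || topicsToBeCreated.isEmpty()) {
            return;
        }
        for (String topic : topicsToBeCreated) {
            sendCreateTopicsRequest(controllerId.get(), topic, defaultPartitions, defaultReplicas);
        }
        topicsToBeCreated.clear();
    }

    private void sendCreateTopicsRequest(int brokerId, String topic, int partitions, short replicas) {
        // Stand-in for building and queueing a CreateTopics request to the controller broker.
        System.out.printf("CreateTopics(%s, partitions=%d, replicas=%d) -> broker %d%n",
                topic, partitions, replicas, brokerId);
    }
}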

Some of the things that need to be thought through include:

   - Should auto.create.topics.replicas scale down to the number of known
     live servers at create time?
   - Should the consumer be able to auto create topics too?
   - What happens when both client and broker side auto create are enabled?
      - I think the broker wins in this case since the metadata request happens
        first
   - What happens when the user is unauthorized to create topics?
      - Either throw an exception or wait for the metadata update timeout

Thanks,
Grant

On Sun, Jun 12, 2016 at 2:08 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Grant,
>
> Great to see this progressing. That API looks good to me. Thanks for the
> thoughtful write-up.
>
> One thing that would be great to add to this KIP would be a quick sketch of
> how the create topic api can be used to get rid of the thing where we
> create topics when you ask for their metadata. This doesn't need to be in
> any great depth, just enough to make sure this new api will work for that
> use case.
>
> I think the plan is something like
>
>1. We set auto.create.topics.enable=false by default on the server and
>deprecate it
>2. We add a new producer config auto.create.topics.enable
>3. The producer tracks the location of the controller which it now gets
>in the metadata response and includes this in its internal cluster
> metadata
>representation
>4. The producer uses the create topic api to make a request to the
>controller when it gets an error about a non-existent topic
>
> I think the semantics of this at first will be the same as they are now--if
> retries are disabled the first produce request will potentially fail if the
> topic creation hasn't quite completed. This isn't great but it isn't worse
> than the current state and I think would be fixed either by future
> improvements to make the requests fully blocking or by idempotence for the
> producer (which would mean retries were always enabled).
>
> One thing I'm not sure of is whether the admin java api, which would
> maintain its own connection pool etc, would be used internally by the
> producer (and potentially consumer) or if they would just reuse the request
> objects.
>
> Just trying to write this down to sanity check that it will work.
>
> -Jay
>
> On Fri, Jun 10, 2016 at 9:21 AM, Grant Henke <ghe...@cloudera.com> wrote:
>
> > Now that Kafka 0.10 has been released I would like to start work on the
> new
> > protocol messages and client implementation for KIP-4. In order to break
> up
> > the discussion and feedback I would like to continue breaking up the
> > content in to smaller pieces.
> >
> > This discussion thread is for the CreateTopic request/response and server
>

[DISCUSS] KIP-4 Create Topic Schema

2016-06-10 Thread Grant Henke
n topic and topic creation
> result error code (see New Protocol Errors
> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-NewProtocolErrors>
> ).
>

A sample PR is on github (https://github.com/apache/kafka/pull/1489) though
it could change drastically based on the feedback here.

Thanks,
Grant

-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-3789) Upgrade Snappy to fix snappy decompression errors

2016-06-03 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3789:
---
Status: Patch Available  (was: Open)

> Upgrade Snappy to fix snappy decompression errors
> -
>
> Key: KAFKA-3789
> URL: https://issues.apache.org/jira/browse/KAFKA-3789
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0
>    Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Critical
> Fix For: 0.10.0.1
>
>
> snappy-java recently fixed a bug where parsing the MAGIC HEADER was being 
> handled incorrectly: https://github.com/xerial/snappy-java/issues/142
> This issue caused "unknown broker exceptions" in the clients and prevented 
> these messages from being appended to the log when messages were written 
> using snappy c bindings in clients like librdkafka or ruby-kafka and read 
> using snappy-java in the broker.   
> The related librdkafka issue is here: 
> https://github.com/edenhill/librdkafka/issues/645
> I am able to regularly reproduce the issue with librdkafka in 0.10 and after 
> upgrading snappy-java to 1.1.2.6 the issue is resolved. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3789) Upgrade Snappy to fix snappy decompression errors

2016-06-03 Thread Grant Henke (JIRA)
Grant Henke created KAFKA-3789:
--

 Summary: Upgrade Snappy to fix snappy decompression errors
 Key: KAFKA-3789
 URL: https://issues.apache.org/jira/browse/KAFKA-3789
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.0
Reporter: Grant Henke
Assignee: Grant Henke
Priority: Critical
 Fix For: 0.10.0.1


snappy-java recently fixed a bug where parsing the MAGIC HEADER was being 
handled incorrectly: https://github.com/xerial/snappy-java/issues/142

This issue caused "unknown broker exceptions" in the clients and prevented 
these messages from being appended to the log when messages were written using 
snappy c bindings in clients like librdkafka or ruby-kafka and read using 
snappy-java in the broker.   

The related librdkafka issue is here: 
https://github.com/edenhill/librdkafka/issues/645

I am able to regularly reproduce the issue with librdkafka in 0.10 and after 
upgrading snappy-java to 1.1.2.6 the issue is resolved. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3764) Error processing append operation on partition

2016-06-02 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke reassigned KAFKA-3764:
--

Assignee: Grant Henke

> Error processing append operation on partition
> --
>
> Key: KAFKA-3764
> URL: https://issues.apache.org/jira/browse/KAFKA-3764
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Martin Nowak
>    Assignee: Grant Henke
>
> After updating Kafka from 0.9.0.1 to 0.10.0.0 I'm getting plenty of `Error
> processing append operation on partition` errors. This happens with
> ruby-kafka as the producer and snappy compression enabled.
> {noformat}
> [2016-05-27 20:00:11,074] ERROR [Replica Manager on Broker 2]: Error 
> processing append operation on partition m2m-0 (kafka.server.ReplicaManager)
> kafka.common.KafkaException: 
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:159)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:85)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNextOuter(ByteBufferMessageSet.scala:357)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:369)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:324)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
> at 
> kafka.message.ByteBufferMessageSet.validateMessagesAndAssignOffsets(ByteBufferMessageSet.scala:427)
> at kafka.log.Log.liftedTree1$1(Log.scala:339)
> at kafka.log.Log.append(Log.scala:338)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:443)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:429)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:237)
> at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:406)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:392)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:392)
> at 
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:328)
> at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:405)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: failed to read chunk
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:433)
> at 
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:167)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readLong(DataInputStream.java:416)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.readMessageFromStream(ByteBufferMessageSet.scala:118)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:153)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3764) Error processing append operation on partition

2016-06-02 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312916#comment-15312916
 ] 

Grant Henke commented on KAFKA-3764:


It looks like this is likely caused by 
https://github.com/xerial/snappy-java/issues/142 and is fixed in [snappy-java 
1.1.2.6|http://search.maven.org/#artifactdetails%7Corg.xerial.snappy%7Csnappy-java%7C1.1.2.6%7Cbundle].
 This has also been identified as the cause of 
https://github.com/edenhill/librdkafka/issues/645

I can upgrade to snappy-java 1.1.2.6, test with librdkafka and send a PR.
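
For reference, the failing read path in the stack trace boils down to wrapping
the compressed payload in snappy-java's SnappyInputStream. A minimal round-trip
of that step is sketched below ("hello snappy" is placeholder data). Note that
this alone does not reproduce the bug, which needs frames produced by the
Snappy C bindings (librdkafka/ruby-kafka); it only illustrates the broker-side
decompression call that throws "failed to read chunk".

{noformat}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.xerial.snappy.SnappyInputStream;
import org.xerial.snappy.SnappyOutputStream;

public class SnappyRoundTrip {
    public static void main(String[] args) throws IOException {
        byte[] payload = "hello snappy".getBytes("UTF-8");

        // Compress with snappy-java's framed stream format.
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (SnappyOutputStream out = new SnappyOutputStream(compressed)) {
            out.write(payload);
        }

        // Decompress; this mirrors the read path in the broker stack trace.
        try (SnappyInputStream in = new SnappyInputStream(
                new ByteArrayInputStream(compressed.toByteArray()))) {
            byte[] buf = new byte[payload.length];
            int read = in.read(buf); // may read fewer bytes for larger inputs
            System.out.println("read " + read + " bytes: "
                    + new String(buf, 0, read, "UTF-8"));
        }
    }
}
{noformat}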

> Error processing append operation on partition
> --
>
> Key: KAFKA-3764
> URL: https://issues.apache.org/jira/browse/KAFKA-3764
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0
>Reporter: Martin Nowak
>
> After updating Kafka from 0.9.0.1 to 0.10.0.0 I'm getting plenty of `Error
> processing append operation on partition` errors. This happens with
> ruby-kafka as the producer and snappy compression enabled.
> {noformat}
> [2016-05-27 20:00:11,074] ERROR [Replica Manager on Broker 2]: Error 
> processing append operation on partition m2m-0 (kafka.server.ReplicaManager)
> kafka.common.KafkaException: 
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:159)
> at 
> kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:85)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNextOuter(ByteBufferMessageSet.scala:357)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:369)
> at 
> kafka.message.ByteBufferMessageSet$$anon$2.makeNext(ByteBufferMessageSet.scala:324)
> at 
> kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:64)
> at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:56)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at kafka.utils.IteratorTemplate.foreach(IteratorTemplate.scala:30)
> at 
> kafka.message.ByteBufferMessageSet.validateMessagesAndAssignOffsets(ByteBufferMessageSet.scala:427)
> at kafka.log.Log.liftedTree1$1(Log.scala:339)
> at kafka.log.Log.append(Log.scala:338)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:443)
> at kafka.cluster.Partition$$anonfun$11.apply(Partition.scala:429)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:237)
> at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:406)
> at 
> kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:392)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> at 
> scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.AbstractTraversable.map(Traversable.scala:104)
> at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:392)
> at 
> kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:328)
> at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:405)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: failed to read chunk
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:433)
> at 
> org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:167)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readLong(DataInputStream.java:416)
>  

Re: [DISCUSS] KIP-62: Allow consumer to send heartbeats from a background thread

2016-05-26 Thread Grant Henke
Hi Jason,

Thanks for writing up a proposal (and a thorough one)! This is something
that I had been thinking about this week too as I have run into it more
than a handful of times now.

I like the idea of having a larger processing timeout; that timeout, in
unison with max.poll.records, should in many cases provide reasonable
assurance that the consumer will stay alive.

In the rejected alternatives, "Add a separate API the user can call to
indicate liveness" is listed. I think a heartbeat api could be added along
with these new timeout configurations and used for "advanced" use cases
where the processing time is highly variable and less predictable. One place
where we might use the heartbeat api within Kafka itself is MirrorMaker.

Today, I have seen people trying to find ways to leverage the existing api
to "force" heartbeats by:

1. Call poll to get a batch of records to process
2. Call pause on all partitions
3. Process the record batch
3a. While processing, periodically call poll (which is essentially just a
heartbeat, since the consumer is paused and returns no records)
4. Commit offsets and un-pause
5. Repeat from 1
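
A minimal sketch of that workaround with the Java consumer (the topic name,
bootstrap servers, group id, and process() below are placeholders, and it
assumes a client version whose pause/resume accept a collection of
partitions):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausePollWorkaround {

    // Placeholder for slow, per-record processing.
    static void process(ConsumerRecord<String, String> record) { }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "slow-processing-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // 1. Get a batch of records to process.
                ConsumerRecords<String, String> records = consumer.poll(1000);
                // 2. Pause all assigned partitions so further polls return nothing.
                consumer.pause(consumer.assignment());
                for (ConsumerRecord<String, String> record : records) {
                    // 3. Process the record...
                    process(record);
                    // 3a. ...periodically polling while paused, which drives
                    // heartbeats but returns no records.
                    consumer.poll(0);
                }
                // 4. Commit offsets and un-pause before the next real poll.
                consumer.commitSync();
                consumer.resume(consumer.assignment());
            }
        }
    }
}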

Thanks,
Grant

On Wed, May 25, 2016 at 6:32 PM, Jason Gustafson <ja...@confluent.io> wrote:

> Hi All,
>
> One of the persistent problems we see with the new consumer is the use of
> the session timeout in order to ensure progress. Whenever there is a delay
> in message processing which exceeds the session timeout, no heartbeats can
> be sent and the consumer is removed from the group. We seem to hit this
> problem everywhere the consumer is used (including Kafka Connect and Kafka
> Streams) and we don't always have a great solution. I've written a KIP to
> address this problem here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-62%3A+Allow+consumer+to+send+heartbeats+from+a+background+thread
> .
> Have a look and let me know what you think.
>
> Thanks,
> Jason
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [VOTE] KIP-58 - Make Log Compaction Point Configurable

2016-05-25 Thread Grant Henke
+1 (non-binding)

On Wed, May 25, 2016 at 8:20 AM, Ben Stopford <b...@confluent.io> wrote:

> +1 (non-binding)
>
> > On 25 May 2016, at 14:07, Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > +1 (binding)
> >
> > I also think `log.cleaner.compaction.delay.ms` is clearer. As an aside,
> I
> > did notice that the topic level config for `log.segment.delete.delay.ms`
> > (mentioned by Ewen) is `file.delete.delay.ms`, which seems a bit
> > inconsistent.
> >
> > Ismael
> >
> > On Wed, May 25, 2016 at 4:43 AM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> >> +1 (binding)
> >>
> >> Agreed that the log.cleaner.compaction.delay.ms is probably a better
> name,
> >> and consistent with log.segment.delete.delay.ms. Checked configs for
> other
> >> suffixes that seemed reasonable and despite only appearing in that one
> >> broker config, it seems the best match.
> >>
> >> -Ewen
> >>
> >> On Tue, May 24, 2016 at 8:16 PM, Jay Kreps <j...@confluent.io> wrote:
> >>
> >>> I'm +1 on the concept.
> >>>
> >>> As with others I think the core challenge is to express this in an
> >>> intuitive way, and carry the same terminology across the docs, the
> >> configs,
> >>> and docstrings for the configs. Pictures would help.
> >>>
> >>> -Jay
> >>>
> >>> On Tue, May 24, 2016 at 6:54 PM, James Cheng <wushuja...@gmail.com>
> >> wrote:
> >>>
> >>>> I'm not sure what are the rules for who is allowed to vote, but I'm:
> >>>>
> >>>> +1 (non-binding) on the proposal
> >>>>
> >>>> I agree that the "log.cleaner.min.compaction.lag.ms" name is a little
> >>>> confusing.
> >>>>
> >>>> I like Becket's "log.cleaner.compaction.delay.ms", or something
> >> similar.
> >>>>
> >>>> The KIP describes it as the portion of the topic "that will remain
> >>>> uncompacted", so if you're open to alternate names:
> >>>>
> >>>> "log.cleaner.uncompacted.range.ms"
> >>>> "log.cleaner.uncompacted.head.ms" (Except that I always get "log
> tail"
> >>>> and "log head" mixed up...)
> >>>> "log.cleaner.uncompacted.retention.ms" (Will it be confusing to have
> >> the
> >>>> word "retention" in non-time-based topics?)
> >>>>
> >>>> I just thought of something: what happens to the value of "
> >>>> log.cleaner.delete.retention.ms"? Does it still have the same meaning
> >> as
> >>>> before? Does the timer start when log compaction happens (as it
> >> currently
> >>>> does), so in reality, tombstones will only be removed from the log
> some
> >>>> time after (log.cleaner.min.compaction.lag.ms +
> >>>> log.cleaner.delete.retention.ms)?
> >>>>
> >>>> -James
> >>>>
> >>>>> On May 24, 2016, at 5:46 PM, Becket Qin <becket@gmail.com>
> >> wrote:
> >>>>>
> >>>>> +1 (non-binding) on the proposal. Just a minor suggestion.
> >>>>>
> >>>>> I am wondering should we change the config name to "
> >>>>> log.cleaner.compaction.delay.ms"? The first glance at the
> >>> configuration
> >>>>> name is a little confusing. I was thinking do we have a "max" lag?
> >> And
> >>> is
> >>>>> this "lag" a bad thing?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Jiangjie (Becket) Qin
> >>>>>
> >>>>>
> >>>>> On Tue, May 24, 2016 at 4:21 PM, Gwen Shapira <g...@confluent.io>
> >>> wrote:
> >>>>>
> >>>>>> +1 (binding)
> >>>>>>
> >>>>>> Thanks for responding to all my original concerns in the discussion
> >>>> thread.
> >>>>>>
> >>>>>> On Tue, May 24, 2016 at 1:37 PM, Eric Wasserman <
> >>>> eric.wasser...@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I would like to begin voting on KIP-58 - Make Log Compaction Point
> >>>>>>> Configurable
> >>>>>>>
> >>>>>>> KIP-58 is here:  <
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-58+-+Make+Log+Compaction+Point+Configurable
> >>>>>>>>
> >>>>>>>
> >>>>>>> The Jira ticket KAFKA-1981 Make log compaction point configurable
> >>>>>>> is here: <https://issues.apache.org/jira/browse/KAFKA-1981>
> >>>>>>>
> >>>>>>> The original pull request is here: <
> >>>>>>> https://github.com/apache/kafka/pull/1168>
> >>>>>>> (this includes configurations for size and message count lags that
> >>> will
> >>>>>> be
> >>>>>>> removed per discussion of KIP-58).
> >>>>>>>
> >>>>>>> The vote will run for 72 hours.
> >>>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks,
> >> Ewen
> >>
>
>


-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


Re: [ANNOUCE] Apache Kafka 0.10.0.0 Released

2016-05-24 Thread Grant Henke
Awesome! Thanks for managing the release Gwen!

On Tue, May 24, 2016 at 11:24 AM, Gwen Shapira <gwens...@apache.org> wrote:

> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 0.10.0.0.
> This is a major release with exciting new features, including the first
> release of KafkaStreams and many other improvements.
>
> All of the changes in this release can be found:
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/RELEASE_NOTES.html
>
> Apache Kafka is a high-throughput, publish-subscribe messaging system
> rethought as a distributed commit log.
>
> ** Fast => A single Kafka broker can handle hundreds of megabytes of reads
> and writes per second from thousands of clients.
>
> ** Scalable => Kafka is designed to allow a single cluster to serve as the
> central data backbone for a large organization. It can be elastically and
> transparently expanded without downtime. Data streams are partitioned and
> spread over a cluster of machines to allow data streams larger than the
> capability of any single machine and to allow clusters of co-ordinated
> consumers.
>
> ** Durable => Messages are persisted on disk and replicated within the
> cluster to prevent data loss. Each broker can handle terabytes of messages
> without performance impact.
>
> ** Distributed by Design => Kafka has a modern cluster-centric design that
> offers strong durability and fault-tolerance guarantees.
>
> You can download the source release from
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka-0.10.0.0-src.tgz
>
> and binary releases from
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.0.0/kafka_2.11-0.10.0.0.tgz
>
> A big thank you to the following people who have contributed to the
> 0.10.0.0 release.
>
> Adam Kunicki, Aditya Auradkar, Alex Loddengaard, Alex Sherwin, Allen
> Wang, Andrea Cosentino, Anna Povzner, Ashish Singh, Atul Soman, Ben
> Stopford, Bill Bejeck, BINLEI XUE, Chen Shangan, Chen Zhu, Christian
> Posta, Cory Kolbeck, Damian Guy, dan norwood, Dana Powers, David
> Jacot, Denise Fernandez, Dionysis Grigoropoulos, Dmitry Stratiychuk,
> Dong Lin, Dongjoon Hyun, Drausin Wulsin, Duncan Sands, Dustin Cote,
> Eamon Zhang, edoardo, Edward Ribeiro, Eno Thereska, Ewen
> Cheslack-Postava, Flavio Junqueira, Francois Visconte, Frank Scholten,
> Gabriel Zhang, gaob13, Geoff Anderson, glikson, Grant Henke, Greg
> Fodor, Guozhang Wang, Gwen Shapira, Igor Stepanov, Ishita Mandhan,
> Ismael Juma, Jaikiran Pai, Jakub Nowak, James Cheng, Jason Gustafson,
> Jay Kreps, Jeff Klukas, Jeremy Custenborder, Jesse Anderson, jholoman,
> Jiangjie Qin, Jin Xing, jinxing, Jonathan Bond, Jun Rao, Ján Koščo,
> Kaufman Ng, kenji yoshida, Kim Christensen, Kishore Senji, Konrad,
> Liquan Pei, Luciano Afranllie, Magnus Edenhill, Maksim Logvinenko,
> manasvigupta, Manikumar reddy O, Mark Grover, Matt Fluet, Matt
> McClure, Matthias J. Sax, Mayuresh Gharat, Micah Zoltu, Michael Blume,
> Michael G. Noll, Mickael Maison, Onur Karaman, ouyangliduo, Parth
> Brahmbhatt, Paul Cavallaro, Pierre-Yves Ritschard, Piotr Szwed,
> Praveen Devarao, Rafael Winterhalter, Rajini Sivaram, Randall Hauch,
> Richard Whaling, Ryan P, Samuel Julius Hecht, Sasaki Toru, Som Sahu,
> Sriharsha Chintalapani, Stig Rohde Døssing, Tao Xiao, Tom Crayford,
> Tom Dearman, Tom Graves, Tom Lee, Tomasz Nurkiewicz, Vahid Hashemian,
> William Thurston, Xin Wang, Yasuhiro Matsuda, Yifan Ying, Yuto
> Kawamura, zhuchen1018
>
> We welcome your help and feedback. For more information on how to
> report problems, and to get involved, visit the project website at
> http://kafka.apache.org/
>
> Thanks,
>
> Gwen
>



-- 
Grant Henke
Software Engineer | Cloudera
gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke


[jira] [Updated] (KAFKA-3717) Support building aggregate javadoc for all project modules

2016-05-16 Thread Grant Henke (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KAFKA-3717:
---
Summary: Support building aggregate javadoc for all project modules  (was: 
On 0.10.0 branch, building javadoc results in very small subset of expected 
javadocs)

> Support building aggregate javadoc for all project modules
> --
>
> Key: KAFKA-3717
> URL: https://issues.apache.org/jira/browse/KAFKA-3717
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>    Assignee: Grant Henke
>
> If you run "./gradlew javadoc", you will only get JavaDoc for the High Level 
> Consumer. All the new clients are missing.
> See here: http://home.apache.org/~gwenshap/0.10.0.0-rc5/javadoc/
> I suggest fixing in 0.10.0 branch and in trunk, not rolling a new release 
> candidate, but updating our docs site.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3717) On 0.10.0 branch, building javadoc results in very small subset of expected javadocs

2016-05-16 Thread Grant Henke (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15285897#comment-15285897
 ] 

Grant Henke commented on KAFKA-3717:


Currently the build places a javadoc directory and jar in each module's build
directory. This means you need to manually grab or merge all of them. I
confirmed with [~gwenshap] that this was good enough for the 0.10 release.

Going forward it would be nice to aggregate the docs output for all 
sub-modules. This is related to the work tracked by KAFKA-3405.  

Since manually collecting the javadocs works for now, I will update the title 
to track aggregating javadocs. 

> On 0.10.0 branch, building javadoc results in very small subset of expected 
> javadocs
> 
>
> Key: KAFKA-3717
> URL: https://issues.apache.org/jira/browse/KAFKA-3717
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>    Assignee: Grant Henke
>
> If you run "./gradlew javadoc", you will only get JavaDoc for the High Level 
> Consumer. All the new clients are missing.
> See here: http://home.apache.org/~gwenshap/0.10.0.0-rc5/javadoc/
> I suggest fixing in 0.10.0 branch and in trunk, not rolling a new release 
> candidate, but updating our docs site.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

