[jira] [Resolved] (KAFKA-2521) Documentation: Unclear cleanup policy on wiki

2016-03-27 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy resolved KAFKA-2521.

Resolution: Fixed

This got fixed in KAFKA-2583

> Documentation: Unclear cleanup policy on wiki
> -
>
> Key: KAFKA-2521
> URL: https://issues.apache.org/jira/browse/KAFKA-2521
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chris Hiestand
>
> "The default policy for handling log tails. Can be either delete or dedupe."
> The other documentation says this should be either delete or compact, 
> so I'm guessing this wiki 
> needs to be updated.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (KAFKA-3473) Add controller channel manager request queue time metric.

2016-03-27 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin reassigned KAFKA-3473:
---

Assignee: Dong Lin

> Add controller channel manager request queue time metric.
> -
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
>Assignee: Dong Lin
> Fix For: 0.10.1.0
>
>
> Currently the controller appends requests to brokers into the controller channel 
> manager queue during state transitions, i.e. state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition finishes on the controller.
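The idea described above can be sketched as a queue wrapper that timestamps each request on enqueue and records the elapsed time on dequeue. This is an illustrative sketch only, not Kafka's actual controller code; the class and attribute names are hypothetical.

```python
import queue
import time

class TimedRequestQueue:
    """Wraps a FIFO queue and records how long each request waited in it."""

    def __init__(self):
        self._q = queue.Queue()
        self.queue_times = []  # stand-in for a metrics histogram/timer

    def put(self, request):
        # Tag the request with its enqueue timestamp.
        self._q.put((time.monotonic(), request))

    def get(self):
        enqueued_at, request = self._q.get()
        # Queue time = how long propagation of this request was delayed.
        self.queue_times.append(time.monotonic() - enqueued_at)
        return request
```

In the real broker this would be a histogram metric rather than a Python list, but the measurement point is the same: the delta between enqueue and dequeue on the controller side.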





[GitHub] kafka pull request: [documentation] Fix small typo in design secti...

2016-03-27 Thread paulcavallaro
GitHub user paulcavallaro opened a pull request:

https://github.com/apache/kafka/pull/1151

[documentation] Fix small typo in design section

Sentence was missing "as", minor grammar clean up.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/paulcavallaro/kafka docs-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1151.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1151


commit e8745676cfb14bd6143f7cece6091309b63777d2
Author: Paul Cavallaro 
Date:   2016-03-28T03:45:52Z

[documentation] Fix small typo in design section




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3474) add metrics to track replica fetcher timeouts

2016-03-27 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao updated KAFKA-3474:
---
Status: Patch Available  (was: Open)

> add metrics to track replica fetcher timeouts
> -
>
> Key: KAFKA-3474
> URL: https://issues.apache.org/jira/browse/KAFKA-3474
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Jun Rao
>






[GitHub] kafka pull request: KAFKA-3474: add metrics to track replica fetch...

2016-03-27 Thread junrao
GitHub user junrao opened a pull request:

https://github.com/apache/kafka/pull/1150

KAFKA-3474: add metrics to track replica fetcher timeouts



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junrao/kafka kafka-3474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1150


commit 9a960a3b7e31d75d5ce47e26c375777329174494
Author: Jun Rao 
Date:   2016-03-28T03:15:22Z

add timeout metrics






[jira] [Commented] (KAFKA-3474) add metrics to track replica fetcher timeouts

2016-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213748#comment-15213748
 ] 

ASF GitHub Bot commented on KAFKA-3474:
---

GitHub user junrao opened a pull request:

https://github.com/apache/kafka/pull/1150

KAFKA-3474: add metrics to track replica fetcher timeouts



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/junrao/kafka kafka-3474

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1150


commit 9a960a3b7e31d75d5ce47e26c375777329174494
Author: Jun Rao 
Date:   2016-03-28T03:15:22Z

add timeout metrics




> add metrics to track replica fetcher timeouts
> -
>
> Key: KAFKA-3474
> URL: https://issues.apache.org/jira/browse/KAFKA-3474
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Jun Rao
>






[jira] [Created] (KAFKA-3474) add metrics to track replica fetcher timeouts

2016-03-27 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-3474:
--

 Summary: add metrics to track replica fetcher timeouts
 Key: KAFKA-3474
 URL: https://issues.apache.org/jira/browse/KAFKA-3474
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 0.9.0.0
Reporter: Jun Rao
Assignee: Jun Rao








[jira] [Updated] (KAFKA-3473) Add controller channel manager request queue time metric.

2016-03-27 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3473:

Assignee: (was: Neha Narkhede)

> Add controller channel manager request queue time metric.
> -
>
> Key: KAFKA-3473
> URL: https://issues.apache.org/jira/browse/KAFKA-3473
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.10.0.0
>Reporter: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> Currently the controller appends requests to brokers into the controller channel 
> manager queue during state transitions, i.e. state transitions are 
> propagated asynchronously. We need to track the request queue time on the 
> controller side to see how long state propagation is delayed after the 
> state transition finishes on the controller.





[jira] [Created] (KAFKA-3473) Add controller channel manager request queue time metric.

2016-03-27 Thread Jiangjie Qin (JIRA)
Jiangjie Qin created KAFKA-3473:
---

 Summary: Add controller channel manager request queue time metric.
 Key: KAFKA-3473
 URL: https://issues.apache.org/jira/browse/KAFKA-3473
 Project: Kafka
  Issue Type: Improvement
  Components: controller
Affects Versions: 0.10.0.0
Reporter: Jiangjie Qin
Assignee: Neha Narkhede
 Fix For: 0.10.1.0


Currently the controller appends requests to brokers into the controller channel 
manager queue during state transitions, i.e. state transitions are propagated 
asynchronously. We need to track the request queue time on the controller side 
to see how long state propagation is delayed after the state transition 
finishes on the controller.





[jira] [Updated] (KAFKA-3436) Speed up controlled shutdown.

2016-03-27 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-3436:

Status: Patch Available  (was: In Progress)

> Speed up controlled shutdown.
> -
>
> Key: KAFKA-3436
> URL: https://issues.apache.org/jira/browse/KAFKA-3436
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> Currently, rolling-bouncing a Kafka cluster with tens of thousands of partitions 
> can take very long (~2 min for each broker with ~5000 partitions/broker in 
> our environment). The majority of the time is spent on shutting down a 
> broker. The time to shut down a broker usually includes the following 
> parts:
> T1: During controlled shutdown, people usually want to make sure there are no 
> under-replicated partitions. So shutting down a broker during a rolling 
> bounce will have to wait for the previously restarted broker to catch up. This 
> is T1.
> T2: The time to send the controlled shutdown request and receive the controlled 
> shutdown response. Currently a controlled shutdown request triggers 
> many LeaderAndIsrRequests and UpdateMetadataRequests, and also involves many 
> ZooKeeper updates in serial.
> T3: The actual time to shut down all the components. It is usually small 
> compared with T1 and T2.
> T1 is related to:
> A) the inbound throughput on the cluster, and 
> B) the "down" time of the broker (the time between replica fetchers stopping and 
> replica fetchers restarting)
> The larger the traffic is, or the longer the broker stopped fetching, the 
> longer it will take for the broker to catch up and get back into the ISR, and 
> therefore the longer T1 will be. Assume:
> * the inbound network traffic is X bytes/second on a broker
> * the time T1.B ("down" time) mentioned above is T
> Theoretically it will take (X * T) / (NetworkBandwidth - X) = 
> InboundNetworkUtilization * T / (1 - InboundNetworkUtilization) for the 
> broker to catch up after the restart. While X is out of our control, T is 
> largely related to T2.
> The purpose of this ticket is to reduce T2 by:
> 1. Batching the LeaderAndIsrRequest and UpdateMetadataRequest during 
> controlled shutdown.
> 2. Using async ZooKeeper writes to pipeline ZooKeeper writes. According to 
> the ZooKeeper wiki (https://wiki.apache.org/hadoop/ZooKeeper/Performance), a 
> 3-node ZK cluster should be able to handle 20K writes (1K size). So if we use 
> async writes, we will likely be able to reduce the ZooKeeper update time to 
> low seconds or even the sub-second level.
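Plugging illustrative numbers into the catch-up estimate above makes the T1/T2 relationship concrete. The traffic and bandwidth figures here are assumptions for the sake of the example, not measurements from the ticket.

```python
# Assumed example: a broker taking 50 MB/s of inbound traffic on a
# 1 Gbps (~125 MB/s) NIC whose replica fetchers were down for T = 30 s.
X = 50e6           # inbound traffic, bytes/second
bandwidth = 125e6  # NIC capacity, bytes/second
T = 30.0           # "down" time of the replica fetchers, seconds

utilization = X / bandwidth
catch_up = (X * T) / (bandwidth - X)
# Equivalent form from the ticket: utilization * T / (1 - utilization)
assert abs(catch_up - utilization * T / (1 - utilization)) < 1e-9

print(round(catch_up, 1))  # -> 20.0 seconds at 40% inbound utilization
```

Since the estimate scales with T, shaving tens of seconds off T2 (the controlled-shutdown round trip) directly shortens T1 as well.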





[jira] [Commented] (KAFKA-3436) Speed up controlled shutdown.

2016-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213709#comment-15213709
 ] 

ASF GitHub Bot commented on KAFKA-3436:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/1149

KAFKA-3436: Speed up controlled shutdown.

This patch does the following:
1. Batched LeaderAndIsrRequest and UpdateMetadataRequest during controlled 
shutdown.
2. Added async read and write methods to a class extending ZkClient. Used the 
async ZK operations for LeaderAndIsr read and update. The async methods can be 
used in other places as well (e.g. preferred leader election, replica 
reassignment, controller bootstrap, etc.), but those are out of scope for 
this ticket.

Conducted some rolling bounce tests; a controlled shutdown involving 2500 
partitions now takes around 3 seconds. Previously it could take more than 30 
seconds.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-3436

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1149


commit c2d22821c6c3ad7aa45090def6b984719209f5af
Author: Jiangjie Qin 
Date:   2016-03-27T21:29:30Z

KAFKA-3436: Speed up controlled shutdown

commit 7e7cf3fb1fc4a44d7af4ea935b38bf2e90e6cadd
Author: Jiangjie Qin 
Date:   2016-03-28T00:47:22Z

Remove pre-sent StopReplicaRequests and split state transition into 
multiple groups.




> Speed up controlled shutdown.
> -
>
> Key: KAFKA-3436
> URL: https://issues.apache.org/jira/browse/KAFKA-3436
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> Currently, rolling-bouncing a Kafka cluster with tens of thousands of partitions 
> can take very long (~2 min for each broker with ~5000 partitions/broker in 
> our environment). The majority of the time is spent on shutting down a 
> broker. The time to shut down a broker usually includes the following 
> parts:
> T1: During controlled shutdown, people usually want to make sure there are no 
> under-replicated partitions. So shutting down a broker during a rolling 
> bounce will have to wait for the previously restarted broker to catch up. This 
> is T1.
> T2: The time to send the controlled shutdown request and receive the controlled 
> shutdown response. Currently a controlled shutdown request triggers 
> many LeaderAndIsrRequests and UpdateMetadataRequests, and also involves many 
> ZooKeeper updates in serial.
> T3: The actual time to shut down all the components. It is usually small 
> compared with T1 and T2.
> T1 is related to:
> A) the inbound throughput on the cluster, and 
> B) the "down" time of the broker (the time between replica fetchers stopping and 
> replica fetchers restarting)
> The larger the traffic is, or the longer the broker stopped fetching, the 
> longer it will take for the broker to catch up and get back into the ISR, and 
> therefore the longer T1 will be. Assume:
> * the inbound network traffic is X bytes/second on a broker
> * the time T1.B ("down" time) mentioned above is T
> Theoretically it will take (X * T) / (NetworkBandwidth - X) = 
> InboundNetworkUtilization * T / (1 - InboundNetworkUtilization) for the 
> broker to catch up after the restart. While X is out of our control, T is 
> largely related to T2.
> The purpose of this ticket is to reduce T2 by:
> 1. Batching the LeaderAndIsrRequest and UpdateMetadataRequest during 
> controlled shutdown.
> 2. Using async ZooKeeper writes to pipeline ZooKeeper writes. According to 
> the ZooKeeper wiki (https://wiki.apache.org/hadoop/ZooKeeper/Performance), a 
> 3-node ZK cluster should be able to handle 20K writes (1K size). So if we use 
> async writes, we will likely be able to reduce the ZooKeeper update time to 
> low seconds or even the sub-second level.





[GitHub] kafka pull request: KAFKA-3436: Speed up controlled shutdown.

2016-03-27 Thread becketqin
GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/1149

KAFKA-3436: Speed up controlled shutdown.

This patch does the following:
1. Batched LeaderAndIsrRequest and UpdateMetadataRequest during controlled 
shutdown.
2. Added async read and write methods to a class extending ZkClient. Used the 
async ZK operations for LeaderAndIsr read and update. The async methods can be 
used in other places as well (e.g. preferred leader election, replica 
reassignment, controller bootstrap, etc.), but those are out of scope for 
this ticket.

Conducted some rolling bounce tests; a controlled shutdown involving 2500 
partitions now takes around 3 seconds. Previously it could take more than 30 
seconds.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-3436

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1149


commit c2d22821c6c3ad7aa45090def6b984719209f5af
Author: Jiangjie Qin 
Date:   2016-03-27T21:29:30Z

KAFKA-3436: Speed up controlled shutdown

commit 7e7cf3fb1fc4a44d7af4ea935b38bf2e90e6cadd
Author: Jiangjie Qin 
Date:   2016-03-28T00:47:22Z

Remove pre-sent StopReplicaRequests and split state transition into 
multiple groups.






[GitHub] kafka pull request: Conform to POSIX kill usage

2016-03-27 Thread matthewlmcclure
GitHub user matthewlmcclure opened a pull request:

https://github.com/apache/kafka/pull/1148

Conform to POSIX kill usage

I believe this addresses KAFKA-3384.

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/matthewlmcclure/kafka KAFKA-3384

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1148


commit 8662f0dde8d920cdf59d2e4a425a0b5eab976082
Author: Matt McClure 
Date:   2016-03-28T01:03:43Z

Conform to POSIX kill usage

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html

commit 9251cd2057c5019fa9117502091a349d8c665877
Author: Matt McClure 
Date:   2016-03-28T01:05:40Z

Conform to POSIX kill usage

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html






[jira] [Commented] (KAFKA-3384) bin scripts may not be portable/POSIX compliant

2016-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213700#comment-15213700
 ] 

ASF GitHub Bot commented on KAFKA-3384:
---

GitHub user matthewlmcclure opened a pull request:

https://github.com/apache/kafka/pull/1148

Conform to POSIX kill usage

I believe this addresses KAFKA-3384.

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/matthewlmcclure/kafka KAFKA-3384

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1148


commit 8662f0dde8d920cdf59d2e4a425a0b5eab976082
Author: Matt McClure 
Date:   2016-03-28T01:03:43Z

Conform to POSIX kill usage

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html

commit 9251cd2057c5019fa9117502091a349d8c665877
Author: Matt McClure 
Date:   2016-03-28T01:05:40Z

Conform to POSIX kill usage

The POSIX kill manpage is at 
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html




> bin scripts may not be portable/POSIX compliant
> ---
>
> Key: KAFKA-3384
> URL: https://issues.apache.org/jira/browse/KAFKA-3384
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.1.0
>
>
> We may be using some important tools in a non-POSIX-compliant and 
> non-portable way. In particular, we've discovered that we can sometimes 
> trigger this error:
> /usr/bin/kafka-server-stop: line 22: kill: SIGTERM: invalid signal 
> specification
> which looks like it is caused by invoking a command like {{kill -SIGTERM 
> <pid>}}. (This is a lightly modified version of {{kafka-server-stop.sh}}, but 
> nothing of relevance has been affected.)
> Googling seems to suggest that passing the signal in that way is not 
> compliant -- it's a shell extension. We're using {{/bin/sh}}, but that may 
> be aliased to other, more liberal shells on some platforms. To be honest, I'm 
> not sure exactly what triggers this, since running the 
> command directly on the same host via an interactive shell still works, but 
> we are definitely limiting portability with the current approach.
> There are a couple of possible solutions:
> 1. Standardize on bash. This lets us be more permissive w.r.t. the shell 
> features we use. We're already using /bin/bash in the majority of scripts 
> anyway. It might help us avoid a bunch of assumptions people make when bash 
> is aliased to sh: https://wiki.ubuntu.com/DashAsBinSh
> 2. Try to clean up scripts as we discover incompatibilities. The immediate 
> fix for this issue seems to be to use {{kill -s TERM}} instead of {{kill 
> -SIGTERM}}.
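The suggested fix can be exercised end to end. The sketch below (an illustration, not the actual script change) sends the POSIX-portable `kill -s TERM` form through `/bin/sh` at a throwaway child process standing in for the broker JVM:

```python
import subprocess

# Throwaway child process standing in for the broker JVM.
child = subprocess.Popen(["sleep", "30"])

# POSIX-portable form: "kill -s TERM <pid>" (signal name without the SIG
# prefix, passed via -s). The "kill -SIGTERM <pid>" spelling is a shell
# extension and fails when /bin/sh is dash, as in the error quoted above.
subprocess.run(["/bin/sh", "-c", "kill -s TERM %d" % child.pid], check=True)

print(child.wait())  # negative signal number on POSIX: -15 for SIGTERM
```

Swapping `-s TERM` for `-SIGTERM` in the `run()` call reproduces the "invalid signal specification" failure on systems where `/bin/sh` is dash, while the `-s` form works in both.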





[jira] [Commented] (KAFKA-3384) bin scripts may not be portable/POSIX compliant

2016-03-27 Thread Matt McClure (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213701#comment-15213701
 ] 

Matt McClure commented on KAFKA-3384:
-

I see one symptom of nonconformance to POSIX during {{vagrant provision}}.

{noformat}
...
==> broker3: Killing server
==> broker3: kill: 
==> broker3: invalid argument S
==> broker3: 
==> broker3: Usage:
==> broker3:  kill [options] <pid> [...]
==> broker3: 
==> broker3: Options:
==> broker3:  <pid> [...]send signal to every <pid> listed
==> broker3:  -<signal>, -s, --signal <signal>
==> broker3: specify the <signal> to be sent
==> broker3:  -l, --list=[<signal>]  list all signal names, or convert one to a 
name
==> broker3:  -L, --tablelist all signal names in a nice table
==> broker3:  -h, --help display this help and exit
==> broker3:  -V, --version  output version information and exit
==> broker3: 
==> broker3: For more details see kill(1).
==> broker3: Starting server
...
{noformat}

I opened a pull request to address this at 
https://github.com/apache/kafka/pull/1148

> bin scripts may not be portable/POSIX compliant
> ---
>
> Key: KAFKA-3384
> URL: https://issues.apache.org/jira/browse/KAFKA-3384
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
> Fix For: 0.10.1.0
>
>
> We may be using some important tools in a non-POSIX-compliant and 
> non-portable way. In particular, we've discovered that we can sometimes 
> trigger this error:
> /usr/bin/kafka-server-stop: line 22: kill: SIGTERM: invalid signal 
> specification
> which looks like it is caused by invoking a command like {{kill -SIGTERM 
> <pid>}}. (This is a lightly modified version of {{kafka-server-stop.sh}}, but 
> nothing of relevance has been affected.)
> Googling seems to suggest that passing the signal in that way is not 
> compliant -- it's a shell extension. We're using {{/bin/sh}}, but that may 
> be aliased to other, more liberal shells on some platforms. To be honest, I'm 
> not sure exactly what triggers this, since running the 
> command directly on the same host via an interactive shell still works, but 
> we are definitely limiting portability with the current approach.
> There are a couple of possible solutions:
> 1. Standardize on bash. This lets us be more permissive w.r.t. the shell 
> features we use. We're already using /bin/bash in the majority of scripts 
> anyway. It might help us avoid a bunch of assumptions people make when bash 
> is aliased to sh: https://wiki.ubuntu.com/DashAsBinSh
> 2. Try to clean up scripts as we discover incompatibilities. The immediate 
> fix for this issue seems to be to use {{kill -s TERM}} instead of {{kill 
> -SIGTERM}}.





[jira] [Commented] (KAFKA-3320) Add successful acks verification to ProduceConsumeValidateTest

2016-03-27 Thread Anna Povzner (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213677#comment-15213677
 ] 

Anna Povzner commented on KAFKA-3320:
-

If you look at verifiable_producer.py, it collects all successfully produced 
messages into acked_values. If a producer send() was unsuccessful, those messages 
are collected into not_acked_values. However, our tests do not check whether 
any produce send() got an error. Suppose the test tried to produce 100 
messages, and only 50 were successfully produced. If the consumer successfully 
consumed those 50 messages, then the test is considered a success. For some 
tests it would be good to also verify that we did not get any produce errors.
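A minimal sketch of that extra check, assuming a producer object that exposes `acked_values` and `not_acked_values` lists as `verifiable_producer.py` does; the function name, stand-in class, and error wording here are hypothetical.

```python
def validate_all_acked(producer, num_expected):
    """Fail if any send() got an error, not just if consumed < acked."""
    if producer.not_acked_values:
        raise AssertionError(
            "%d message(s) failed to produce, e.g. %s"
            % (len(producer.not_acked_values), producer.not_acked_values[:5]))
    if len(producer.acked_values) != num_expected:
        raise AssertionError(
            "expected %d acked messages, got %d"
            % (num_expected, len(producer.acked_values)))

# Stand-in producer: 100 attempted, only 50 acked -> the check now fails,
# instead of the test silently passing because consumed == acked.
class FakeProducer:
    acked_values = list(range(50))
    not_acked_values = list(range(50, 100))

try:
    validate_all_acked(FakeProducer(), 100)
except AssertionError as e:
    print("caught:", e)
```

Tests that tolerate produce errors (e.g. during hard broker kills) would simply skip this optional check.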

> Add successful acks verification to ProduceConsumeValidateTest
> --
>
> Key: KAFKA-3320
> URL: https://issues.apache.org/jira/browse/KAFKA-3320
> Project: Kafka
>  Issue Type: Test
>Reporter: Anna Povzner
>
> Currently ProduceConsumeValidateTest only validates that each acked message 
> was consumed. Some tests may want an additional verification that all acks 
> were successful.
> This JIRA is to add an additional, optional verification that all acks were 
> successful and use it in a couple of tests that need that verification. An 
> example is the compression test.





Re: [VOTE] KIP-35: Retrieving protocol version

2016-03-27 Thread Dana Powers
Great questions.

But I wonder if we're expanding the scope of this KIP too much? The
questions you've raised relate to Java client development and broker
testing. Issues #2 and #3 are not related to this KIP at all: how does the
server test and validate api protocol changes and compatibility? That is a
fundamental question regardless of whether clients can get version metadata or
not. It is the basis for client forwards compatibility (and broker
backwards compatibility).

While these are great questions that need good solutions, I don't think the
KIP was intended to solve them. Rather, it is aimed at Kafka clients that
are attempting to be backwards compatible, namely librdkafka and
kafka-python. It also seems that the Java client dev team doesn't have
backwards-compatible clients as high on the priority list as we do. That's
ok! But let's not let that delay or prevent the *server* dev team from
adding support for this simple API to help other client teams.

Recall that you were both (Jay and Gwen) in favor of this approach a year
ago:
http://mail-archives.apache.org/mod_mbox/kafka-dev/201501.mbox/%3ccaoejijgnidpnvr-tpcpkzqfkuwdj+rtymvgkkkdwmkczfqm...@mail.gmail.com%3E

KIP-35 is exactly option #2 from that thread. This doesn't seem like a
controversial API at all.

It's a bit frustrating that something this simple, and which is seemingly
unopposed, is taking so long to get approval. If there's anything I can do
to help facilitate, please let me know.

-Dana
We (Jay and I) had some extra information we wanted to see in the KIP before
we were comfortable voting:

* Where does the Java client fit in. Hopefully we can use this KIP to
standardize behavior and guarantees between Java and non-Java clients, so
when we reason about the Java clients, which most Kafka developers are
familiar with, we will make the right decisions for all clients.
* When do we bump the protocol? I think 90% of the issue is not that the
version got bumped but rather that we changed behavior without bumping
versions. For the new VersionRequest to be useful, we all need to know when
to get new versions...
* How do we test / validate - I think our recent experience shows that our
protocol tests and compatibility tests are still inadequate. Having
VersionRequest is useless if we can't validate that Kafka actually
implements the protocol it says it does (and we caught such breaks twice in
the last two weeks)
* Error handling of protocol mismatches

Ashish kindly agreed to think about this and improve the KIP.
We'll resume the vote as soon as he's back :)

Gwen


On Wed, Mar 23, 2016 at 5:55 PM, Dana Powers  wrote:

> speaking of pending KIPs, what's the status on this one?
>
>
> On Fri, Mar 18, 2016 at 9:47 PM, Ashish Singh  wrote:
>
> > Hey Jay,
> >
> > Answers inline.
> >
> > On Fri, Mar 18, 2016 at 10:45 AM, Jay Kreps  wrote:
> >
> > Hey Ashish,
> > >
> > > Couple quick things:
> > >
> > > 1. You list as a rejected alternative "making the documentation the
> > > source of truth for the protocol", but I think what you actually
> > > describe in that section is global versioning, which of those two
> > > things are we voting to reject? I think this is a philosophical point
> > > but an important one...
> > >
> > One of the major differences between Option 3 and other options discussed
> > on the KIP is that Option 3 is documentation oriented, and that is what I
> > wanted to capture in the title. I am happy to change it to global
> > versioning.
> >
> >
> > > 2. Can you describe the changes necessary and classes we'd have to
> > > update in the java clients to make use of this feature? What would
> > > that look like? One concern I have is just the complexity necessary to
> > > do the per-connection protocol version check and really handle all the
> > > cases. I assume you've thought through what that looks like, can you
> > > sketch that out for people?
> > >
> > I would imagine any client, even the Java client, would follow the steps
> > mentioned here
> > <
> >
>
https://cwiki.apache.org/confluence/display/KAFKA/KIP-35+-+Retrieving+protocol+version#KIP-35-Retrievingprotocolversion-Aclientdeveloperwantstoaddsupportforanewfeature.1
> > >.
> > Below are my thoughts on how java client can maintain api versions
> > supported by various brokers in cluster.
> >
> >1. ClusterConnectionStates can provide info on whether api versions have
> >been retrieved for a connection or not.
> >2. NetworkClient.handleConnections can send ApiVersionQueryRequest to
> >newly connected nodes.
> >3. NetworkClient can be enhanced to handle ApiVersionQueryResponse and
> >set ClusterConnectionStates to indicate api versions have been retrieved
> >for the node.
> >4. NetworkClient maintains a mapping Node -> [(api_key, min_ver,
> >max_ver)], brokerApiVersions, cached.
> >5. NetworkClient.processDisconnection can remove the entry for a node from
> >brokerApiVersions 

[jira] [Commented] (KAFKA-3472) Allow MirrorMaker to copy selected partitions and choose target topic name

2016-03-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15213355#comment-15213355
 ] 

ASF GitHub Bot commented on KAFKA-3472:
---

GitHub user hsun-cnnxty opened a pull request:

https://github.com/apache/kafka/pull/1147

[KAFKA-3472] Allow MirrorMaker to copy selected partitions and choose 
target topic name

Please see the jira issue for details: 
https://issues.apache.org/jira/browse/KAFKA-3472

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsun-cnnxty/kafka k3472

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1147


commit 9ada717f805d6c4e22c3d6567bafe1c3940c7d06
Author: Hang Sun 
Date:   2016-03-27T06:01:57Z

KAFKA-3472: allow MirrorMaker to copy selected partitions and choose target 
topic




> Allow MirrorMaker to copy selected partitions and choose target topic name
> --
>
> Key: KAFKA-3472
> URL: https://issues.apache.org/jira/browse/KAFKA-3472
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 0.9.0.1
>Reporter: Hang Sun
>Priority: Minor
>  Labels: mirror-maker
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> It would be nice if MirrorMaker could be used to copy only a few partitions 
> instead of all of them to a different topic.  My use case is to sample a small 
> portion of production traffic in the pre-production environment for testing.  
> The pre-production environment is usually smaller and cannot handle the full 
> load from production.





[GitHub] kafka pull request: [KAFKA-3472] Allow MirrorMaker to copy selecte...

2016-03-27 Thread hsun-cnnxty
GitHub user hsun-cnnxty opened a pull request:

https://github.com/apache/kafka/pull/1147

[KAFKA-3472] Allow MirrorMaker to copy selected partitions and choose 
target topic name

Please see the jira issue for details: 
https://issues.apache.org/jira/browse/KAFKA-3472

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hsun-cnnxty/kafka k3472

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1147.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1147


commit 9ada717f805d6c4e22c3d6567bafe1c3940c7d06
Author: Hang Sun 
Date:   2016-03-27T06:01:57Z

KAFKA-3472: allow MirrorMaker to copy selected partitions and choose target 
topic



