[jira] [Commented] (KAFKA-4477) Node reduces its ISR to itself, and doesn't recover. Other nodes do not take leadership, cluster remains sick until node is restarted.

2016-12-07 Thread Jan Omar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731461#comment-15731461
 ] 

Jan Omar commented on KAFKA-4477:
-

We're seeing the same issue on FreeBSD 10.3, also on Kafka 0.10.1.0. Exact 
same stack traces as described by Michael. We've already seen this happening 
2 or 3 times.

Regards

Jan

> Node reduces its ISR to itself, and doesn't recover. Other nodes do not take 
> leadership, cluster remains sick until node is restarted.
> --
>
> Key: KAFKA-4477
> URL: https://issues.apache.org/jira/browse/KAFKA-4477
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.1.0
> Environment: RHEL7
> java version "1.8.0_66"
> Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
>Reporter: Michael Andre Pearce (IG)
>Priority: Critical
>  Labels: reliability
> Attachments: kafka.jstack
>
>
> We have encountered a critical issue that has re-occurred in different 
> physical environments. We haven't worked out what is going on, though we do 
> have a nasty workaround to keep the service alive.
> We have not had this issue on clusters still running 0.9.0.1.
> We have noticed a node randomly shrinking the ISRs down to itself for the 
> partitions it owns; moments later we see other nodes having disconnects, 
> followed finally by application issues, where producing to these partitions 
> is blocked.
> It seems that only restarting the Kafka java process resolves the issue.
> We have had this occur multiple times, and from all network and machine 
> monitoring the machine never left the network or had any other glitches.
> Below are logs from the issue.
> Node 7:
> [2016-12-01 07:01:28,112] INFO Partition 
> [com_ig_trade_v1_position_event--demo--compacted,10] on broker 7: Shrinking 
> ISR for partition [com_ig_trade_v1_position_event--demo--compacted,10] from 
> 1,2,7 to 7 (kafka.cluster.Partition)
> All other nodes:
> [2016-12-01 07:01:38,172] WARN [ReplicaFetcherThread-0-7], Error in fetch 
> kafka.server.ReplicaFetcherThread$FetchRequest@5aae6d42 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 7 was disconnected before the response was 
> read
> All clients:
> java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.NetworkException: The server disconnected 
> before a response was received.
> After this occurs, we then suddenly see an increasing number of CLOSE_WAIT 
> connections and open file descriptors on the sick machine.
> As a workaround to keep the service alive, we are currently putting in an 
> automated process that tails the logs and matches the regex below; where 
> new_partitions is just the node itself, we restart the node.
> "\[(?P<date>.+)\] INFO Partition \[.*\] on broker .* Shrinking ISR for 
> partition \[.*\] from (?P<old_partitions>.+) to (?P<new_partitions>.+) 
> \(kafka.cluster.Partition\)"
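
A minimal sketch of such a watchdog in Java (the log path, the restart 
command, and the broker-id capture group are assumptions, not part of the 
original report; Java named groups use (?<name>...) rather than the 
(?P<name>...) syntax above):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsrShrinkWatchdog {
    private static final Pattern SHRINK = Pattern.compile(
        "\\[(?<date>.+)\\] INFO Partition \\[.*\\] on broker (?<broker>\\d+): "
        + "Shrinking ISR for partition \\[.*\\] from (?<oldisr>.+) to (?<newisr>.+) "
        + "\\(kafka.cluster.Partition\\)");

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed log location; a real deployment would also handle log rotation.
        try (BufferedReader log = new BufferedReader(new FileReader("/var/log/kafka/server.log"))) {
            while (true) {
                String line = log.readLine();
                if (line == null) { Thread.sleep(1000); continue; } // wait for new output
                Matcher m = SHRINK.matcher(line);
                // Restart only when the new ISR is exactly the broker itself.
                if (m.find() && m.group("newisr").trim().equals(m.group("broker"))) {
                    Runtime.getRuntime().exec(new String[]{"systemctl", "restart", "kafka"});
                }
            }
        }
    }
}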



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] 0.10.1.1 RC0

2016-12-07 Thread Ismael Juma
Thanks Guozhang. The Maven staging repo link doesn't seem to have 0.10.1.1.
Can you please double-check?

Ismael

On 7 Dec 2016 10:47 pm, "Guozhang Wang"  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.1.1. This
> is
> a bug fix release and it includes fixes and improvements from 27 JIRAs. See
> the release notes for more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 13 December, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/javadoc/
>
> * Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 8b77507083fdd427ce81021228e7e346da0d814c
>
>
> Thanks,
> Guozhang
>


[jira] [Updated] (KAFKA-4485) Follower should be in the isr if its FetchRequest has fetched up to the logEndOffset of leader

2016-12-07 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-4485:

Description: 
As of the current implementation, we will exclude a follower from the ISR if 
the begin offset of the FetchRequest from this follower has been smaller than 
the logEndOffset of the leader for more than replicaLagTimeMaxMs. Also, we 
will add a follower to the ISR if the beginOffset of the FetchRequest from 
this follower is equal to or larger than the high watermark of this partition.

This is problematic for the following reasons:

1) The criteria for the ISR are inconsistent between maybeExpandIsr() and 
maybeShrinkIsr(). Therefore a follower may be repeatedly removed from and 
added to the ISR (e.g. in the scenario described below).

2) A follower may be removed from the ISR even if its fetch rate can keep up 
with the produce rate. Suppose a producer keeps producing a lot of small 
requests at a high request rate but low byte rate (e.g. many mirror makers), 
and the follower is always able to read all the available data at the time 
the leader receives it. However, the begin offset of the fetch request will 
always be smaller than the logEndOffset of the leader. Thus the follower will 
be removed from the ISR after replicaLagTimeMaxMs.

In the following we describe the solution to this problem.

Terminology:
- Definition of replica lag: we say a replica lags behind the leader by X ms 
if its current log end offset is equal to the log end offset of the leader X 
ms ago.
- Definition of pseudo-ISR set: pseudo-ISR set of a partition = { replica | 
replica belongs to the given partition AND replica's lag <= replicaLagTimeMaxMs}
- Definition of high-watermark of a partition: high-watermark of a partition is 
the max(current high-watermark of the partition, min(offset of replicas in the 
pseudo-ISR set of this partition))
- Definition of ISR set: ISR set of a partition = {replica | replica is in 
pseudo-ISR set of the given partition AND log end offset of replica >= 
high-watermark of the given partition}

Guarantee:
1) If a follower is close enough to the leader in the sense that its replica 
lag <= replicaLagTimeMaxMs, then this follower will be in the pseudo-ISR set. 
Thus the high-watermark will stop increasing until this follower's log end 
offset >= high-watermark, at which moment this follower will be added to the 
ISR set. This allows us to solve the 2nd problem described above.
2) If a follower lags behind the leader for more than X ms, it will be 
removed from the ISR set.
3) The high watermark of a partition will never decrease.
4) For any replica in the ISR set, its log end offset >= high-watermark. 

Implementation:
1) For each follower, the leader keeps track of the time of the last fetch 
request from this follower. Let's call it lastFetchTime. In addition, the 
leader also maintains the log end offset of the leader at the lastFetchTime 
for each follower. Let's call it lastFetchLeaderLEO. Both variables will be 
updated after the leader has processed a FetchRequest from a follower.
2) When the leader receives a FetchRequest from a follower, if the begin 
offset of the FetchRequest >= the current leader's LEO, the follower's 
lastCatchUpTimeMs will be set to the current system time. Otherwise, if the 
begin offset of the FetchRequest >= lastFetchLeaderLEO, the follower's 
lastCatchUpTimeMs will be set to lastFetchTime. Replica's lag = current 
system time - lastCatchUpTimeMs (see the sketch after this list).
3) The leader can update pseudo-ISR set, high-watermark and ISR set of the 
partition based on the lag of replicas of this partition, according to the 
definition described above.
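
A minimal sketch of this bookkeeping (class, field, and method names are 
illustrative; the actual patch may differ):

class FollowerLagTracker {
    private long lastFetchTime = -1L;      // when the leader processed this follower's previous fetch
    private long lastFetchLeaderLEO = -1L; // leader's log end offset at that time
    private long lastCatchUpTimeMs = -1L;

    // Called by the leader after processing each FetchRequest from this follower.
    void onFetch(long fetchBeginOffset, long leaderLEO, long nowMs) {
        if (fetchBeginOffset >= leaderLEO) {
            lastCatchUpTimeMs = nowMs;          // caught up to the leader right now
        } else if (fetchBeginOffset >= lastFetchLeaderLEO) {
            lastCatchUpTimeMs = lastFetchTime;  // caught up as of the previous fetch
        }
        lastFetchTime = nowMs;
        lastFetchLeaderLEO = leaderLEO;
    }

    // Replica's lag = current system time - lastCatchUpTimeMs (step 2 above).
    long lagMs(long nowMs) {
        return nowMs - lastCatchUpTimeMs;
    }

    // Membership test for the pseudo-ISR set defined above.
    boolean inPseudoIsr(long nowMs, long replicaLagTimeMaxMs) {
        return lagMs(nowMs) <= replicaLagTimeMaxMs;
    }
}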





  was:
As of the current implementation, we will exclude a follower from the ISR if 
the begin offset of the FetchRequest from this follower has been smaller than 
the logEndOffset of the leader for more than replicaLagTimeMaxMs. Also, we 
will add a follower to the ISR if the beginOffset of the FetchRequest from 
this follower is equal to or larger than the high watermark of this partition.

This is problematic for the following reasons:

1) The criteria for the ISR are inconsistent between maybeExpandIsr() and 
maybeShrinkIsr(). Therefore a follower may be repeatedly removed from and 
added to the ISR (e.g. in the scenario described below).

2) A follower may be removed from the ISR even if its fetch rate can keep up 
with the produce rate. Suppose a producer keeps producing a lot of small 
requests at a high request rate but low byte rate (e.g. many mirror makers), 
and the follower is always able to read all the available data at the time 
the leader receives it. However, the begin offset of the fetch request will 
always be smaller than the logEndOffset of the leader. Thus the follower will 
be removed from the ISR after replicaLagTimeMaxMs.

In the following we describe the solution to this problem.

Terminology:
- Definition of replica lag: we say a replica lags behind the leader by X ms 
if its current log end offset is equal to the log end offset of the leader X 
ms ago.
- Definition of pseudo-ISR set: pseudo-ISR set of a partition = { 

[jira] [Updated] (KAFKA-4485) Follower should be in the isr if its FetchRequest has fetched up to the logEndOffset of leader

2016-12-07 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin updated KAFKA-4485:

Description: 
As of the current implementation, we will exclude a follower from the ISR if 
the begin offset of the FetchRequest from this follower has been smaller than 
the logEndOffset of the leader for more than replicaLagTimeMaxMs. Also, we 
will add a follower to the ISR if the beginOffset of the FetchRequest from 
this follower is equal to or larger than the high watermark of this partition.

This is problematic for the following reasons:

1) The criteria for the ISR are inconsistent between maybeExpandIsr() and 
maybeShrinkIsr(). Therefore a follower may be repeatedly removed from and 
added to the ISR (e.g. in the scenario described below).

2) A follower may be removed from the ISR even if its fetch rate can keep up 
with the produce rate. Suppose a producer keeps producing a lot of small 
requests at a high request rate but low byte rate (e.g. many mirror makers), 
and the follower is always able to read all the available data at the time 
the leader receives it. However, the begin offset of the fetch request will 
always be smaller than the logEndOffset of the leader. Thus the follower will 
be removed from the ISR after replicaLagTimeMaxMs.

In the following we describe the solution to this problem.

Terminology:
- Definition of replica lag: we say a replica lags behind the leader by X ms 
if its current log end offset is equal to the log end offset of the leader X 
ms ago.
- Definition of pseudo-ISR set: pseudo-ISR set of a partition = { replica | 
replica belongs to the given partition AND replica's lag <= replicaLagTimeMaxMs}
- Definition of high-watermark of a partition: high-watermark of a partition is 
the max(current high-watermark of the partition, min(offset of replicas in the 
pseudo-ISR set of this partition))
- Definition of ISR set: ISR set of a partition = {replica | replica is in 
pseudo-ISR set of the given partition AND log end offset of replica >= 
high-watermark of the given partition}

Guarantee:
1) If a follower is close enough to the leader in the sense that its replica 
lag <= replicaLagTimeMaxMs, then this follower will be in the pseudo-ISR set. 
Thus the high-watermark will stop increasing until this follower's log end 
offset >= high-watermark, at which moment this follower will be added to the 
ISR set. This allows us to solve the 2nd problem described above.
2) If a follower lags behind the leader for more than X ms, it will be 
removed from the ISR set.
3) The high watermark of a partition will never decrease.
4) For any replica in the ISR set, its log end offset >= high-watermark. 

Implementation:
1) For each follower, the leader keeps track of the time of the last fetch 
request from this follower. Let's call it lastFetchTime. In addition, the 
leader also maintains the log end offset of the leader at the lastFetchTime 
for each follower. Let's call it lastFetchLeaderLEO. Both variables will be 
updated after the leader has processed a FetchRequest from a follower.
2) When the leader receives a FetchRequest from a follower, it will set the 
follower's lag to lastFetchTime if the begin offset of the FetchRequest >= 
lastFetchLeaderLEO.
3) The leader can update pseudo-ISR set, high-watermark and ISR set of the 
partition based on the lag of replicas of this partition, according to the 
definition described above.





  was:
As of the current implementation, we will exclude a follower from the ISR if 
the begin offset of the FetchRequest from this follower is always smaller 
than the logEndOffset of the leader for more than replicaLagTimeMaxMs.

Also, we will add a follower to the ISR if the beginOffset of the FetchRequest 
from this follower is equal to or larger than the high watermark of this 
partition.

This is problematic for the following reasons:

1) The criteria for the ISR are inconsistent between maybeExpandIsr() and 
maybeShrinkIsr(). A follower may be repeatedly removed from and added to the 
ISR (e.g. in the scenario described below).

2) A follower may be removed from the ISR even if its fetch rate can keep up 
with the produce rate. Suppose a producer keeps producing a lot of small 
requests at a high request rate but low byte rate (e.g. many mirror makers), 
and the follower is always able to read all the available data at the time 
the leader receives it. However, the begin offset of the fetch request will 
always be smaller than the logEndOffset of the leader. Thus the follower will 
be removed from the ISR after replicaLagTimeMaxMs.

The solution to the problem is the following:

A follower should be in the ISR if the begin offset of its FetchRequest >= 
max(high watermark of the partition, log end offset of the leader at the time 
the leader received the previous FetchRequest). The follower should be 
removed from the ISR if this criterion is not met for more than 
replicaLagTimeMaxMs. Note that we are comparing the begin offset of the 
FetchRequest with the log end offset of the leader at the time the leader 
received the previous FetchRequest as an approximate way to 

[jira] [Updated] (KAFKA-4500) Kafka Code Improvements

2016-12-07 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated KAFKA-4500:
---
Summary: Kafka Code Improvements  (was: Code Improvements)

> Kafka Code Improvements
> ---
>
> Key: KAFKA-4500
> URL: https://issues.apache.org/jira/browse/KAFKA-4500
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.2
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Minor
>
> Code Corrections on clients module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4500) Code Improvements

2016-12-07 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated KAFKA-4500:
---
Priority: Minor  (was: Trivial)

> Code Improvements
> -
>
> Key: KAFKA-4500
> URL: https://issues.apache.org/jira/browse/KAFKA-4500
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.2
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Minor
>
> Code Corrections on clients module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731033#comment-15731033
 ] 

ASF GitHub Bot commented on KAFKA-4447:
---

GitHub user xiguantiaozhan reopened a pull request:

https://github.com/apache/kafka/pull/2191

KAFKA-4447: Controller resigned but it also acts as a controller for a long 
time



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xiguantiaozhan/kafka avoid_swamp_controllerLog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2191


commit 4cd2b6bd83558e536b2f330deec177c3dd28e8ff
Author: tuyang 
Date:   2016-12-08T03:49:11Z

fix Controller resigned but it also acts as a controller for a long time bug




> Controller resigned but it also acts as a controller for a long time 
> -
>
> Key: KAFKA-4447
> URL: https://issues.apache.org/jira/browse/KAFKA-4447
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Environment: Linux Os
>Reporter: Json Tu
>  Labels: reliability
> Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we executed the following operations.
> 1. We executed some topic partition reassignment from one node to the other 
> 9 nodes in the cluster, which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange, 
> read all partition reassignment rules from the zk path, and executed 
> onPartitionReassignment for all partitions that matched the conditions.
> 3. But the controller's zk session expired; after that, some of the 9 nodes 
> also expired from zk.
> 4. Then the controller invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as the 
> controller for about 3 minutes, as can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2191: KAFKA-4447: Controller resigned but it also acts a...

2016-12-07 Thread xiguantiaozhan
GitHub user xiguantiaozhan reopened a pull request:

https://github.com/apache/kafka/pull/2191

KAFKA-4447: Controller resigned but it also acts as a controller for a long 
time



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xiguantiaozhan/kafka avoid_swamp_controllerLog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2191


commit 4cd2b6bd83558e536b2f330deec177c3dd28e8ff
Author: tuyang 
Date:   2016-12-08T03:49:11Z

fix Controller resigned but it also acts as a controller for a long time bug




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4500) Code Improvements

2016-12-07 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated KAFKA-4500:
---
Summary: Code Improvements  (was: Code Corrections)

> Code Improvements
> -
>
> Key: KAFKA-4500
> URL: https://issues.apache.org/jira/browse/KAFKA-4500
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Affects Versions: 0.9.0.2
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Trivial
>
> Code Corrections on clients module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3796) SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk

2016-12-07 Thread Rekha Joshi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rekha Joshi updated KAFKA-3796:
---
  Priority: Trivial  (was: Major)
Issue Type: Improvement  (was: Bug)

> SslTransportLayerTest.testInvalidEndpointIdentification fails on trunk
> --
>
> Key: KAFKA-3796
> URL: https://issues.apache.org/jira/browse/KAFKA-3796
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients, security
>Affects Versions: 0.10.0.0
>Reporter: Rekha Joshi
>Assignee: Rekha Joshi
>Priority: Trivial
>
> org.apache.kafka.common.network.SslTransportLayerTest > 
> testEndpointIdentificationDisabled FAILED
> java.net.BindException: Can't assign requested address
> at sun.nio.ch.Net.bind0(Native Method)
> at sun.nio.ch.Net.bind(Net.java:433)
> at sun.nio.ch.Net.bind(Net.java:425)
> at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
> at 
> org.apache.kafka.common.network.NioEchoServer.(NioEchoServer.java:48)
> at 
> org.apache.kafka.common.network.SslTransportLayerTest.testEndpointIdentificationDisabled(SslTransportLayerTest.java:120)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4447) Controller resigned but it also acts as a controller for a long time

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730981#comment-15730981
 ] 

ASF GitHub Bot commented on KAFKA-4447:
---

Github user xiguantiaozhan closed the pull request at:

https://github.com/apache/kafka/pull/2191


> Controller resigned but it also acts as a controller for a long time 
> -
>
> Key: KAFKA-4447
> URL: https://issues.apache.org/jira/browse/KAFKA-4447
> Project: Kafka
>  Issue Type: Improvement
>  Components: controller
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1
> Environment: Linux Os
>Reporter: Json Tu
>  Labels: reliability
> Attachments: log.tar.gz
>
>
> We have a cluster with 10 nodes, and we executed the following operations.
> 1. We executed some topic partition reassignment from one node to the other 
> 9 nodes in the cluster, which triggered the controller.
> 2. The controller invoked PartitionsReassignedListener's handleDataChange, 
> read all partition reassignment rules from the zk path, and executed 
> onPartitionReassignment for all partitions that matched the conditions.
> 3. But the controller's zk session expired; after that, some of the 9 nodes 
> also expired from zk.
> 4. Then the controller invoked onControllerResignation to resign as the 
> controller.
> We found that after the controller resigned, it still acted as the 
> controller for about 3 minutes, as can be seen in my attachment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2191: KAFKA-4447: Controller resigned but it also acts a...

2016-12-07 Thread xiguantiaozhan
Github user xiguantiaozhan closed the pull request at:

https://github.com/apache/kafka/pull/2191


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4505) Cannot get topic lag since kafka upgrade from 0.8.1.0 to 0.10.1.0

2016-12-07 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730958#comment-15730958
 ] 

huxi commented on KAFKA-4505:
-

Is the group active? Check whether the znode '/consumers/<group>/owners' 
exists or has any child znodes.
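
For example, with the ZooKeeper Java client (the connect string and group 
name are placeholders):

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class CheckGroupActive {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("ip:2181", 30000, event -> { });
        String path = "/consumers/group_name/owners";
        if (zk.exists(path, false) == null) {
            System.out.println("no owners znode: group has no active ZK-based members");
        } else {
            // Child znodes are the topics currently owned by live consumers.
            List<String> topics = zk.getChildren(path, false);
            System.out.println("owned topics: " + topics);
        }
        zk.close();
    }
}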

> Cannot get topic lag since kafka upgrade from 0.8.1.0 to 0.10.1.0
> -
>
> Key: KAFKA-4505
> URL: https://issues.apache.org/jira/browse/KAFKA-4505
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, metrics, offset manager
>Affects Versions: 0.10.1.0
>Reporter: Romaric Parmentier
>Priority: Critical
>
> We were using kafka 0.8.1.1 and we just migrated to version 0.10.1.0.
> Since we migrated, we have been using the new script 
> kafka-consumer-groups.sh to retrieve topic lags, but it doesn't seem to 
> work anymore.
> Because the application is using the 0.8 driver, we have added the following 
> conf to each kafka server:
> log.message.format.version=0.8.2
> inter.broker.protocol.version=0.10.0.0
> When I'm using the option --list with kafka-consumer-groups.sh I can see 
> every consumer group I'm using, but --describe is not working:
> /usr/share/kafka$ bin/kafka-consumer-groups.sh --zookeeper ip:2181 --describe 
> --group group_name
> No topic available for consumer group provided
> GROUP  TOPIC  PARTITION  
> CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
> When I look into zookeeper I can see the offset increasing for this 
> consumer group.
> Any idea?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4468) Correctly calculate the window end timestamp after read from state stores

2016-12-07 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730787#comment-15730787
 ] 

Bill Bejeck commented on KAFKA-4468:


Unless I'm missing something, this task implies we'll need to include the 
window_size (and forgo the 8 bytes per key of storage savings) on 
serialization with {{WindowedSerializer}}, since after we've read it via the 
{{WindowedDeserializer}} we only have the key and the start timestamp, and 
don't have access to the original window_size to do the calculation.

Is this correct? 
Thanks!

> Correctly calculate the window end timestamp after read from state stores
> -
>
> Key: KAFKA-4468
> URL: https://issues.apache.org/jira/browse/KAFKA-4468
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: architecture
>
> When storing the WindowedStore on the persistent KV store, we only use the 
> start timestamp of the window as part of the combo-key as (start-timestamp, 
> key). The reason that we do not add the end-timestamp as well is that we can 
> always calculate it from the start timestamp + window_length, and hence we 
> can save 8 bytes per key on the persistent KV store.
> However, after reading it (via {{WindowedDeserializer}}) we do not set its 
> end timestamp correctly but just read it as an {{UnlimitedWindow}}. We 
> should fix this by calculating its end timestamp as mentioned above.
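
A sketch of the combo-key layout and the recomputation described above (the 
byte layout and names here are illustrative, not the actual Streams binary 
format):

import java.nio.ByteBuffer;

class WindowedKeyCodec {
    // Only (key bytes, 8-byte start timestamp) are persisted; no end timestamp.
    static byte[] serialize(byte[] keyBytes, long windowStartMs) {
        return ByteBuffer.allocate(keyBytes.length + 8)
                .put(keyBytes)
                .putLong(windowStartMs)
                .array();
    }

    // windowSizeMs must be supplied from the outside (e.g. from the topology);
    // that missing input on the read path is exactly what this JIRA is about.
    static long[] windowBounds(byte[] serialized, long windowSizeMs) {
        long start = ByteBuffer.wrap(serialized).getLong(serialized.length - 8);
        return new long[]{start, start + windowSizeMs}; // {start, end}
    }
}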



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE]: KIP-97: The client compatibility KIP

2016-12-07 Thread Gwen Shapira
+1 (binding)

On Wed, Dec 7, 2016 at 9:17 AM, Colin McCabe  wrote:
> Hi all,
>
> I heard that the VOTE and DISCUSS threads for the KIP-97 discussion
> appeared to be in the same email thread for some people using gmail.  So
> I'm reposting in hopes of getting a separate email thread this time for
> those viewers. :)
>
> I'd like to start voting on KIP-97
> (https://cwiki.apache.org/confluence/display/KAFKA/KIP-97%3A+Improved+Kafka+Client+RPC+Compatibility+Policy
> ).
>
> The discussion thread can be found here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg60955.html
>
> Thanks for your feedback.
>
> best,
> Colin McCabe



-- 
Gwen Shapira
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


[jira] [Updated] (KAFKA-4494) Significant startup delays in KStreams app

2016-12-07 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4494:
-
Component/s: streams

> Significant startup delays in KStreams app
> --
>
> Key: KAFKA-4494
> URL: https://issues.apache.org/jira/browse/KAFKA-4494
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: AWS Linux ami, mac os
>Reporter: j yeargers
>  Labels: performance
>
> Often, starting a KStreams-based app results in a significant (5-10 minute) 
> delay before processing of the stream begins. 
> Sample debug output: 
> https://gist.github.com/jyeargers/e8398fb353d67397f99148bc970479ee
> Topology in question: stream -> map -> groupbykey.aggregate -> print
> Stream is JSON.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4494) Significant startup delays in KStreams app

2016-12-07 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-4494:
-
Labels: performance  (was: )

> Significant startup delays in KStreams app
> --
>
> Key: KAFKA-4494
> URL: https://issues.apache.org/jira/browse/KAFKA-4494
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: AWS Linux ami, mac os
>Reporter: j yeargers
>  Labels: performance
>
> Often, starting a KStreams-based app results in a significant (5-10 minute) 
> delay before processing of the stream begins. 
> Sample debug output: 
> https://gist.github.com/jyeargers/e8398fb353d67397f99148bc970479ee
> Topology in question: stream -> map -> groupbykey.aggregate -> print
> Stream is JSON.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #1087

2016-12-07 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-4492: Make the streams cache eviction policy tolerable to

--
[...truncated 14480 lines...]

org.apache.kafka.streams.KafkaStreamsTest > shouldNotGetAllTasksWhenNotRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > shouldNotGetAllTasksWhenNotRunning 
PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndPartitionerWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndPartitionerWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndSerializerWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetTaskWithKeyAndSerializerWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldReturnFalseOnCloseWhenThreadsHaventTerminated STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldReturnFalseOnCloseWhenThreadsHaventTerminated PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldNotGetAllTasksWithStoreWhenNotRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartOnceClosed PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStartAndClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[0] PASSED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] STARTED

org.apache.kafka.streams.integration.KStreamKTableJoinIntegrationTest > 
shouldCountClicksPerRegion[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[0] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldBeAbleToQueryState[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
shouldNotMakeStoreAvailableUntilAllStoresAvailable[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
queryOnRebalance[1] PASSED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] STARTED

org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
concurrentAccesses[1] PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testShouldReadFromRegexAndNamedTopics PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testRegexMatchesTopicsAWhenCreated PASSED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
testMultipleConsumersCanReadFromPartitionedTopic STARTED

org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 

[jira] [Updated] (KAFKA-4510) StreamThread must finish rebalance in state PENDING_SHUTDOWN

2016-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4510:
---
Status: Patch Available  (was: In Progress)

> StreamThread must finish rebalance in state PENDING_SHUTDOWN
> 
>
> Key: KAFKA-4510
> URL: https://issues.apache.org/jira/browse/KAFKA-4510
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>
> If a Streams application runs with multiple threads within one JVM and the 
> application gets stopped, this can trigger a rebalance when the first 
> thread finishes processing, because not all threads shut down at the same 
> time.
> Because we try to reuse tasks, on rebalance tasks are not closed immediately 
> in order to allow a potential reuse (if a task gets assigned to its original 
> thread). However, if a thread is in state {{PENDING_SHUTDOWN}} it does not 
> finish the rebalance operation completely and thus does not release the 
> suspended task, and the application exits with not all locks released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4454) Authorizer should also include the Principal generated by the PrincipalBuilder.

2016-12-07 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730371#comment-15730371
 ] 

Mayuresh Gharat commented on KAFKA-4454:


[~junrao], thanks a lot for the review.
The thought of adding an equality check for channelPrincipal (if it exists) 
did cross my mind, but I left it out purposely. The reason was that I thought 
Kafka internally mainly cares about the principal name and principal type, 
and the principal name actually comes from the channelPrincipal. But now that 
I think more about it, the channelPrincipal might be a different custom type, 
like servicePrincipal or userPrincipal and so on, with the same name, so it 
does make sense to add a proper equality check there.

I agree that there might be a better way of doing this. I will write up a KIP 
for this and submit it for review soon. 

> Authorizer should also include the Principal generated by the 
> PrincipalBuilder.
> ---
>
> Key: KAFKA-4454
> URL: https://issues.apache.org/jira/browse/KAFKA-4454
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.1
>Reporter: Mayuresh Gharat
>Assignee: Mayuresh Gharat
> Fix For: 0.10.2.0
>
>
> Currently kafka allows users to plug in a custom PrincipalBuilder and a 
> custom Authorizer.
> The Authorizer.authorize() method takes in a Session object that wraps 
> KafkaPrincipal and InetAddress.
> The KafkaPrincipal currently has a PrincipalType and a Principal name, which 
> is the name of the Principal generated by the PrincipalBuilder. 
> This Principal, generated by the plugged-in PrincipalBuilder, might have 
> other fields that might be required by the plugged-in Authorizer, but 
> currently we lose this information since we only extract the name of the 
> Principal while creating the KafkaPrincipal in SocketServer.  
> It would be great if KafkaPrincipal had an additional field 
> "channelPrincipal" which is used to store the Principal generated by the 
> plugged-in PrincipalBuilder.
> The plugged-in Authorizer can then use this "channelPrincipal" to do 
> authorization.
>  
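
A sketch of the proposed shape (the constructor and accessor are assumptions, 
not a final API):

import java.security.Principal;

public class KafkaPrincipal implements Principal {
    private final String principalType;
    private final String name;
    private final Principal channelPrincipal; // proposed: the Principal from the PrincipalBuilder

    public KafkaPrincipal(String principalType, String name, Principal channelPrincipal) {
        this.principalType = principalType;
        this.name = name;
        this.channelPrincipal = channelPrincipal;
    }

    // A plugged-in Authorizer can inspect or downcast this to its custom type.
    public Principal channelPrincipal() { return channelPrincipal; }

    public String getPrincipalType() { return principalType; }

    @Override
    public String getName() { return name; }
}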



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4510) StreamThread must finish rebalance in state PENDING_SHUTDOWN

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15730369#comment-15730369
 ] 

ASF GitHub Bot commented on KAFKA-4510:
---

GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2227

KAFKA-4510: StreamThread must finish rebalance in state PENDING_SHUTDOWN



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka 
kafka-4510-finish-rebalance-on-shutdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2227


commit 1568713abe7bd95495f673d507f96812484efcc8
Author: Matthias J. Sax 
Date:   2016-12-07T23:55:36Z

KAFKA-4510: StreamThread must finish rebalance in state PENDING_SHUTDOWN




> StreamThread must finish rebalance in state PENDING_SHUTDOWN
> 
>
> Key: KAFKA-4510
> URL: https://issues.apache.org/jira/browse/KAFKA-4510
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>
> If a Streams application runs with multiple threads within one JVM and the 
> application gets stopped, this can trigger a rebalance when the first 
> thread finishes processing, because not all threads shut down at the same 
> time.
> Because we try to reuse tasks, on rebalance tasks are not closed immediately 
> in order to allow a potential reuse (if a task gets assigned to its original 
> thread). However, if a thread is in state {{PENDING_SHUTDOWN}} it does not 
> finish the rebalance operation completely and thus does not release the 
> suspended task, and the application exits with not all locks released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2227: KAFKA-4510: StreamThread must finish rebalance in ...

2016-12-07 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2227

KAFKA-4510: StreamThread must finish rebalance in state PENDING_SHUTDOWN



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka 
kafka-4510-finish-rebalance-on-shutdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2227


commit 1568713abe7bd95495f673d507f96812484efcc8
Author: Matthias J. Sax 
Date:   2016-12-07T23:55:36Z

KAFKA-4510: StreamThread must finish rebalance in state PENDING_SHUTDOWN




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (KAFKA-4510) StreamThread must finish rebalance in state PENDING_SHUTDOWN

2016-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4510 started by Matthias J. Sax.
--
> StreamThread must finish rebalance in state PENDING_SHUTDOWN
> 
>
> Key: KAFKA-4510
> URL: https://issues.apache.org/jira/browse/KAFKA-4510
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>
> If a Streams application runs with multiple threads within one JVM and the 
> application gets stopped, this can trigger a rebalance when the first 
> thread finishes processing, because not all threads shut down at the same 
> time.
> Because we try to reuse tasks, on rebalance tasks are not closed immediately 
> in order to allow a potential reuse (if a task gets assigned to its original 
> thread). However, if a thread is in state {{PENDING_SHUTDOWN}} it does not 
> finish the rebalance operation completely and thus does not release the 
> suspended task, and the application exits with not all locks released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4510) StreamThread must finish rebalance in state PENDING_SHUTDOWN

2016-12-07 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4510:
--

 Summary: StreamThread must finish rebalance in state 
PENDING_SHUTDOWN
 Key: KAFKA-4510
 URL: https://issues.apache.org/jira/browse/KAFKA-4510
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax


If a Streams application runs with multiple threads within one JVM and the 
application gets stopped, this can trigger a rebalance when the first thread 
finishes processing, because not all threads shut down at the same time.

Because we try to reuse tasks, on rebalance tasks are not closed immediately 
in order to allow a potential reuse (if a task gets assigned to its original 
thread). However, if a thread is in state {{PENDING_SHUTDOWN}} it does not 
finish the rebalance operation completely and thus does not release the 
suspended task, and the application exits with not all locks released.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] 0.10.1.1 RC0

2016-12-07 Thread Guozhang Wang
One thing to note is that in this bug-fix release we include artifacts built
from Scala 2.12.1 as well, as a pre-alpha product for the Scala community
to try and test out (they are built with Java 8 while all other artifacts
are built with Java 7). We hope to formally add Scala 2.12 support in
future minor releases.


Guozhang

On Wed, Dec 7, 2016 at 2:46 PM, Guozhang Wang  wrote:

> Hello Kafka users, developers and client-developers,
>
> This is the first candidate for the release of Apache Kafka 0.10.1.1.
> This is a bug fix release and it includes fixes and improvements from 27
> JIRAs. See the release notes for more details:
>
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/RELEASE_NOTES.html
>
> *** Please download, test and vote by Monday, 13 December, 8am PT ***
>
> Kafka's KEYS file containing PGP keys we use to sign the release:
> http://kafka.apache.org/KEYS
>
> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/
>
> * Maven artifacts to be voted upon:
> https://repository.apache.org/content/groups/staging/org/apache/kafka/
>
> * Javadoc:
> http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/javadoc/
>
> * Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc0 tag:
> https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> 8b77507083fdd427ce81021228e7e346da0d814c
>
>
> Thanks,
> Guozhang
>



-- 
-- Guozhang


[VOTE] 0.10.1.1 RC0

2016-12-07 Thread Guozhang Wang
Hello Kafka users, developers and client-developers,

This is the first candidate for the release of Apache Kafka 0.10.1.1. This is
a bug fix release and it includes fixes and improvements from 27 JIRAs. See
the release notes for more details:

http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by Monday, 13 December, 8am PT ***

Kafka's KEYS file containing PGP keys we use to sign the release:
http://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
http://home.apache.org/~guozhang/kafka-0.10.1.1-rc0/javadoc/

* Tag to be voted upon (off the 0.10.1 branch) is the 0.10.1.1-rc0 tag:
https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=8b77507083fdd427ce81021228e7e346da0d814c


Thanks,
Guozhang


Build failed in Jenkins: kafka-trunk-jdk8 #1086

2016-12-07 Thread Apache Jenkins Server
See 

Changes:

[becket.qin] KAFKA-4445; PreferredLeaderElectionCommand should query zookeeper 
only

--
[...truncated 16005 lines...]

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByAlias PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias 
STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddConnectorByShortAlias 
PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStopInvalidConnector PASSED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks 
STARTED

org.apache.kafka.connect.runtime.WorkerTest > testReconfigureConnectorTasks 
PASSED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask STARTED

org.apache.kafka.connect.runtime.WorkerTest > testAddRemoveTask PASSED

org.apache.kafka.connect.runtime.WorkerTest > testStartTaskFailure STARTED

org.apache.kafka.connect.runtime.WorkerTest > testStartTaskFailure PASSED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop STARTED

org.apache.kafka.connect.runtime.WorkerTest > testCleanupTasksOnStop PASSED

org.apache.kafka.connect.runtime.WorkerTest > testConverterOverrides STARTED

org.apache.kafka.connect.runtime.WorkerTest > testConverterOverrides PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testPollsInBackground PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommit PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitFailure 
STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitFailure 
PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitSuccessFollowedByFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitSuccessFollowedByFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testCommitConsumerFailure PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testCommitTimeout 
PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testAssignmentPauseResume PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > testRewind PASSED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testRewindOnRebalanceDuringPoll STARTED

org.apache.kafka.connect.runtime.WorkerSinkTaskThreadedTest > 
testRewindOnRebalanceDuringPoll PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupFollower STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupFollower PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testMetadata STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testMetadata PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment1 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment1 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment2 STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testLeaderPerformAssignment2 PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testJoinLeaderCannotAssign STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testJoinLeaderCannotAssign PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testRejoinGroup STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testRejoinGroup PASSED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupLeader STARTED

org.apache.kafka.connect.runtime.distributed.WorkerCoordinatorTest > 
testNormalJoinGroupLeader PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testJoinAssignment PASSED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance STARTED

org.apache.kafka.connect.runtime.distributed.DistributedHerderTest > 
testRebalance PASSED


[jira] [Created] (KAFKA-4509) Task reusage on rebalance fails for threads on same host

2016-12-07 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-4509:
--

 Summary: Task reusage on rebalance fails for threads on same host
 Key: KAFKA-4509
 URL: https://issues.apache.org/jira/browse/KAFKA-4509
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax


In https://issues.apache.org/jira/browse/KAFKA-3559 task reusage on rebalance 
was introduced as a performance optimization. Instead of closing a task on 
rebalance (i.e., in {{onPartitionsRevoked()}}), it only gets suspended for a 
potential reuse in {{onPartitionsAssigned()}}. Only if a task cannot be 
reused will it eventually get closed in {{onPartitionsAssigned()}}.

This mechanism can fail if multiple {{StreamThreads}} run on the same host 
(in the same or different JVMs). The scenario is as follows:

 - assume 2 running threads A and B
 - assume 3 tasks t1, t2, t3
 - assignment: A->(t1,t2) and B->(t3)
 - on the same host, a new single-threaded Streams application (same app-id) 
gets started (thread C)
 - on rebalance, t2 (could also be t1 -- does not matter) will be moved from A 
to C
 - as the assignment is only sticky based on a heuristic, t1 can sometimes be 
assigned to B, too -- and t3 gets assigned to A (there is a race condition 
determining whether this "task flipping" happens or not)
 - on revoke, A will suspend tasks t1 and t2 (not releasing any locks)
 - on assign
    - A tries to create t3, but as B did not release it yet, A dies with a 
"cannot get lock" exception
    - B tries to create t1, but as A did not release it yet, B dies with a 
"cannot get lock" exception
    - as A and B try to create the tasks first, this will always fail if task 
flipping happened
    - C tries to create t2, but A did not release the t2 lock yet (race 
condition) and C dies with an exception (this could even happen without "task 
flipping" between A and B)

We want to fix this by:
  # first releasing unassigned suspended tasks in {{onPartitionsAssigned()}}, 
and afterwards creating new tasks (this fixes the "task flipping" issue)
  # using a "backoff and retry" mechanism if a task cannot be created (to 
handle the release/create race condition between different threads), as 
sketched below
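
A sketch of the proposed ordering and retry (type and method names are 
illustrative, not the actual StreamThread code):

import java.util.HashSet;
import java.util.Set;

abstract class RebalanceSketch {
    // Placeholders for the real task-manager machinery.
    abstract Set<String> suspendedTasks();
    abstract void closeSuspendedTask(String taskId);                      // releases the state lock
    abstract void createOrResumeTask(String taskId) throws LockException;

    static class LockException extends Exception {}

    void onPartitionsAssigned(Set<String> assigned) throws InterruptedException {
        // 1) First release suspended tasks that were not re-assigned, so other
        //    threads on the same host can acquire their state directory locks
        //    (fixes the "task flipping" case).
        for (String taskId : new HashSet<>(suspendedTasks())) {
            if (!assigned.contains(taskId)) {
                closeSuspendedTask(taskId);
            }
        }
        // 2) Then create newly assigned tasks, backing off and retrying while
        //    another thread still holds a lock (release/create race).
        for (String taskId : assigned) {
            long backoffMs = 50;
            while (true) {
                try {
                    createOrResumeTask(taskId);
                    break;
                } catch (LockException e) {
                    Thread.sleep(backoffMs);
                    backoffMs = Math.min(backoffMs * 2, 1000);
                }
            }
        }
    }
}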




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Work started] (KAFKA-4509) Task reusage on rebalance fails for threads on same host

2016-12-07 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4509 started by Matthias J. Sax.
--
> Task reusage on rebalance fails for threads on same host
> 
>
> Key: KAFKA-4509
> URL: https://issues.apache.org/jira/browse/KAFKA-4509
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>
> In https://issues.apache.org/jira/browse/KAFKA-3559 task reusage on 
> rebalance was introduced as a performance optimization. Instead of closing a 
> task on rebalance (i.e., in {{onPartitionsRevoked()}}), it only gets 
> suspended for a potential reuse in {{onPartitionsAssigned()}}. Only if a 
> task cannot be reused will it eventually get closed in 
> {{onPartitionsAssigned()}}.
> This mechanism can fail if multiple {{StreamThreads}} run on the same host 
> (in the same or different JVMs). The scenario is as follows:
>  - assume 2 running threads A and B
>  - assume 3 tasks t1, t2, t3
>  - assignment: A->(t1,t2) and B->(t3)
>  - on the same host, a new single-threaded Streams application (same app-id) 
> gets started (thread C)
>  - on rebalance, t2 (could also be t1 -- does not matter) will be moved from 
> A to C
>  - as the assignment is only sticky based on a heuristic, t1 can sometimes 
> be assigned to B, too -- and t3 gets assigned to A (there is a race 
> condition determining whether this "task flipping" happens or not)
>  - on revoke, A will suspend tasks t1 and t2 (not releasing any locks)
>  - on assign
>     - A tries to create t3, but as B did not release it yet, A dies with a 
> "cannot get lock" exception
>     - B tries to create t1, but as A did not release it yet, B dies with a 
> "cannot get lock" exception
>     - as A and B try to create the tasks first, this will always fail if 
> task flipping happened
>     - C tries to create t2, but A did not release the t2 lock yet (race 
> condition) and C dies with an exception (this could even happen without 
> "task flipping" between A and B)
> We want to fix this by:
>   # first releasing unassigned suspended tasks in {{onPartitionsAssigned()}}, 
> and afterwards creating new tasks (this fixes the "task flipping" issue)
>   # using a "backoff and retry" mechanism if a task cannot be created (to 
> handle the release/create race condition between different threads)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-07 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska updated KAFKA-4486:

Fix Version/s: 0.10.2.0

> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
> Fix For: 0.10.2.0
>
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and, if successful, I commit. If I 
> for any reason can't deliver the records I don't want the offsets to be 
> committed, so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.
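
For reference, the batching pattern described in this report looks roughly like
the sketch below (sendBatchToSqs() is a hypothetical stand-in for the SQS client
call; the rest is the public Processor API):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    public class SqsBatchProcessor implements Processor<String, String> {
        private static final int BATCH_SIZE = 10; // SQS batch limit
        private final List<String> batch = new ArrayList<>();
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(String key, String value) {
            batch.add(value);
            if (batch.size() >= BATCH_SIZE) {
                sendBatchToSqs(batch); // hypothetical; throws if delivery fails
                batch.clear();
                context.commit();      // only commit once the batch is safely in SQS
            }
        }

        @Override
        public void punctuate(long timestamp) { }

        @Override
        public void close() { }

        private void sendBatchToSqs(List<String> messages) {
            /* hypothetical SQS client call */
        }
    }

The reported problem is that when process() throws, the shutdown path still
commits the offsets of the records buffered above.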



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4492) java.lang.IllegalStateException: Attempting to put a clean entry for key... into NamedCache

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729783#comment-15729783
 ] 

ASF GitHub Bot commented on KAFKA-4492:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2226


> java.lang.IllegalStateException: Attempting to put a clean entry for key... 
> into NamedCache
> ---
>
> Key: KAFKA-4492
> URL: https://issues.apache.org/jira/browse/KAFKA-4492
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> This follows on from https://issues.apache.org/jira/browse/KAFKA-4311
> The exception seems to be triggered by topologies with multiple joined 
> tables. As a new record arrives in one table it triggers an eviction. The 
> eviction causes a flush, which will trigger a join processor. This in turn 
> does a cache lookup, and if the value is not in the cache it will be 
> retrieved from the store and put in the cache, triggering another eviction. 
> And so on.
> Exception reported on mailing list
> https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> Further investigation into this also reveals that this same eviction process 
> can send the cache eviction into an infinite loop. It seems that the LRU is 
> broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2226: KAFKA-4492: java.lang.IllegalStateException: Attem...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2226


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3209) Support single message transforms in Kafka Connect

2016-12-07 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729737#comment-15729737
 ] 

Shikhar Bhushan commented on KAFKA-3209:


[~snisarg] and [~jjchorrobe], I revived the discussion thread and I'd welcome 
your thoughts there on this proposal: 
https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design

> Support single message transforms in Kafka Connect
> --
>
> Key: KAFKA-3209
> URL: https://issues.apache.org/jira/browse/KAFKA-3209
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Neha Narkhede
>  Labels: needs-kip
> Fix For: 0.10.2.0
>
>
> Users should be able to perform light transformations on messages between a 
> connector and Kafka. This is needed because some transformations must be 
> performed before the data hits Kafka (e.g. filtering certain types of events 
> or PII filtering). It's also useful for very light, single-message 
> modifications that are easier to perform inline with the data import/export.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-07 Thread Shikhar Bhushan
Hi all,

I have another iteration at a proposal for this feature here:
https://cwiki.apache.org/confluence/display/KAFKA/Connect+Transforms+-+Proposed+Design

I'd welcome your feedback and comments.

Thanks,

Shikhar

On Tue, Aug 2, 2016 at 7:21 PM Ewen Cheslack-Postava 
wrote:

On Thu, Jul 28, 2016 at 11:58 PM, Shikhar Bhushan 
wrote:

> >
> >
> > Hmm, operating on ConnectRecords probably doesn't work since you need to
> > emit the right type of record, which might mean instantiating a new one.
> I
> > think that means we either need 2 methods, one for SourceRecord, one for
> > SinkRecord, or we'd need to limit what parts of the message you can
> modify
> > (e.g. you can change the key/value via something like
> > transformKey(ConnectRecord) and transformValue(ConnectRecord), but other
> > fields would remain the same and the fmwk would handle allocating new
> > Source/SinkRecords if needed)
> >
>
> Good point, perhaps we could add an abstract method on ConnectRecord that
> takes all the shared fields as parameters and the implementations return a
> copy of the narrower SourceRecord/SinkRecord type as appropriate.
> Transformers would only operate on ConnectRecord rather than caring about
> SourceRecord or SinkRecord (in theory they could instanceof/cast, but the
> API should discourage it)
>
>
> > Is there a use case for hanging on to the original? I can't think of a
> > transformation where you'd need to do that (or couldn't just order
things
> > differently so it isn't a problem).
>
>
> Yeah maybe this isn't really necessary. No strong preference here.
>
> That said, I do worry a bit that farming too much stuff out to
transformers
> > can result in "programming via config", i.e. a lot of the simplicity you
> > get from Connect disappears in long config files. Standardization would
> be
> > nice and might just avoid this (and doesn't cost that much implementing
> it
> > in each connector), and I'd personally prefer something a bit less
> flexible
> > but consistent and easy to configure.
>
>
> Not sure what you're suggesting :-) Standardized config properties for
> a small set of transformations, leaving it upto connectors to integrate?
>

I just mean that you get to the point where you're practically writing a
Kafka Streams application, you're just doing it through either an
incredibly convoluted set of transformers and configs, or a single
transformer with an incredibly convoluted set of configs. You basically get to
the point where your config is a mini DSL and you're not really saving
that much.

The real question is how much we want to venture into the "T" part of ETL.
I tend to favor minimizing how much we take on since the rest of Connect
isn't designed for it, it's designed around the E & L parts.

-Ewen


> Personally I'm skeptical of that level of flexibility in transformers --
> > it's getting awfully complex and certainly takes us pretty far from
> "config
> > only" realtime data integration. It's not clear to me what the use cases
> > are that aren't covered by a small set of common transformations that
can
> > be chained together (e.g. rename/remove fields, mask values, and maybe a
> > couple more).
> >
>
> I agree that we should have some standard transformations that we ship
with
> connect that users would ideally lean towards for routine tasks. The ones
> you mention are some good candidates where I'd imagine can expose simple
> config, e.g.
>transform.filter.whitelist=x,y,z # filter to a whitelist of fields
>transform.rename.spec=oldName1=>newName1, oldName2=>newName2
>topic.rename.replace=-/_
>topic.rename.prefix=kafka_
> etc..
>
> However the ecosystem will invariably have more complex transformers if we
> make this pluggable. And because ETL is messy, that's probably a good
thing
> if folks are able to do their data munging orthogonally to connectors, so
> that connectors can focus on the logic of how data should be copied
from/to
> datastores and Kafka.
>
>
> > In any case, we'd probably also have to change configs of connectors if
> we
> > allowed configs like that since presumably transformer configs will just
> be
> > part of the connector config.
> >
>
> Yeah, haven't thought much about how all the configuration would tie
> together...
>
> I think we'd need the ability to:
> - spec transformer chain (fully-qualified class names? perhaps special
> aliases for built-in ones? perhaps third-party fqcns can be assigned
> aliases by users in the chain spec, for easier configuration and to
> uniquely identify a transformation when it occurs more than one time in a
> chain?)
> - configure each transformer -- all properties prefixed with that
> transformer's ID (fqcn / alias) get destined to it
>
> Additionally, I think we would probably want to allow for topic-specific
> overrides (e.g. you want certain transformations for one topic, but
> different ones for another...)
>
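
To make the chain spec concrete, the configuration could end up looking
something like this (all property names here are hypothetical -- the design is
still under discussion):

    transforms=mask,rename                          # aliases, in chain order
    transforms.mask.type=org.example.MaskField      # fqcn behind the alias
    transforms.mask.fields=ssn,creditCard
    transforms.rename.type=org.example.RenameField
    transforms.rename.spec=oldName1=>newName1
    transforms.rename.topics=my_topic               # hypothetical per-topic override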




Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2016-12-07 Thread Rajini Sivaram
+1 (non-binding)

On Wed, Dec 7, 2016 at 7:22 PM, Apurva Mehta  wrote:

> +1 (non-binding)
>
> On Wed, Dec 7, 2016 at 10:05 AM, Jason Gustafson 
> wrote:
>
> > +1 Thanks for the KIP!
> >
> > On Wed, Dec 7, 2016 at 2:53 AM, Ismael Juma  wrote:
> >
> > > Thanks for the KIP, Vahid. +1 (binding)
> > >
> > > On Mon, Dec 5, 2016 at 6:16 PM, Vahid S Hashemian <
> > > vahidhashem...@us.ibm.com
> > > > wrote:
> > >
> > > > Happy Monday,
> > > >
> > > > I'd like to start voting on KIP-88 (
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 88%3A+OffsetFetch+Protocol+Update
> > > > ).
> > > > The discussion thread can be found here:
> > > > https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
> > > >
> > > > Thank you for your feedback.
> > > >
> > > > Regards,
> > > > --Vahid
> > > >
> > > >
> > >
> >
>



-- 
Regards,

Rajini


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2016-12-07 Thread Apurva Mehta
+1 (non-binding)

On Wed, Dec 7, 2016 at 10:05 AM, Jason Gustafson  wrote:

> +1 Thanks for the KIP!
>
> On Wed, Dec 7, 2016 at 2:53 AM, Ismael Juma  wrote:
>
> > Thanks for the KIP, Vahid. +1 (binding)
> >
> > On Mon, Dec 5, 2016 at 6:16 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com
> > > wrote:
> >
> > > Happy Monday,
> > >
> > > I'd like to start voting on KIP-88 (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 88%3A+OffsetFetch+Protocol+Update
> > > ).
> > > The discussion thread can be found here:
> > > https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
> > >
> > > Thank you for your feedback.
> > >
> > > Regards,
> > > --Vahid
> > >
> > >
> >
>


[GitHub] kafka pull request #2221: MINOR: fix metric collection NPE during shutdown

2016-12-07 Thread xvrl
Github user xvrl closed the pull request at:

https://github.com/apache/kafka/pull/2221


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #2221: MINOR: fix metric collection NPE during shutdown

2016-12-07 Thread xvrl
GitHub user xvrl reopened a pull request:

https://github.com/apache/kafka/pull/2221

MINOR: fix metric collection NPE during shutdown

collecting socket server metrics during shutdown may throw 
NullPointerException

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xvrl/kafka fix-metrics-npe-on-shutdown

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2221


commit 074361cc21e70917eaf2bd4add068e3727f14adf
Author: Xavier Léauté 
Date:   2016-12-07T01:16:01Z

MINOR: fix metric collection NPE during shutdown




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4445) PreferredLeaderElectionCommand should query zk only once per topic

2016-12-07 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin resolved KAFKA-4445.
-
Resolution: Fixed

> PreferredLeaderElectionCommand should query zk only once per topic
> --
>
> Key: KAFKA-4445
> URL: https://issues.apache.org/jira/browse/KAFKA-4445
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.1.0
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.2.0
>
>
> Currently PreferredLeaderElection will query zookeeper once for every 
> partition that is included in the preferred leader election. This is very 
> inefficient and makes the preferred leader election script slow. It should 
> only query zookeeper once per topic.
> In addition, the script currently reports that the preferred leader election 
> was successful even if some partitions are not found in ZooKeeper.
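
The improvement amounts to grouping the partitions by topic and issuing one
ZooKeeper read per topic, roughly like this sketch (readPartitionsFromZk() is a
hypothetical helper, not the actual command's code):

    // Group the requested partitions by topic, then do a single ZK read per topic.
    Map<String, List<Integer>> partitionsByTopic = new HashMap<>();
    for (TopicPartition tp : targetPartitions)
        partitionsByTopic.computeIfAbsent(tp.topic(), t -> new ArrayList<>()).add(tp.partition());

    for (Map.Entry<String, List<Integer>> e : partitionsByTopic.entrySet()) {
        Set<Integer> known = readPartitionsFromZk(e.getKey()); // one read per topic
        for (int p : e.getValue())
            if (!known.contains(p))
                System.err.println("Partition not found in ZooKeeper: " + e.getKey() + "-" + p);
    }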



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4445) PreferredLeaderElectionCommand should query zk only once per topic

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729435#comment-15729435
 ] 

ASF GitHub Bot commented on KAFKA-4445:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2170


> PreferredLeaderElectionCommand should query zk only once per topic
> --
>
> Key: KAFKA-4445
> URL: https://issues.apache.org/jira/browse/KAFKA-4445
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.1.0
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.2.0
>
>
> Currently PreferredLeaderElection will query zookeeper once for every 
> partition that is included in the preferred leader election. This is very 
> inefficient and makes the preferred leader election script slow. It should 
> only query zookeeper once per topic.
> In addition, the script currently reports that the preferred leader election 
> was successful even if some partitions are not found in ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4445) PreferredLeaderElectionCommand should query zk only once per topic

2016-12-07 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin updated KAFKA-4445:

Affects Version/s: 0.10.1.0
Fix Version/s: 0.10.2.0
   Issue Type: Improvement  (was: Bug)

> PreferredLeaderElectionCommand should query zk only once per topic
> --
>
> Key: KAFKA-4445
> URL: https://issues.apache.org/jira/browse/KAFKA-4445
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.10.1.0
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.2.0
>
>
> Currently PreferredLeaderElection will query zookeeper once for every 
> partition that is included in the preferred leader election. This is very 
> inefficient and makes the preferred leader election script slow. It should 
> only query zookeeper once per topic.
> In addition, the script currently reports that the preferred leader election 
> was successful even if some partitions are not found in ZooKeeper.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2170: KAFKA-4445; PreferredLeaderElectionCommand should ...

2016-12-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2170


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2016-12-07 Thread Jason Gustafson
+1 Thanks for the KIP!

On Wed, Dec 7, 2016 at 2:53 AM, Ismael Juma  wrote:

> Thanks for the KIP, Vahid. +1 (binding)
>
> On Mon, Dec 5, 2016 at 6:16 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com
> > wrote:
>
> > Happy Monday,
> >
> > I'd like to start voting on KIP-88 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 88%3A+OffsetFetch+Protocol+Update
> > ).
> > The discussion thread can be found here:
> > https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
> >
> > Thank you for your feedback.
> >
> > Regards,
> > --Vahid
> >
> >
>


[jira] [Created] (KAFKA-4508) Create system tests that run newer versions of the client against older versions of the broker

2016-12-07 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4508:
--

 Summary: Create system tests that run newer versions of the client 
against older versions of the broker
 Key: KAFKA-4508
 URL: https://issues.apache.org/jira/browse/KAFKA-4508
 Project: Kafka
  Issue Type: Sub-task
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Create system tests that run newer versions of the client against older 
versions of the broker.  These system tests will assume that a tarball of the 
Apache release of Kafka 0.10.0 (and possibly other releases) is located 
somewhere, and go from there.  We can write the overall test harness stuff in 
Python, and have some utilities in Java.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4507) The client should send older versions of requests to the broker if necessary

2016-12-07 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4507:
--

 Summary: The client should send older versions of requests to the 
broker if necessary
 Key: KAFKA-4507
 URL: https://issues.apache.org/jira/browse/KAFKA-4507
 Project: Kafka
  Issue Type: Sub-task
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


The client should send older versions of requests to the broker if necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4506) Refactor AbstractRequest to contain version information

2016-12-07 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4506:
--

 Summary: Refactor AbstractRequest to contain version information
 Key: KAFKA-4506
 URL: https://issues.apache.org/jira/browse/KAFKA-4506
 Project: Kafka
  Issue Type: Sub-task
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


Refactor AbstractRequest to contain version information.  Remove some client 
code which implicitly assumes that the latest version of each message will 
always be processed and used.  Always match the API version in request headers 
to the request version.
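
One possible shape for this, sketched (a direction only, not Kafka's actual code):

    // Hypothetical: each request remembers the API version it was built for,
    // and serialization uses that version instead of assuming the latest.
    public abstract class AbstractRequest {
        private final short version;

        protected AbstractRequest(short version) {
            this.version = version;
        }

        public short version() {
            return version;
        }

        // implementations serialize according to version(), never "latest"
        public abstract org.apache.kafka.common.protocol.types.Struct toStruct();
    }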



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4492) java.lang.IllegalStateException: Attempting to put a clean entry for key... into NamedCache

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729341#comment-15729341
 ] 

ASF GitHub Bot commented on KAFKA-4492:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2226

KAFKA-4492: java.lang.IllegalStateException: Attempting to put a clean 
entry for key... into NamedCache

The NamedCache wasn't correctly dealing with its re-entrant nature. This 
would result in the LRU becoming corrupted, and the above exception occurring 
during eviction. For example:
Cache A: dirty key 1
eviction runs on Cache A
Node for key 1 gets marked as clean
Entry for key 1 gets flushed downstream
Downstream there is a processor that also refers to the table fronted by 
Cache A
Downstream processor puts key 2 into Cache A
This triggers an eviction of key 1 again (it is still the oldest node, as it 
hasn't been removed from the LRU)
As the node for key 1 is clean, flush doesn't run and it is immediately 
removed from the cache. 
So now we have the dirtyKeys set containing key=1, but the value doesn't exist in the 
cache.
The downstream processor tries to put key=1 into the cache; it fails as key=1 
is in the dirtyKeys set.
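
For illustration, the kind of re-entrancy guard that prevents this corruption
could look like the simplified stand-in below (not the actual NamedCache code):

    // Flushing an evicted entry can re-enter put(), so eviction must not
    // recurse into itself; re-entrant puts get evicted on the next top-level call.
    class SimpleLruCache {
        private static final int MAX_ENTRIES = 100;
        private final java.util.LinkedHashMap<String, String> lru =
                new java.util.LinkedHashMap<>(16, 0.75f, true); // access-order LRU
        private boolean evicting = false;

        void put(String key, String value) {
            lru.put(key, value);
            maybeEvict();
        }

        private void maybeEvict() {
            if (evicting)
                return; // re-entered from a downstream flush: skip
            evicting = true;
            try {
                while (lru.size() > MAX_ENTRIES) {
                    String eldest = lru.keySet().iterator().next();
                    flush(eldest);     // may call put() on this same cache re-entrantly
                    lru.remove(eldest);
                }
            } finally {
                evicting = false;
            }
        }

        private void flush(String key) {
            /* forwards the entry downstream; downstream may re-enter put() */
        }
    }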

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka cache-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2226


commit 926ef4b1900c6870819948e12950f215ab3998ac
Author: Damian Guy 
Date:   2016-12-07T14:48:34Z

fix cache bug

commit 6b6f3a57acacae1f0e4642ebc080ec9bc451e98c
Author: Damian Guy 
Date:   2016-12-07T17:12:47Z

add comment about cache eviction




> java.lang.IllegalStateException: Attempting to put a clean entry for key... 
> into NamedCache
> ---
>
> Key: KAFKA-4492
> URL: https://issues.apache.org/jira/browse/KAFKA-4492
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.2.0
>
>
> This follows on from https://issues.apache.org/jira/browse/KAFKA-4311
> The exception seems to be triggered by topologies with multiple joined 
> tables. As a new record arrives in one table it triggers an eviction. The 
> eviction causes a flush, which will trigger a join processor. This in turn 
> does a cache lookup, and if the value is not in the cache it will be 
> retrieved from the store and put in the cache, triggering another eviction. 
> And so on.
> Exception reported on mailing list
> https://gist.github.com/mfenniak/509fb82dfcfda79a21cfc1b07dafa89c
> Further investigation into this also reveals that this same eviction process 
> can send the cache eviction into an infinite loop. It seems that the LRU is 
> broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2226: KAFKA-4492: java.lang.IllegalStateException: Attem...

2016-12-07 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2226

KAFKA-4492: java.lang.IllegalStateException: Attempting to put a clean 
entry for key... into NamedCache

The NamedCache wasn't correctly dealing with its re-entrant nature. This 
would result in the LRU becoming corrupted, and the above exception occurring 
during eviction. For example:
Cache A: dirty key 1
eviction runs on Cache A
Node for key 1 gets marked as clean
Entry for key 1 gets flushed downstream
Downstream there is a processor that also refers to the table fronted by 
Cache A
Downstream processor puts key 2 into Cache A
This triggers an eviction of key 1 again (it is still the oldest node, as it 
hasn't been removed from the LRU)
As the node for key 1 is clean, flush doesn't run and it is immediately 
removed from the cache. 
So now we have the dirtyKeys set containing key=1, but the value doesn't exist in the 
cache.
The downstream processor tries to put key=1 into the cache; it fails as key=1 
is in the dirtyKeys set.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka cache-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2226


commit 926ef4b1900c6870819948e12950f215ab3998ac
Author: Damian Guy 
Date:   2016-12-07T14:48:34Z

fix cache bug

commit 6b6f3a57acacae1f0e4642ebc080ec9bc451e98c
Author: Damian Guy 
Date:   2016-12-07T17:12:47Z

add comment about cache eviction




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[VOTE]: KIP-97: The client compatibility KIP

2016-12-07 Thread Colin McCabe
Hi all,

I heard that the VOTE and DISCUSS threads for the KIP-97 discussion
appeared to be in the same email thread for some people using gmail.  So
I'm reposting in hopes of getting a separate email thread this time for
those viewers. :)

I'd like to start voting on KIP-97
(https://cwiki.apache.org/confluence/display/KAFKA/KIP-97%3A+Improved+Kafka+Client+RPC+Compatibility+Policy
).

The discussion thread can be found here: 
https://www.mail-archive.com/dev@kafka.apache.org/msg60955.html

Thanks for your feedback.
 
best,
Colin McCabe


[jira] [Created] (KAFKA-4505) Cannot get topic lag since kafka upgrade from 0.8.1.0 to 0.10.1.0

2016-12-07 Thread Romaric Parmentier (JIRA)
Romaric Parmentier created KAFKA-4505:
-

 Summary: Cannot get topic lag since kafka upgrade from 0.8.1.0 to 
0.10.1.0
 Key: KAFKA-4505
 URL: https://issues.apache.org/jira/browse/KAFKA-4505
 Project: Kafka
  Issue Type: Bug
  Components: consumer, metrics, offset manager
Affects Versions: 0.10.1.0
Reporter: Romaric Parmentier
Priority: Critical


We were using Kafka 0.8.1.1 and we just migrated to version 0.10.1.0.

Since we migrated, we have been using the new kafka-consumer-groups.sh script to 
retrieve topic lags, but it doesn't seem to work anymore. 
Because the application is using the 0.8 driver, we have added the following 
configuration to each Kafka server:
log.message.format.version=0.8.2
inter.broker.protocol.version=0.10.0.0

When I use the --list option with kafka-consumer-groups.sh I can see every 
consumer group I'm using, but --describe is not working:
/usr/share/kafka$ bin/kafka-consumer-groups.sh --zookeeper ip:2181 --describe 
--group group_name
No topic available for consumer group provided
GROUP  TOPIC  PARTITION  
CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER

When I look into ZooKeeper, I can see the offsets increasing for this 
consumer group.

Any idea?
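
For what it's worth, since the group still commits offsets to ZooKeeper (old
0.8 consumer), the committed offset can be inspected directly while this is
investigated; the path below assumes the standard old-consumer layout:

    $ bin/zookeeper-shell.sh ip:2181 get /consumers/group_name/offsets/topic_name/0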



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4503) Expose the log dir for a partition as a metric

2016-12-07 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729154#comment-15729154
 ] 

Ismael Juma commented on KAFKA-4503:


Also see KAFKA-1614, which is a bit broader.

> Expose the log dir for a partition as a metric
> --
>
> Key: KAFKA-4503
> URL: https://issues.apache.org/jira/browse/KAFKA-4503
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> It would be useful to be able to map a partition to a log directory if 
> multiple log directories are used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4503) Expose the log dir for a partition as a metric

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729150#comment-15729150
 ] 

ASF GitHub Bot commented on KAFKA-4503:
---

Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/2224


> Expose the log dir for a partition as a metric
> --
>
> Key: KAFKA-4503
> URL: https://issues.apache.org/jira/browse/KAFKA-4503
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> It would be useful to be able to map a partition to a log directory if 
> multiple log directories are used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2224: KAFKA-4503: Expose the log dir for a partition as ...

2016-12-07 Thread ijuma
Github user ijuma closed the pull request at:

https://github.com/apache/kafka/pull/2224


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Work started] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-07 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-4486 started by Eno Thereska.
---
> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and, if successful, I commit. If I 
> for any reason can't deliver the records I don't want the offsets to be 
> committed, so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729122#comment-15729122
 ] 

ASF GitHub Bot commented on KAFKA-4486:
---

GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2225

KAFKA-4486: Don't commit offsets on exception



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-4486-exception-commit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2225


commit fcd0e1edbd06dc47a64b72d57d155391b7680238
Author: Eno Thereska 
Date:   2016-12-07T16:09:20Z

Don't commit offsets on exception




> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and, if successful, I commit. If I 
> for any reason can't deliver the records I don't want the offsets to be 
> committed, so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-07 Thread Eno Thereska (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729125#comment-15729125
 ] 

Eno Thereska commented on KAFKA-4486:
-

[~joel.lundell] thanks for the JIRA. I've opened a PR, mind giving it a go and 
reporting if it sorts your scenario? Thanks.

> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and, if successful, I commit. If I 
> for any reason can't deliver the records I don't want the offsets to be 
> committed, so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2225: KAFKA-4486: Don't commit offsets on exception

2016-12-07 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2225

KAFKA-4486: Don't commit offsets on exception



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka KAFKA-4486-exception-commit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2225


commit fcd0e1edbd06dc47a64b72d57d155391b7680238
Author: Eno Thereska 
Date:   2016-12-07T16:09:20Z

Don't commit offsets on exception




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (KAFKA-4486) Kafka Streams - exception in process still commits offsets

2016-12-07 Thread Eno Thereska (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eno Thereska reassigned KAFKA-4486:
---

Assignee: Eno Thereska

> Kafka Streams - exception in process still commits offsets
> --
>
> Key: KAFKA-4486
> URL: https://issues.apache.org/jira/browse/KAFKA-4486
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
> Environment: Java 8
>Reporter: Joel Lundell
>Assignee: Eno Thereska
>
> I'm building a streams application and would like to be able to control the 
> commits manually using ProcessorContext#commit() from an instance of 
> org.apache.kafka.streams.processor.Processor.
> My use case is that I want to read messages from a topic and push them to AWS 
> SQS and I need to be able to guarantee that all messages reach the queue at 
> least once. I also want to use SQS batching support so my approach at the 
> moment is that in Processor#process I'm saving X records in a data structure 
> and when I have a full batch I send it off and, if successful, I commit. If I 
> for any reason can't deliver the records I don't want the offsets to be 
> committed, so that when processing works again I can start processing from the 
> last successful record.
> When I was trying out the error handling I noticed that if I create a 
> Processor and in the process method always throw an exception that will 
> trigger StreamThread#shutdownTaskAndState which calls 
> StreamThread#commitOffsets and next time I run the application it starts as 
> if the previous "record" was successfully processed.
> Is there a way to achieve what I'm looking for?
> I found a similar discussion in 
> https://issues.apache.org/jira/browse/KAFKA-3491 but that issue is still open.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-07 Thread Damian Guy
Michael,

We can only support outerJoin if both tables are keyed the same way. Let's
say, for example, you can map both ways; however, the key for each table is
of a different type. So t1 is long and t2 is string - what is the key type
of the resulting GlobalKTable? So when you subsequently join to this table
and do a lookup on it, which key are you using?
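
For illustration, the one-directional join could be sketched like this (the
API shape is assumed from the ongoing discussion; T1Value, T2Value and Joined
are placeholder types):

    GlobalKTable<Integer, T1Value> t1 = ...;  // keyed by t1's int key
    GlobalKTable<String, T2Value> t2 = ...;   // keyed by t2's string key

    // t1 -> t2 works: the KeyValueMapper derives t2's String key from t1's value
    GlobalKTable<Integer, Joined> joined = t1.join(
        t2,
        (Integer k, T1Value v) -> v.t2Id,                // KeyValueMapper<Integer, T1Value, String>
        (T1Value v1, T2Value v2) -> new Joined(v1, v2)); // ValueJoiner

    // An outer join would need result entries keyed by either an Integer (from
    // t1) or a String (from t2), so there is no single key type for the result.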

Thanks,
Damian

On Wed, 7 Dec 2016 at 14:31 Michael Noll  wrote:

> Damian,
>
> yes, that makes sense.
>
> But I am still wondering:  In your example, there's no prior knowledge "can
> I map from t1->t2" that Streams can leverage for joining t1 and t2 other
> than blindly relying on the user to provide an appropriate KeyValueMapper
> for K1/V1 of t1 -> K2/V2 of t2.  In other words, if we allow the user to
> provide a KeyValueMapper from t1->t2 (Streams does not know at compile time
> whether this mapping will actually work), then we can also allow the user
> to provide a corresponding "reverse" mapper from t2->t1.  That is, we could
> say that an outer join between two global tables IS supported, but if and
> only if the user provides two KeyValueMappers, one for t1->t2 and one for
> t2->t1.
>
> The left join t1->t2 (which is supported in the KIP), in general, works
> only because of the existence of the user-provided KeyValueMapper from
> t1->t2.  The outer join, as you point out, cannot be satisfied as easily 
> because Streams must know two mappers, t1->t2 plus t2->t1 -- otherwise the
> outer join won't work.
>
>
>
>
>
> On Wed, Dec 7, 2016 at 3:04 PM, Damian Guy  wrote:
>
> > Hi Michael,
> >
> > Sure. Say we have 2 input topics t1 & t2 below:
> > t1{
> >  int key;
> >  string t2_id;
> >  ...
> > }
> >
> > t2 {
> >   string key;
> >   ..
> > }
> > If we create global tables out of these we'd get:
> > GlobalKTable<Integer, V1> t1;
> > GlobalKTable<String, V2> t2;
> >
> > So the join can only go in 1 direction, i.e., from t1 -> t2, as in order to
> > perform the join we need to use a KeyValueMapper to extract the t2 key from
> > the t1 value.
> >
> > Does that make sense?
> >
> > Thanks,
> > Damian
> >
> > On Wed, 7 Dec 2016 at 10:44 Michael Noll  wrote:
> >
> > > > There is no outer-join for GlobalKTables as the tables may be keyed
> > > > differently. So you need to use the key from the left side of the
> join
> > > > along with the KeyValueMapper to resolve the right side of the join.
> > This
> > > > won't work the other way around.
> > >
> > > Care to elaborate why it won't work the other way around?  If, for
> > example,
> > > we swapped the call from leftTable.join(rightTable) to
> > > rightTable.join(leftTable), that join would work, too.  Perhaps I am
> > > missing something though. :-)
> > >
> > >
> > >
> > >
> > > On Wed, Dec 7, 2016 at 10:39 AM, Damian Guy 
> > wrote:
> > >
> > > > Hi Matthias,
> > > >
> > > > Thanks for the feedback.
> > > >
> > > > There is no outer-join for GlobalKTables as the tables may be keyed
> > > > differently. So you need to use the key from the left side of the
> join
> > > > along with the KeyValueMapper to resolve the right side of the join.
> > This
> > > > won't work the other way around.
> > > >
> > > > On the bootstrapping concern. If the application is failing before
> > > > bootstrapping finishes, the problem is likely to be related to a
> > terminal
> > > > exception, i.e., running out of disk space, corrupt state stores etc.
> > In
> > > these cases, we wouldn't want the application to continue. So I think
> > > this
> > > > is ok.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Tue, 6 Dec 2016 at 21:56 Matthias J. Sax 
> > > wrote:
> > > >
> > > > > Thanks for the KIP Damian. Very nice motivating example!
> > > > >
> > > > > A few comments:
> > > > >
> > > > >  - why is there no outer-join for GlobalKTables
> > > > >  - on bootstrapping GlobalKTable, could it happen that this never
> > > > > finishes if the application fails before bootstrapping finishes and
> > new
> > > > > data gets written at the same time? Do we need to guard against
> this
> > > > > (seems to be a very rare corner case, so maybe not required)?
> > > > >
> > > > >
> > > > > -Matthias
> > > > >
> > > > >
> > > > > On 12/6/16 2:09 AM, Damian Guy wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to start the discussion on KIP-99:
> > > > > >
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > > action?pageId=67633649
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Thanks,
> > > > > > Damian
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> *Michael G. Noll*
> Product Manager | Confluent
> +1 650 453 5860 | @miguno 
> Follow us: Twitter  | Blog
> 
>


[jira] [Updated] (KAFKA-4504) Details of retention.bytes property at Topic level are not clear on how they impact partition size

2016-12-07 Thread Justin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin updated KAFKA-4504:
--
Description: 
Problem:

Details of retention.bytes property at Topic level are not clear on how they 
impact partition size

Business Impact:

Users are setting retention.bytes and not seeing the desired amount of 
data stored.

Current Text:

This configuration controls the maximum size a log can grow to before we will 
discard old log segments to free up space if we are using the "delete" 
retention policy. By default there is no size limit only a time limit.

Proposed change:

This configuration controls the maximum size a log can grow to before we will 
discard old log segments to free up space if we are using the "delete" 
retention policy. By default there is no size limit, only a time limit. Please 
note that the total amount of disk space used is calculated as retention.bytes * 
the number of partitions of the given topic.

  was:
Problem:

Details of retention.bytes property at Topic level are not clear on how they 
impact the system

Business Impact:


Proposed fix:



Summary: Details of retention.bytes property at Topic level are not 
clear on how they impact partition size  (was: Details of retention.bytes 
property at Topic level are not clear on how they impact on system)

> Details of retention.bytes property at Topic level are not clear on how they 
> impact partition size
> --
>
> Key: KAFKA-4504
> URL: https://issues.apache.org/jira/browse/KAFKA-4504
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 0.10.0.1
>Reporter: Justin
>
> Problem:
> Details of retention.bytes property at Topic level are not clear on how they 
> impact partition size
> Business Impact:
> Users are setting retention.bytes and not seeing the desired amount of 
> data stored.
> Current Text:
> This configuration controls the maximum size a log can grow to before we will 
> discard old log segments to free up space if we are using the "delete" 
> retention policy. By default there is no size limit only a time limit.
> Proposed change:
> This configuration controls the maximum size a log can grow to before we will 
> discard old log segments to free up space if we are using the "delete" 
> retention policy. By default there is no size limit, only a time limit. 
> Please note that the total amount of disk space used is calculated as 
> retention.bytes * the number of partitions of the given topic.
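
To make the arithmetic concrete: with retention.bytes=1073741824 (1 GiB) on a
topic with 8 partitions, up to roughly 8 GiB of log data is retained for that
topic, since the limit applies to each partition's log independently -- and that
is per replica, so with replication.factor=3 the cluster-wide footprint is
roughly 24 GiB ("roughly" because deletion happens at segment granularity).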



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4504) Details of retention.bytes property at Topic level are not clear on how they impact on system

2016-12-07 Thread Justin (JIRA)
Justin created KAFKA-4504:
-

 Summary: Details of retention.bytes property at Topic level are 
not clear on how they impact on system
 Key: KAFKA-4504
 URL: https://issues.apache.org/jira/browse/KAFKA-4504
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.10.0.1
Reporter: Justin


Problem:

Details of retention.bytes property at Topic level are not clear on how they 
impact the system

Business Impact:


Proposed fix:





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-07 Thread Michael Noll
Damian,

yes, that makes sense.

But I am still wondering:  In your example, there's no prior knowledge "can
I map from t1->t2" that Streams can leverage for joining t1 and t2 other
than blindly relying on the user to provide an appropriate KeyValueMapper
for K1/V1 of t1 -> K2/V2 of t2.  In other words, if we allow the user to
provide a KeyValueMapper from t1->t2 (Streams does not know at compile time
whether this mapping will actually work), then we can also allow the user
to provide a corresponding "reverse" mapper from t2->t1.  That is, we could
say that an outer join between two global tables IS supported, but if and
only if the user provides two KeyValueMappers, one for t1->t2 and one for
t2->t1.

The left join t1->t2 (which is supported in the KIP), in general, works
only because of the existence of the user-provided KeyValueMapper from
t1->t2.  The outer join, as you point out, cannot be satisfied as easily
because Streams must know two mappers, t1->t2 plus t2->t1 -- otherwise the
outer join won't work.





On Wed, Dec 7, 2016 at 3:04 PM, Damian Guy  wrote:

> Hi Michael,
>
> Sure. Say we have 2 input topics t1 & t2 below:
> t1{
>  int key;
>  string t2_id;
>  ...
> }
>
> t2 {
>   string key;
>   ..
> }
> If we create global tables out of these we'd get:
> GlobalKTable<Integer, V1> t1;
> GlobalKTable<String, V2> t2;
>
> So the join can only go in 1 direction, i.e., from t1 -> t2, as in order to
> perform the join we need to use a KeyValueMapper to extract the t2 key from
> the t1 value.
>
> Does that make sense?
>
> Thanks,
> Damian
>
> On Wed, 7 Dec 2016 at 10:44 Michael Noll  wrote:
>
> > > There is no outer-join for GlobalKTables as the tables may be keyed
> > > differently. So you need to use the key from the left side of the join
> > > along with the KeyValueMapper to resolve the right side of the join.
> This
> > > won't work the other way around.
> >
> > Care to elaborate why it won't work the other way around?  If, for
> example,
> > we swapped the call from leftTable.join(rightTable) to
> > rightTable.join(leftTable), that join would work, too.  Perhaps I am
> > missing something though. :-)
> >
> >
> >
> >
> > On Wed, Dec 7, 2016 at 10:39 AM, Damian Guy 
> wrote:
> >
> > > Hi Matthias,
> > >
> > > Thanks for the feedback.
> > >
> > > There is no outer-join for GlobalKTables as the tables may be keyed
> > > differently. So you need to use the key from the left side of the join
> > > along with the KeyValueMapper to resolve the right side of the join.
> This
> > > wont work the other way around.
> > >
> > > On the bootstrapping concern. If the application is failing before
> > > bootstrapping finishes, the problem is likely to be related to a
> terminal
> > > exception, i.e., running out of disk space, corrupt state stores etc.
> In
> > > these cases, we wouldn't want the application to continue. So I think
> > this
> > > is ok.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Tue, 6 Dec 2016 at 21:56 Matthias J. Sax 
> > wrote:
> > >
> > > > Thanks for the KIP Damian. Very nice motivating example!
> > > >
> > > > A few comments:
> > > >
> > > >  - why is there no outer-join for GlobalKTables
> > > >  - on bootstrapping GlobalKTable, could it happen that this never
> > > > finishes if the application fails before bootstrapping finishes and
> new
> > > > data gets written at the same time? Do we need to guard against this
> > > > (seems to be a very rare corner case, so maybe not required)?
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > > On 12/6/16 2:09 AM, Damian Guy wrote:
> > > > > Hi all,
> > > > >
> > > > > I would like to start the discussion on KIP-99:
> > > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > action?pageId=67633649
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Thanks,
> > > > > Damian
> > > > >
> > > >
> > > >
> > >
> >
>



-- 
*Michael G. Noll*
Product Manager | Confluent
+1 650 453 5860 | @miguno 
Follow us: Twitter  | Blog



Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-07 Thread Damian Guy
Hi Michael,

Sure. Say we have 2 input topics t1 & t2 below:
t1{
 int key;
 string t2_id;
 ...
}

t2 {
  string key;
  ..
}
If we create global tables out of these we'd get:
GlobalKTable<Integer, V1> t1;
GlobalKTable<String, V2> t2;

So the join can only go in 1 direction, i.e., from t1 -> t2, as in order to
perform the join we need to use a KeyValueMapper to extract the t2 key from
the t1 value.

Does that make sense?

Thanks,
Damian

On Wed, 7 Dec 2016 at 10:44 Michael Noll  wrote:

> > There is no outer-join for GlobalKTables as the tables may be keyed
> > differently. So you need to use the key from the left side of the join
> > along with the KeyValueMapper to resolve the right side of the join. This
> > won't work the other way around.
>
> Care to elaborate why it won't work the other way around?  If, for example,
> we swapped the call from leftTable.join(rightTable) to
> rightTable.join(leftTable), that join would work, too.  Perhaps I am
> missing something though. :-)
>
>
>
>
> On Wed, Dec 7, 2016 at 10:39 AM, Damian Guy  wrote:
>
> > Hi Matthias,
> >
> > Thanks for the feedback.
> >
> > There is no outer-join for GlobalKTables as the tables may be keyed
> > differently. So you need to use the key from the left side of the join
> > along with the KeyValueMapper to resolve the right side of the join. This
> > won't work the other way around.
> >
> > On the bootstrapping concern. If the application is failing before
> > bootstrapping finishes, the problem is likely to be related to a terminal
> > exception, i.e., running out of disk space, corrupt state stores etc. In
> > these cases, we wouldn't want the application to continue. So I think
> this
> > is ok.
> >
> > Thanks,
> > Damian
> >
> > On Tue, 6 Dec 2016 at 21:56 Matthias J. Sax 
> wrote:
> >
> > > Thanks for the KIP Damian. Very nice motivating example!
> > >
> > > A few comments:
> > >
> > >  - why is there no outer-join for GlobalKTables
> > >  - on bootstrapping GlobalKTable, could it happen that this never
> > > finishes if the application fails before bootstrapping finishes and new
> > > data gets written at the same time? Do we need to guard against this
> > > (seems to be a very rare corner case, so maybe not required)?
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 12/6/16 2:09 AM, Damian Guy wrote:
> > > > Hi all,
> > > >
> > > > I would like to start the discussion on KIP-99:
> > > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=67633649
> > > >
> > > > Looking forward to your feedback.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > >
> > >
> >
>


[jira] [Commented] (KAFKA-4503) Expose the log dir for a partition as a metric

2016-12-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15728735#comment-15728735
 ] 

ASF GitHub Bot commented on KAFKA-4503:
---

GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2224

KAFKA-4503: Expose the log dir for a partition as a metric



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4503-log-dir-for-partition-as-metric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2224


commit 796983fcbd6e268f05f2e33851304cfbbc92aee7
Author: Ismael Juma 
Date:   2016-12-07T13:16:39Z

Introduce `LogDir` gauge in `Log`




> Expose the log dir for a partition as a metric
> --
>
> Key: KAFKA-4503
> URL: https://issues.apache.org/jira/browse/KAFKA-4503
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Ismael Juma
>Assignee: Ismael Juma
>
> It would be useful to be able to map a partition to a log directory if 
> multiple log directories are used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2224: KAFKA-4503: Expose the log dir for a partition as ...

2016-12-07 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/2224

KAFKA-4503: Expose the log dir for a partition as a metric



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka 
kafka-4503-log-dir-for-partition-as-metric

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2224


commit 796983fcbd6e268f05f2e33851304cfbbc92aee7
Author: Ismael Juma 
Date:   2016-12-07T13:16:39Z

Introduce `LogDir` gauge in `Log`
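
The commit above adds a per-log gauge; with the Yammer metrics library the
broker uses, registering a string-valued gauge looks roughly like this (Java
sketch; the actual Log code is Scala via KafkaMetricsGroup, and the metric
group/type/scope names here are assumed):

    import com.yammer.metrics.Metrics;
    import com.yammer.metrics.core.Gauge;
    import com.yammer.metrics.core.MetricName;
    import java.io.File;

    public class LogDirGaugeExample {
        public static void main(String[] args) {
            final File dir = new File("/data/kafka-logs-1/mytopic-0"); // example log dir
            Metrics.defaultRegistry().newGauge(
                    new MetricName("kafka.log", "Log", "LogDir", "topic.mytopic.partition.0"),
                    new Gauge<String>() {
                        @Override
                        public String value() {
                            return dir.getAbsolutePath();
                        }
                    });
        }
    }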




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4503) Expose the log dir for a partition as a metric

2016-12-07 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-4503:
--

 Summary: Expose the log dir for a partition as a metric
 Key: KAFKA-4503
 URL: https://issues.apache.org/jira/browse/KAFKA-4503
 Project: Kafka
  Issue Type: New Feature
Reporter: Ismael Juma
Assignee: Ismael Juma


It would be useful to be able to map a partition to a log directory if multiple 
log directories are used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3172) Consumer threads stay in 'Waiting' status and are blocked at consumer poll method

2016-12-07 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3172:
---
Fix Version/s: (was: 0.9.0.0)
   0.10.2.0

> Consumer threads stay in 'Waiting' status and are blocked at consumer poll 
> method
> --
>
> Key: KAFKA-3172
> URL: https://issues.apache.org/jira/browse/KAFKA-3172
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0
> Environment: linux
>Reporter: Dany Benjamin
>Assignee: Neha Narkhede
>Priority: Critical
> Fix For: 0.10.2.0
>
> Attachments: jmx_info.png, jstack.png, lagSample.png
>
>
> When running multiple consumers in the same group (400 consumers, for a 
> 400-partition topic), all application threads block at the consumer.poll() 
> method. The timeout parameter sent in is 1.
> Stack dump:
> "pool-1-thread-198" #424 prio=5 os_prio=0 tid=0x7f6bb6d53800 nid=0xc349 
> waiting on condition [0x7f63df8f7000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x000605812710> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at 
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> "kafka-producer-network-thread | producer-198" #423 daemon prio=5 os_prio=0 
> tid=0x7f6bb6d52000 nid=0xc348 runnable [0x7f63df9f8000]
>java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> - locked <0x0006058283e8> (a sun.nio.ch.Util$2)
> - locked <0x0006058283d8> (a 
> java.util.Collections$UnmodifiableSet)
> - locked <0x000605828390> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
> at org.apache.kafka.common.network.Selector.select(Selector.java:425)
> at org.apache.kafka.common.network.Selector.poll(Selector.java:254)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:270)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128)
> at java.lang.Thread.run(Thread.java:745)
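
For reference, a minimal sketch of the poll pattern described above (broker
address, group id, topic, and timeout are illustrative assumptions; with the
consumer API of that era, poll() blocks for at most the given timeout in
milliseconds):

{noformat}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoopSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "my-group");                // assumed group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("my-topic")); // assumed topic
        while (true) {
            // Blocks for at most 1000 ms waiting for records.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records)
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
        }
    }
}
{noformat}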



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2016-12-07 Thread Ismael Juma
Thanks for the KIP, Vahid. +1 (binding)

On Mon, Dec 5, 2016 at 6:16 PM, Vahid S Hashemian  wrote:

> Happy Monday,
>
> I'd like to start voting on KIP-88 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 88%3A+OffsetFetch+Protocol+Update
> ).
> The discussion thread can be found here:
> https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html
>
> Thank you for your feedback.
>
> Regards,
> --Vahid
>
>


[jira] [Assigned] (KAFKA-4441) Kafka Monitoring is incorrect during rapid topic creation and deletion

2016-12-07 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar reassigned KAFKA-4441:


Assignee: Edoardo Comar

> Kafka Monitoring is incorrect during rapid topic creation and deletion
> --
>
> Key: KAFKA-4441
> URL: https://issues.apache.org/jira/browse/KAFKA-4441
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Tom Crayford
>Assignee: Edoardo Comar
>Priority: Minor
>
> Kafka reports several metrics off the state of partitions:
> UnderReplicatedPartitions
> PreferredReplicaImbalanceCount
> OfflinePartitionsCount
> All of these metrics fire when topics are rapidly created and deleted in a 
> tight loop, even though the only cause is topics undergoing creation or 
> deletion; the cluster is otherwise stable.
> Looking through the source code, topic deletion goes through an asynchronous 
> state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/TopicDeletionManager.scala#L35.
> However, the metrics do not know about the progress of this state machine: 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/controller/KafkaController.scala#L185
>  
> I believe the fix to this is relatively simple - we need to make the metrics 
> aware that a topic is currently undergoing deletion or creation, and only 
> include topics that are "stable".
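
A rough sketch of that filtering, written in Java for illustration (the
controller code is Scala; the partition-state type and the "in flight" topic
sets below are assumptions, not Kafka's actual types):

{noformat}
import java.util.List;
import java.util.Set;

public class StableTopicMetricsSketch {
    // Minimal stand-in for the controller's view of a partition.
    static class PartitionState {
        final String topic;
        final int isrSize;
        final int replicationFactor;
        PartitionState(String topic, int isrSize, int replicationFactor) {
            this.topic = topic;
            this.isrSize = isrSize;
            this.replicationFactor = replicationFactor;
        }
    }

    // Count under-replicated partitions, skipping topics that are still being
    // created or deleted, so that only "stable" topics affect the metric.
    static int underReplicatedPartitions(List<PartitionState> partitions,
                                         Set<String> topicsBeingCreated,
                                         Set<String> topicsBeingDeleted) {
        int count = 0;
        for (PartitionState p : partitions) {
            if (topicsBeingCreated.contains(p.topic) || topicsBeingDeleted.contains(p.topic))
                continue; // topic is in flight; don't let it trip the metric
            if (p.isrSize < p.replicationFactor)
                count++;
        }
        return count;
    }
}
{noformat}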



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-07 Thread Michael Noll
> There is no outer-join for GlobalKTables as the tables may be keyed
> differently. So you need to use the key from the left side of the join
> along with the KeyValueMapper to resolve the right side of the join. This
> won't work the other way around.

Care to elaborate why it won't work the other way around?  If, for example,
we swapped the call from leftTable.join(rightTable) to
rightTable.join(leftTable), that join would work, too.  Perhaps I am
missing something though. :-)
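
For concreteness, a sketch of the one-sided join under discussion, assuming the
API shape proposed in KIP-99 (topic names and value types are illustrative, and
the final signatures may differ):

{noformat}
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class GlobalTableJoinSketch {
    static class Order { String customerId; }
    static class Customer { }
    static class EnrichedOrder {
        EnrichedOrder(Order order, Customer customer) { }
    }

    public static void main(String[] args) {
        KStreamBuilder builder = new KStreamBuilder();
        // Orders are keyed by orderId; the global table is keyed by customerId.
        KStream<String, Order> orders = builder.stream("orders");
        GlobalKTable<String, Customer> customers =
                builder.globalTable("customers", "customers-store");
        // The KeyValueMapper extracts the right-hand key (customerId) from the
        // left-hand record; there is no inverse mapping from a customer back to
        // a particular order, which is why the join only works in one direction.
        KStream<String, EnrichedOrder> enriched = orders.join(
                customers,
                (orderId, order) -> order.customerId,   // left record -> right key
                (order, customer) -> new EnrichedOrder(order, customer));
        enriched.to("enriched-orders");
    }
}
{noformat}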




On Wed, Dec 7, 2016 at 10:39 AM, Damian Guy  wrote:

> Hi Matthias,
>
> Thanks for the feedback.
>
> There is no outer-join for GlobalKTables as the tables may be keyed
> differently. So you need to use the key from the left side of the join
> along with the KeyValueMapper to resolve the right side of the join. This
> won't work the other way around.
>
> On the bootstrapping concern. If the application is failing before
> bootstrapping finishes, the problem is likely to be related to a terminal
> exception, e.g., running out of disk space or corrupt state stores. In
> these cases, we wouldn't want the application to continue. So I think this
> is ok.
>
> Thanks,
> Damian
>
> On Tue, 6 Dec 2016 at 21:56 Matthias J. Sax  wrote:
>
> > Thanks for the KIP Damian. Very nice motivating example!
> >
> > A few comments:
> >
> >  - why is there no outer-join for GlobalKTables
> >  - on bootstrapping GlobalKTable, could it happen that this never
> > finishes if the application fails before bootstrapping finishes and new
> > data gets written at the same time? Do we need to guard against this
> > (seems to be a very rare corner case, so maybe not required)?
> >
> >
> > -Matthias
> >
> >
> > On 12/6/16 2:09 AM, Damian Guy wrote:
> > > Hi all,
> > >
> > > I would like to start the discussion on KIP-99:
> > >
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=67633649
> > >
> > > Looking forward to your feedback.
> > >
> > > Thanks,
> > > Damian
> > >
> >
> >
>


Re: [VOTE] KIP-94: Session Windows

2016-12-07 Thread Michael Noll
+1 (non-binding)

On Wed, Dec 7, 2016 at 1:21 AM, Sriram Subramanian  wrote:

> +1 (binding)
>
> On Tue, Dec 6, 2016 at 3:43 PM, Ewen Cheslack-Postava 
> wrote:
>
> > +1 binding
> >
> > -Ewen
> >
> > On Tue, Dec 6, 2016 at 3:21 PM, Bill Bejeck  wrote:
> >
> > > +1
> > >
> > > On Tue, Dec 6, 2016 at 4:55 PM, Guozhang Wang 
> > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Tue, Dec 6, 2016 at 9:07 AM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On 12/6/16 7:40 AM, Eno Thereska wrote:
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks
> > > > > > Eno
> > > > > >> On 6 Dec 2016, at 12:09, Damian Guy 
> wrote:
> > > > > >>
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I'd like to start the vote for KIP-94:
> > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 94+Session+Windows
> > > > > >>
> > > > > >> There is a PR for it here: https://github.com/apache/
> > > kafka/pull/2166
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Damian
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Ewen
> >
>


[jira] [Updated] (KAFKA-4502) Exception during startup, append offset to swap file

2016-12-07 Thread Harald Kirsch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harald Kirsch updated KAFKA-4502:
-
Labels:   (was: log)

> Exception during startup, append offset to swap file
> 
>
> Key: KAFKA-4502
> URL: https://issues.apache.org/jira/browse/KAFKA-4502
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0
> Environment: Windows Server
>Reporter: Harald Kirsch
>
> During startup, the Kafka server throws the exception shown below with a bit 
> of pre-context.
> We are using the so-called SiphonRelease 
> (https://github.com/Microsoft/Kafka/tree/SiphonRelease, 
> https://issues.apache.org/jira/browse/KAFKA-1194?focusedCommentId=15702991) 
> which tries to work around the LogCleaner's problems with renaming and 
> deleting segments that are still memory-mapped by the broker. 
> The trouble seems to be as follows: since in the SiphonRelease the LogCleaner 
> still sometimes crashes, we have a monitoring script that detects this and 
> then restarts the Windows Service (apache procrun based) running Kafka. My 
> hunch is that the combination of restart-service/procrun does not allow Kafka 
> to shut down smoothly, since when it starts we get tons of messages like:
> {noformat}
> [2016-12-05 23:30:20,704] WARN Found a corrupted index file due to 
> requirement failed: Corrupt index found, index file 
> (d:\Search\kafka\fileshare-0\00084814.index) has non-zero size 
> but the last offset is 84814 which is no larger than the base offset 84814.}. 
> deleting d:\Search\kafka\fileshare-0\00084814.timeindex, 
> d:\Search\kafka\fileshare-0\00084814.index and rebuilding 
> index... (kafka.log.Log)
> {noformat}
> While this seems fixable by Kafka, my hunch is that a leftover .swap file 
> then breaks it as follows:
> {noformat}
> [2016-12-05 23:32:34,676] INFO Found log file 
> d:\Search\kafka\windream-4\.log.swap from interrupted 
> swap operation, repairing. (kafka.log.Log)
> [2016-12-05 23:32:34,957] ERROR There was an error in one of the threads 
> during logs loading: kafka.common.InvalidOffsetException: Attempt to append 
> an offset (110460) to position 182 no larger than the last offset appended 
> (110735) to d:\Search\kafka\windream-4\.index.swap. 
> (kafka.log.LogManager)
> [2016-12-05 23:32:34,957] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> kafka.common.InvalidOffsetException: Attempt to append an offset (110460) to 
> position 182 no larger than the last offset appended (110735) to 
> d:\Search\kafka\windream-4\.index.swap.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:132)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:122)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:122)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:122)
>   at kafka.log.LogSegment.recover(LogSegment.scala:224)
>   at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:248)
>   at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:232)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
>   at kafka.log.Log.loadSegments(Log.scala:232)
>   at kafka.log.Log.(Log.scala:108)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4502) Exception during startup, append offset to swap file

2016-12-07 Thread Harald Kirsch (JIRA)
Harald Kirsch created KAFKA-4502:


 Summary: Exception during startup, append offset to swap file
 Key: KAFKA-4502
 URL: https://issues.apache.org/jira/browse/KAFKA-4502
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.0
 Environment: Windows Server
Reporter: Harald Kirsch


During startup, the Kafka server throws the exception shown below with a bit of 
pre-context.

We are using the so-called SiphonRelease 
(https://github.com/Microsoft/Kafka/tree/SiphonRelease, 
https://issues.apache.org/jira/browse/KAFKA-1194?focusedCommentId=15702991) 
which tries to work around the LogCleaner's problems with renaming and deleting 
segments that are still memory-mapped by the broker. 

The trouble seems to be as follows: since in the SiphonRelease the LogCleaner 
still sometimes crashes, we have a monitoring script that detects this and then 
restarts the Windows Service (apache procrun based) running Kafka. My hunch is 
that the combination of restart-service/procrun does not allow Kafka to shut 
down smoothly, since when it starts we get tons of messages like:

{noformat}
[2016-12-05 23:30:20,704] WARN Found a corrupted index file due to requirement 
failed: Corrupt index found, index file 
(d:\Search\kafka\fileshare-0\00084814.index) has non-zero size but 
the last offset is 84814 which is no larger than the base offset 84814.}. 
deleting d:\Search\kafka\fileshare-0\00084814.timeindex, 
d:\Search\kafka\fileshare-0\00084814.index and rebuilding index... 
(kafka.log.Log)
{noformat}

While this seems fixable by Kafka, my hunch is that a leftover .swap file then 
breaks it as follows:
{noformat}
[2016-12-05 23:32:34,676] INFO Found log file 
d:\Search\kafka\windream-4\.log.swap from interrupted swap 
operation, repairing. (kafka.log.Log)
[2016-12-05 23:32:34,957] ERROR There was an error in one of the threads during 
logs loading: kafka.common.InvalidOffsetException: Attempt to append an offset 
(110460) to position 182 no larger than the last offset appended (110735) to 
d:\Search\kafka\windream-4\.index.swap. 
(kafka.log.LogManager)
[2016-12-05 23:32:34,957] FATAL Fatal error during KafkaServer startup. Prepare 
to shutdown (kafka.server.KafkaServer)
kafka.common.InvalidOffsetException: Attempt to append an offset (110460) to 
position 182 no larger than the last offset appended (110735) to 
d:\Search\kafka\windream-4\.index.swap.
at 
kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:132)
at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:122)
at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:122)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.OffsetIndex.append(OffsetIndex.scala:122)
at kafka.log.LogSegment.recover(LogSegment.scala:224)
at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:248)
at kafka.log.Log$$anonfun$loadSegments$5.apply(Log.scala:232)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:74)
at kafka.log.Log.loadSegments(Log.scala:232)
at kafka.log.Log.<init>(Log.scala:108)
at 
kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:151)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:58)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
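
Both log messages reflect invariants on the offset index. A Java paraphrase of
the two conditions exactly as the messages state them (Kafka's actual checks
live in its Scala OffsetIndex code; this is only a sketch):

{noformat}
public class OffsetIndexInvariantsSketch {
    // The sanity check behind the WARN: a non-empty index file must end at an
    // offset strictly larger than its base offset.
    static void sanityCheck(long baseOffset, long lastOffset, long fileSizeBytes) {
        if (fileSizeBytes > 0 && lastOffset <= baseOffset)
            throw new IllegalStateException(
                    "Corrupt index: non-zero size but last offset " + lastOffset
                    + " is no larger than the base offset " + baseOffset);
    }

    // The invariant behind the FATAL: offsets must be appended in strictly
    // increasing order; a leftover .swap index violates it during recovery.
    static void checkAppend(long offsetToAppend, long lastAppendedOffset) {
        if (offsetToAppend <= lastAppendedOffset)
            throw new IllegalStateException(
                    "Attempt to append an offset (" + offsetToAppend
                    + ") no larger than the last offset appended (" + lastAppendedOffset + ")");
    }
}
{noformat}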




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-88: OffsetFetch Protocol Update

2016-12-07 Thread Edoardo Comar
+1 (non-binding)
--
Edoardo Comar
IBM MessageHub
eco...@uk.ibm.com
IBM UK Ltd, Hursley Park, SO21 2JN

IBM United Kingdom Limited Registered in England and Wales with number 
741598 Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 
3AU



From:   "Vahid S Hashemian" 
To: dev 
Date:   05/12/2016 18:17
Subject: [VOTE] KIP-88: OffsetFetch Protocol Update



Happy Monday,

I'd like to start voting on KIP-88 (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-88%3A+OffsetFetch+Protocol+Update

).
The discussion thread can be found here: 
https://www.mail-archive.com/dev@kafka.apache.org/msg59608.html

Thank you for your feedback.
 
Regards,
--Vahid




Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


[jira] [Created] (KAFKA-4501) Support Java 9

2016-12-07 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-4501:
--

 Summary: Support Java 9
 Key: KAFKA-4501
 URL: https://issues.apache.org/jira/browse/KAFKA-4501
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
 Fix For: 0.10.2.0


Java 9 is scheduled to be released in July 2017. We should support it.

The new module system enforces access control and things like `setAccessible` 
cannot, by default, be used to circumvent access control in other modules. 
There are command-line flags available to disable this behaviour on a 
module-by-module basis.
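
To illustrate the kind of code affected, a small sketch; whether this throws,
warns, or succeeds depends on the JDK 9 build and its illegal-access settings,
which are still in flux:

{noformat}
import java.lang.reflect.Field;

public class SetAccessibleDemo {
    public static void main(String[] args) throws Exception {
        // Reflective access to a private field of a java.base class.
        Field f = String.class.getDeclaredField("hash");
        // Under the Java 9 module system this call can be rejected unless the
        // package is opened to the caller, e.g. via a flag along the lines of
        // --add-opens java.base/java.lang=ALL-UNNAMED.
        f.setAccessible(true);
        System.out.println("hash = " + f.get("hello"));
    }
}
{noformat}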

Right now, Gradle fails with the latest Java 9 snapshot, and Scala 2.12.1 is 
required when building with Java 9. So we are blocked until the Gradle issues 
are fixed.

I set the "Fix version" to 0.10.2.0, but it's likely to happen for the release 
after that.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-99: Add Global Tables to Kafka Streams

2016-12-07 Thread Damian Guy
Hi Matthias,

Thanks for the feedback.

There is no outer-join for GlobalKTables as the tables may be keyed
differently. So you need to use the key from the left side of the join
along with the KeyValueMapper to resolve the right side of the join. This
won't work the other way around.

On the bootstrapping concern. If the application is failing before
bootstrapping finishes, the problem is likely to be related to a terminal
exception, e.g., running out of disk space or corrupt state stores. In
these cases, we wouldn't want the application to continue. So I think this
is ok.

Thanks,
Damian

On Tue, 6 Dec 2016 at 21:56 Matthias J. Sax  wrote:

> Thanks for the KIP Damian. Very nice motivating example!
>
> A few comments:
>
>  - why is there no outer-join for GlobalKTables
>  - on bootstrapping GlobalKTable, could it happen that this never
> finishes if the application fails before bootstrapping finishes and new
> data gets written at the same time? Do we need to guard against this
> (seems to be a very rare corner case, so maybe not required)?
>
>
> -Matthias
>
>
> On 12/6/16 2:09 AM, Damian Guy wrote:
> > Hi all,
> >
> > I would like to start the discussion on KIP-99:
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67633649
> >
> > Looking forward to your feedback.
> >
> > Thanks,
> > Damian
> >
>
>


[GitHub] kafka pull request #2219: HOTFIX: Disable this test until it's fixed (0.10.1...

2016-12-07 Thread enothereska
Github user enothereska closed the pull request at:

https://github.com/apache/kafka/pull/2219


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---