[GitHub] kafka-site pull request #60: Update delivery semantics section for KIP-98

2017-06-09 Thread hachikuji
Github user hachikuji commented on a diff in the pull request:

https://github.com/apache/kafka-site/pull/60#discussion_r121249550
  
--- Diff: 0110/design.html ---
@@ -261,15 +262,23 @@
 It can read the messages, process the messages, and finally save 
its position. In this case there is a possibility that the consumer process 
crashes after processing messages but before saving its position.
 In this case when the new process takes over the first few messages it 
receives will already have been processed. This corresponds to the 
"at-least-once" semantics in the case of consumer failure. In many cases
 messages have a primary key and so the updates are idempotent 
(receiving the same message twice just overwrites a record with another copy of 
itself).
-So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to co-ordinate the consumer's position with
+
+
+So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to coordinate the consumer's position with
 what is actually stored as output. The classic way of achieving this 
would be to introduce a two-phase commit between the storage for the consumer 
position and the storage of the consumers output. But this can be
 handled more simply and generally by simply letting the consumer store 
its offset in the same place as its output. This is better because many of the 
output systems a consumer might want to write to will not
 support a two-phase commit. As an example of this, our Hadoop ETL that 
populates data in HDFS stores its offsets in HDFS with the data it reads so 
that it is guaranteed that either data and offsets are both updated
 or neither is. We follow similar patterns for many other data systems 
which require these stronger semantics and for which the messages do not have a 
primary key to allow for deduplication.
-
 
-So effectively Kafka guarantees at-least-once delivery by default and 
allows the user to implement at most once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
-messages. Exactly-once delivery requires co-operation with the 
destination storage system but Kafka provides the offset which makes 
implementing this straight-forward.
+A special case is when the output system is just another Kafka topic 
(e.g. in a Kafka Streams application). Here we can leverage the new 
transactional producer capabilities in 0.11.0.0 that were mentioned above.
+Since the consumer's position is stored as a message in a topic, we 
can ensure that that topic is included in the same transaction as the output 
topics receiving the processed data. If the transaction is aborted,
+the consumer's position will revert to its old value and none of the 
output data will be visible to consumers. To enable this, consumers support an 
"isolation level" to achieve this. In the default
--- End diff --

Thanks. I will try to clarify these lines.
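For readers following the diff, here is a minimal sketch of the consume-process-produce loop that the new paragraph describes, using the 0.11.0.0 Java clients. The topic names ("input", "output"), group id, and transactional.id are made up for illustration; the point is that the consumed offsets are committed through the producer, inside the same transaction as the output records:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.TopicPartition;

public class ConsumeProcessProduceLoop {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "copy-group");
        consumerProps.put("enable.auto.commit", "false");
        consumerProps.put("isolation.level", "read_committed");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("enable.idempotence", "true");
        producerProps.put("transactional.id", "copy-app-1");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(Collections.singletonList("input"));
            producer.initTransactions();
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                if (records.isEmpty())
                    continue;
                producer.beginTransaction();
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> record : records) {
                    // "Process" the record and write the result to the output topic.
                    producer.send(new ProducerRecord<>("output", record.key(), record.value()));
                    offsets.put(new TopicPartition(record.topic(), record.partition()),
                                new OffsetAndMetadata(record.offset() + 1));
                }
                // The consumed offsets are written as part of the same transaction as
                // the output records, so they become visible or are discarded together.
                producer.sendOffsetsToTransaction(offsets, "copy-group");
                producer.commitTransaction();
            }
        }
    }
}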




[GitHub] kafka-site pull request #60: Update delivery semantics section for KIP-98

2017-06-09 Thread hachikuji
Github user hachikuji commented on a diff in the pull request:

https://github.com/apache/kafka-site/pull/60#discussion_r121249508
  
--- Diff: 0110/design.html ---
@@ -261,15 +262,23 @@
 It can read the messages, process the messages, and finally save 
its position. In this case there is a possibility that the consumer process 
crashes after processing messages but before saving its position.
 In this case when the new process takes over the first few messages it 
receives will already have been processed. This corresponds to the 
"at-least-once" semantics in the case of consumer failure. In many cases
 messages have a primary key and so the updates are idempotent 
(receiving the same message twice just overwrites a record with another copy of 
itself).
-So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to co-ordinate the consumer's position with
+
+
+So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to coordinate the consumer's position with
 what is actually stored as output. The classic way of achieving this 
would be to introduce a two-phase commit between the storage for the consumer 
position and the storage of the consumers output. But this can be
 handled more simply and generally by simply letting the consumer store 
its offset in the same place as its output. This is better because many of the 
output systems a consumer might want to write to will not
 support a two-phase commit. As an example of this, our Hadoop ETL that 
populates data in HDFS stores its offsets in HDFS with the data it reads so 
that it is guaranteed that either data and offsets are both updated
 or neither is. We follow similar patterns for many other data systems 
which require these stronger semantics and for which the messages do not have a 
primary key to allow for deduplication.
-
 
-So effectively Kafka guarantees at-least-once delivery by default and 
allows the user to implement at most once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
-messages. Exactly-once delivery requires co-operation with the 
destination storage system but Kafka provides the offset which makes 
implementing this straight-forward.
+A special case is when the output system is just another Kafka topic 
(e.g. in a Kafka Streams application). Here we can leverage the new 
transactional producer capabilities in 0.11.0.0 that were mentioned above.
+Since the consumer's position is stored as a message in a topic, we 
can ensure that that topic is included in the same transaction as the output 
topics receiving the processed data. If the transaction is aborted,
+the consumer's position will revert to its old value and none of the 
output data will be visible to consumers. To enable this, consumers support an 
"isolation level" to achieve this. In the default
+"read_uncommitted" mode, all messages are visible to consumers even if 
they were part of an aborted transaction, but in "read_committed" mode, the 
consumer will only return data from transactions which were committed
+(and any messages which were not part of any transaction).
+
+So effectively Kafka guarantees at-least-once delivery by default, and 
allows the user to implement at-most-once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
--- End diff --

Good point.
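As a side note for readers, the at-most-once pattern that this hunk mentions (disable retries on the producer; commit offsets before processing) looks roughly like the following on the consumer side. This is only a hedged sketch with made-up topic and group names, not text from the PR:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtMostOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "at-most-once-group");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                if (records.isEmpty())
                    continue;
                consumer.commitSync();    // commit the batch's offsets first ...
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());    // ... then process; a crash here loses the batch
                }
            }
        }
    }
}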




[jira] [Created] (KAFKA-5425) Kafka replication support all log segment files sync with leader

2017-06-09 Thread Pengwei (JIRA)
Pengwei created KAFKA-5425:
--

 Summary: Kafka replication support all log segment files sync with 
leader
 Key: KAFKA-5425
 URL: https://issues.apache.org/jira/browse/KAFKA-5425
 Project: Kafka
  Issue Type: Improvement
Reporter: Pengwei
Priority: Minor


Currently Kafka replication only supports the follower syncing with the leader from its
latest fetch offset; it assumes that messages with offsets smaller than this fetch offset
are already in sync. After completing a fetch from the leader, the follower updates its
fetch offset to the log end offset.

But if log segments whose latest offset is smaller than the current fetch offset are
deleted, replication will not sync those log segments back. In this case there is a
risk of losing messages if the follower becomes the leader: a consumer reading the
partition from beginning to end will find that some messages are missing.

Can we improve the Kafka replication mechanism to ensure that all of the follower's log
segments are the same as the leader's? Even if some log segments are deleted, the
follower would try to recover them from the leader.





[jira] [Updated] (KAFKA-5342) Distinguish abortable failures in transactional producer

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5342:

Fix Version/s: (was: 0.11.0.0)
   0.11.0.1

> Distinguish abortable failures in transactional producer
> 
>
> Key: KAFKA-5342
> URL: https://issues.apache.org/jira/browse/KAFKA-5342
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.11.0.1
>
>
> The transactional producer distinguishes two classes of user-visible errors:
> 1. Abortable errors: these are errors which are fatal to the ongoing 
> transaction, but which can be successfully aborted. Essentially any error in 
> which the producer can still expect to successfully send EndTxn to the 
> transaction coordinator is abortable.
> 2. Fatal errors: any error which is not abortable is fatal. For example, a 
> transactionalId authorization error is fatal because it would also prevent 
> the TC from receiving the EndTxn request.
> At the moment, it's not clear how the user would know how they should handle 
> a given failure. One option is to add an exception type to indicate which 
> errors are abortable (e.g. AbortableKafkaException). Then any other exception 
> could be considered fatal.
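For illustration, a hedged sketch of the calling pattern this distinction implies. AbortableKafkaException does not exist yet (it is only the option floated above), so the sketch uses the existing exception hierarchy, treating ProducerFencedException as fatal and other KafkaExceptions as abortable; configuration values are made up:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TransactionalErrorHandling {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("enable.idempotence", "true");
        props.put("transactional.id", "example-txn-id");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.initTransactions();
        try {
            producer.beginTransaction();
            producer.send(new ProducerRecord<>("output-topic", "key", "value"));
            producer.commitTransaction();
        } catch (ProducerFencedException fatal) {
            // Fatal: another producer with the same transactionalId has fenced us,
            // so EndTxn cannot succeed; the only safe action is to close.
            producer.close();
            return;
        } catch (KafkaException abortable) {
            // Abortable: the ongoing transaction failed, but the producer can still
            // send EndTxn to the transaction coordinator, so abort and retry later.
            producer.abortTransaction();
        }
        producer.close();
    }
}
{code}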





[jira] [Updated] (KAFKA-5286) Producer should await transaction completion in close

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5286:

Fix Version/s: (was: 0.11.0.0)
   0.11.0.1

> Producer should await transaction completion in close
> -
>
> Key: KAFKA-5286
> URL: https://issues.apache.org/jira/browse/KAFKA-5286
> Project: Kafka
>  Issue Type: Sub-task
>  Components: clients, core, producer 
>Reporter: Jason Gustafson
> Fix For: 0.11.0.1
>
>
> We should wait at least as long as the timeout for a transaction which has
> begun completion (commit or abort) to be finished. The tricky thing is whether we
> should abort a transaction which is still in progress. It seems reasonable, since
> either the coordinator will time out and abort the transaction or the
> next producer using the same transactionalId will fence the producer and
> abort the transaction. In any case, the transaction will be aborted, so
> perhaps we should do it proactively.
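For context, a hedged sketch of what a caller does on its own today: abort any in-flight transaction and then close with a timeout. The proposal above is for close() to take care of waiting (and possibly aborting) itself. The helper and its arguments are made up for illustration:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.producer.KafkaProducer;

public class ProducerShutdown {
    // txnInProgress is tracked by the application; it is not a producer API.
    public static void shutdown(KafkaProducer<String, String> producer, boolean txnInProgress) {
        if (txnInProgress) {
            // Abort proactively rather than leaving the coordinator to time the
            // transaction out, or waiting for the next producer with the same
            // transactionalId to fence us.
            producer.abortTransaction();
        }
        producer.close(30, TimeUnit.SECONDS);
    }
}
{code}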





Jenkins build is back to normal : kafka-trunk-jdk8 #1686

2017-06-09 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-5418) ZkUtils.getAllPartitions() may fail if a topic is marked for deletion

2017-06-09 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045321#comment-16045321
 ] 

huxihx commented on KAFKA-5418:
---

Is it a duplicate of 
[KAFKA-1019|https://issues.apache.org/jira/browse/KAFKA-1019]?

> ZkUtils.getAllPartitions() may fail if a topic is marked for deletion
> -
>
> Key: KAFKA-5418
> URL: https://issues.apache.org/jira/browse/KAFKA-5418
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1, 0.10.2.1
>Reporter: Edoardo Comar
>
> Running {{ZkUtils.getAllPartitions()}} on a cluster which had a topic stuck 
> in the 'marked for deletion' state 
> so it was a child of {{/brokers/topics}}
> but it had no children, i.e. the path {{/brokers/topics/thistopic/partitions}}
> did not exist, throws a ZkNoNodeException while iterating:
> {noformat}
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /brokers/topics/xyzblahfoo/partitions
>   at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
>   at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
> ...
>   at java.lang.Thread.run(Thread.java:809)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /brokers/topics/xyzblahfoo/partitions
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>   at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
>   ... 
> {noformat}
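A hedged sketch (not the ZkUtils fix itself) of the defensive check that would avoid this exception: skip the topic if its /partitions znode does not exist, e.g. because the topic is stuck in the "marked for deletion" state. SafePartitionListing is a made-up helper using the same I0Itec ZkClient calls that appear in the stack trace:

{code}
import java.util.Collections;
import java.util.List;
import org.I0Itec.zkclient.ZkClient;

public class SafePartitionListing {
    public static List<String> partitionsForTopic(ZkClient zkClient, String topic) {
        String partitionsPath = "/brokers/topics/" + topic + "/partitions";
        // The topic znode may exist while its partitions child does not
        // (e.g. a topic marked for deletion), so check before listing children.
        if (!zkClient.exists(partitionsPath)) {
            return Collections.emptyList();
        }
        return zkClient.getChildren(partitionsPath);
    }
}
{code}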





Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Joe Stein
Congrats!


~ Joe Stein

On Fri, Jun 9, 2017 at 6:49 PM, Neha Narkhede  wrote:

> Well deserved. Congratulations Damian!
>
> On Fri, Jun 9, 2017 at 1:34 PM Guozhang Wang  wrote:
>
> > Hello all,
> >
> >
> > The PMC of Apache Kafka is pleased to announce that we have invited
> Damian
> > Guy as a committer to the project.
> >
> > Damian has made tremendous contributions to Kafka. He has not only
> > contributed a lot to the Streams API, but has also been involved in many
> > other areas like the producer and consumer clients and the broker-side
> > coordinators (the group coordinator and the ongoing transaction coordinator).
> > He has contributed more than 100 patches so far and has been driving 6
> > KIP contributions.
> >
> > More importantly, Damian has been a very prolific reviewer on open PRs and
> > has been actively participating in community activities such as the mailing
> > lists and Stack Overflow questions. Through his code contributions and reviews,
> > Damian has demonstrated good judgment on system design and code quality,
> > especially on thorough unit test coverage. We believe he will make a great
> > addition to the committers of the community.
> >
> >
> > Thank you for your contributions, Damian!
> >
> >
> > -- Guozhang, on behalf of the Apache Kafka PMC
> >
> --
> Thanks,
> Neha
>


Re: [DISCUSS] KIP-113: Support replicas movement between log directories

2017-06-09 Thread Jun Rao
Just a few comments on this.

1. One of the issues with using RAID 0 is that a single disk failure causes
a hard failure of the broker. Hard failure increases the unavailability
window for all the partitions on the failed broker, which includes the
failure detection time (tied to ZK session timeout right now) and leader
election time by the controller. If we support JBOD natively, when a single
disk fails, only partitions on the failed disk will experience a hard
failure. The availability of partitions on the rest of the disks is not
affected.

2. For running things in the cloud, such as AWS: currently, each EBS volume
has a throughput limit of about 300MB/sec. If you get an enhanced EC2
instance, you can get 20Gb/sec of network. To saturate the network, you may
need about 7 EBS volumes. So, being able to support JBOD in the cloud is
still potentially useful.

3. On the benefit of balancing data across disks within the same broker.
Data imbalance can happen across brokers as well as across disks within the
same broker. Balancing the data across disks within the broker has the
benefit of saving network bandwidth as Dong mentioned. So, if intra broker
load balancing is possible, it's probably better to avoid the more
expensive inter broker load balancing. One of the reasons for disk
imbalance right now is that partitions within a broker are assigned to
disks just based on the partition count. So, it does seem possible for
disks to get imbalanced from time to time. If someone can share some stats
for that in practice, that will be very helpful.
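For readers skimming the thread, a hedged, self-contained sketch of the greedy intra-broker rebalancing idea discussed here and in Dong's replies below: repeatedly move a replica from the most loaded disk to the least loaded disk until the spread falls under a threshold. Disk and Replica are toy types invented for the illustration (they are not Kafka classes), and the sketch considers only replica size, not throughput:

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GreedyDiskBalancer {
    static class Replica {
        final String name;
        final long sizeBytes;
        Replica(String name, long sizeBytes) { this.name = name; this.sizeBytes = sizeBytes; }
    }

    static class Disk {
        final String path;
        final List<Replica> replicas = new ArrayList<>();
        Disk(String path) { this.path = path; }
        long load() { return replicas.stream().mapToLong(r -> r.sizeBytes).sum(); }
    }

    // Assumes at least one disk in the list.
    static void balance(List<Disk> disks, long maxImbalanceBytes) {
        while (true) {
            Disk most = disks.stream().max(Comparator.comparingLong(Disk::load)).get();
            Disk least = disks.stream().min(Comparator.comparingLong(Disk::load)).get();
            long gap = most.load() - least.load();
            if (gap <= maxImbalanceBytes || most.replicas.isEmpty())
                return;
            // Move the largest replica that still shrinks the gap; stop if none does.
            Replica candidate = most.replicas.stream()
                    .filter(r -> r.sizeBytes < gap)
                    .max(Comparator.comparingLong((Replica r) -> r.sizeBytes))
                    .orElse(null);
            if (candidate == null)
                return;
            most.replicas.remove(candidate);
            least.replicas.add(candidate);
        }
    }
}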

Thanks,

Jun


On Wed, Jun 7, 2017 at 2:30 PM, Dong Lin  wrote:

> Hey Sriram,
>
> I think there is one way to explain why the ability to move replica between
> disks can save space. Let's say the load is distributed to disks
> independent of the broker. Sooner or later, the load imbalance will exceed
> a threshold and we will need to rebalance load across disks. Now our
> questions is whether our rebalancing algorithm will be able to take
> advantage of locality by moving replicas between disks on the same broker.
>
> Say for a given disk there is a 20% probability it is overloaded, a 20%
> probability it is underloaded, and a 60% probability its load is around the
> expected average load if the cluster is well balanced. Then for a broker with
> 10 disks, we would expect 2 disks to need in-bound replica movement, 2 disks
> to need out-bound replica movement, and 6 disks to need no replica
> movement. Thus we would expect KIP-113 to be useful, since we will be able
> to move replicas from the two over-loaded disks to the two under-loaded
> disks on the same broker. Does this make sense?
>
> Thanks,
> Dong
>
>
>
>
>
>
> On Wed, Jun 7, 2017 at 2:12 PM, Dong Lin  wrote:
>
> > Hey Sriram,
> >
> > Thanks for raising these concerns. Let me answer these questions below:
> >
> > - The benefit of that additional complexity to move the data stored on a
> > disk within the broker is to avoid network bandwidth usage. Creating a
> > replica on another broker is less efficient than creating a replica on
> > another disk in the same broker IF there is actually a lightly loaded disk on
> > the same broker.
> >
> > - In my opinion the rebalance algorithm would be this: 1) we balance the load
> > across brokers using the same algorithm we are using today; 2) we balance
> > load across the disks on a given broker using a greedy algorithm, i.e. move
> > replicas from the overloaded disks to lightly loaded disks. The greedy
> > algorithm would only consider the capacity and replica size. We can improve
> > it to consider throughput in the future.
> >
> > - With 30 brokers with each having 10 disks, using the rebalancing
> algorithm,
> > the chances of choosing disks within the broker can be high. There will
> > always be load imbalance across disks of the same broker for the same
> > reason that there will always be load imbalance across brokers. The
> > algorithm specified above will take advantage of the locality, i.e. first
> > balance load across disks of the same broker, and only balance across
> > brokers if some brokers are much more loaded than others.
> >
> > I think it is useful to note that the load imbalance across disks of the
> > same broker is independent of the load imbalance across brokers. Both are
> > guaranteed to happen in any Kafka cluster for the same reason, i.e.
> > variation in the partition size. Say broker 1 has two disks that are 80%
> > loaded and 20% loaded, and broker 2 has two disks that are also 80%
> > loaded and 20% loaded. We can balance them without inter-broker traffic with
> > KIP-113.  This is why I think KIP-113 can be very useful.
> >
> > Do these explanations sound reasonable?
> >
> > Thanks,
> > Dong
> >
> >
> > On Wed, Jun 7, 2017 at 1:33 PM, Sriram Subramanian 
> > wrote:
> >
> >> Hey Dong,
> >>
> >> Thanks for the explanation. I don't think anyone is denying that we
> should
> >> rebalance at the disk 

[jira] [Updated] (KAFKA-5417) Clients get inconsistent connection states when SASL/SSL connection is marked CONNECTED and DISCONNECTED at the same time

2017-06-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5417:
---
Fix Version/s: 0.11.0.0

> Clients get inconsistent connection states when SASL/SSL connection is marked 
> CONNECTED and DISCONNECTED at the same time
> 
>
> Key: KAFKA-5417
> URL: https://issues.apache.org/jira/browse/KAFKA-5417
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.10.2.1
>Reporter: dongeforever
>Priority: Critical
> Fix For: 0.11.0.0, 0.10.2.2
>
>
> Assume the SASL or SSL connection is established successfully but is reset
> when writing data into it (this will happen frequently in an LVS proxy
> environment).
> Selector.poll() will then act as follows:
> try {
>     ...
>     // finished connecting successfully
>     if (channel.finishConnect()) {
>         this.connected.add(channel.id());        // (1)
>     }
>     // prepare() will fail and throw, because the SASL or SSL handshake
>     // writes data into the already-reset connection
>     if (channel.isConnected() && !channel.ready())
>         channel.prepare();
>
> } catch (Exception e) {
>     close(channel);
>     this.disconnected.add(channel.id());         // (2)
> }
> The lines marked (1) and (2) will mark the connection CONNECTED and
> DISCONNECTED at the same time.
> NetworkClient.poll() will then run:
> handleDisconnections(responses, updatedNow);   // removes the channel
> handleConnections();                           // marks the channel CONNECTED
> So we end up with inconsistent ConnectionStates, and such a state will block
> messages sent to this channel in the Sender, because the channel will never
> be ready and never be connected again:
> public boolean ready(Node node, long now) {
>     if (node.isEmpty())
>         throw new IllegalArgumentException("Cannot connect to empty node " + node);
>     // returns false, because the channel does not actually exist any more
>     if (isReady(node, now))
>         return true;
>     // returns false, because the connection state is still marked CONNECTED
>     if (connectionStates.canConnect(node.idString(), now))
>         // if we are interested in sending to a node and we don't have a
>         // connection to it, initiate one
>         initiateConnect(node, now);
>     return false;
> }
> So all messages sent to such a channel will eventually expire.





[jira] [Updated] (KAFKA-5416) TransactionCoordinator doesn't complete transition to CompleteCommit

2017-06-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5416:
---
Status: Patch Available  (was: Reopened)

> TransactionCoordinator doesn't complete transition to CompleteCommit
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likelythat another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likelythat another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likelythat another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likelythat another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> 

[jira] [Commented] (KAFKA-5424) KafkaConsumer.listTopics() throws Exception when unauthorized topics exist in cluster

2017-06-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045301#comment-16045301
 ] 

Ismael Juma commented on KAFKA-5424:


Thanks for the report. Can you specify the version where you've seen this? The 
broker doesn't return topics that the user has no authorization for so the 
method should work fine as it is. Some older versions of the broker did not 
behave like this though.

> KafkaConsumer.listTopics() throws Exception when unauthorized topics exist in 
> cluster
> -
>
> Key: KAFKA-5424
> URL: https://issues.apache.org/jira/browse/KAFKA-5424
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Reporter: Mike Fagan
>
> KafkaConsumer.listTopics() internally calls Fetcher.getAllTopicMetadata(timeout),
> and this method will throw a TopicAuthorizationException when there exists an
> unauthorized topic in the cluster.
> This behavior runs counter to the API docs and makes listTopics() unusable
> except in the case where the consumer is authorized for every single topic in
> the cluster.
> A potentially better approach is to have Fetcher implement a new method
> getAuthorizedTopicMetadata(timeout) and have KafkaConsumer call this method
> instead of getAllTopicMetadata(timeout) from within KafkaConsumer.listTopics().





[GitHub] kafka-site pull request #60: Update delivery semantics section for KIP-98

2017-06-09 Thread ijuma
Github user ijuma commented on a diff in the pull request:

https://github.com/apache/kafka-site/pull/60#discussion_r121246230
  
--- Diff: 0110/design.html ---
@@ -261,15 +262,23 @@
 It can read the messages, process the messages, and finally save 
its position. In this case there is a possibility that the consumer process 
crashes after processing messages but before saving its position.
 In this case when the new process takes over the first few messages it 
receives will already have been processed. This corresponds to the 
"at-least-once" semantics in the case of consumer failure. In many cases
 messages have a primary key and so the updates are idempotent 
(receiving the same message twice just overwrites a record with another copy of 
itself).
-So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to co-ordinate the consumer's position with
+
+
+So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to coordinate the consumer's position with
 what is actually stored as output. The classic way of achieving this 
would be to introduce a two-phase commit between the storage for the consumer 
position and the storage of the consumers output. But this can be
 handled more simply and generally by simply letting the consumer store 
its offset in the same place as its output. This is better because many of the 
output systems a consumer might want to write to will not
 support a two-phase commit. As an example of this, our Hadoop ETL that 
populates data in HDFS stores its offsets in HDFS with the data it reads so 
that it is guaranteed that either data and offsets are both updated
 or neither is. We follow similar patterns for many other data systems 
which require these stronger semantics and for which the messages do not have a 
primary key to allow for deduplication.
-
 
-So effectively Kafka guarantees at-least-once delivery by default and 
allows the user to implement at most once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
-messages. Exactly-once delivery requires co-operation with the 
destination storage system but Kafka provides the offset which makes 
implementing this straight-forward.
+A special case is when the output system is just another Kafka topic 
(e.g. in a Kafka Streams application). Here we can leverage the new 
transactional producer capabilities in 0.11.0.0 that were mentioned above.
+Since the consumer's position is stored as a message in a topic, we 
can ensure that that topic is included in the same transaction as the output 
topics receiving the processed data. If the transaction is aborted,
+the consumer's position will revert to its old value and none of the 
output data will be visible to consumers. To enable this, consumers support an 
"isolation level" to achieve this. In the default
+"read_uncommitted" mode, all messages are visible to consumers even if 
they were part of an aborted transaction, but in "read_committed" mode, the 
consumer will only return data from transactions which were committed
+(and any messages which were not part of any transaction).
+
+So effectively Kafka guarantees at-least-once delivery by default, and 
allows the user to implement at-most-once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
--- End diff --

I wonder if we should mention the stronger guarantees first.




[GitHub] kafka-site pull request #60: Update delivery semantics section for KIP-98

2017-06-09 Thread ijuma
Github user ijuma commented on a diff in the pull request:

https://github.com/apache/kafka-site/pull/60#discussion_r121246131
  
--- Diff: 0110/design.html ---
@@ -261,15 +262,23 @@
 It can read the messages, process the messages, and finally save 
its position. In this case there is a possibility that the consumer process 
crashes after processing messages but before saving its position.
 In this case when the new process takes over the first few messages it 
receives will already have been processed. This corresponds to the 
"at-least-once" semantics in the case of consumer failure. In many cases
 messages have a primary key and so the updates are idempotent 
(receiving the same message twice just overwrites a record with another copy of 
itself).
-So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to co-ordinate the consumer's position with
+
+
+So what about exactly once semantics (i.e. the thing you actually 
want)? The limitation here is not actually a feature of the messaging system 
but rather the need to coordinate the consumer's position with
 what is actually stored as output. The classic way of achieving this 
would be to introduce a two-phase commit between the storage for the consumer 
position and the storage of the consumers output. But this can be
 handled more simply and generally by simply letting the consumer store 
its offset in the same place as its output. This is better because many of the 
output systems a consumer might want to write to will not
 support a two-phase commit. As an example of this, our Hadoop ETL that 
populates data in HDFS stores its offsets in HDFS with the data it reads so 
that it is guaranteed that either data and offsets are both updated
 or neither is. We follow similar patterns for many other data systems 
which require these stronger semantics and for which the messages do not have a 
primary key to allow for deduplication.
-
 
-So effectively Kafka guarantees at-least-once delivery by default and 
allows the user to implement at most once delivery by disabling retries on the 
producer and committing its offset prior to processing a batch of
-messages. Exactly-once delivery requires co-operation with the 
destination storage system but Kafka provides the offset which makes 
implementing this straight-forward.
+A special case is when the output system is just another Kafka topic 
(e.g. in a Kafka Streams application). Here we can leverage the new 
transactional producer capabilities in 0.11.0.0 that were mentioned above.
+Since the consumer's position is stored as a message in a topic, we 
can ensure that that topic is included in the same transaction as the output 
topics receiving the processed data. If the transaction is aborted,
+the consumer's position will revert to its old value and none of the 
output data will be visible to consumers. To enable this, consumers support an 
"isolation level" to achieve this. In the default
--- End diff --

Seems like this sentence could be clarified a little. `we can ensure that 
that topic is included in the same transaction`, `that` is repeated and it's 
unclear what it means to include a topic in a transaction. Did you mean the 
updates to the topic or something along those lines?

Also `the consumer's position will revert to its old value and none of the 
output data will be visible to consumers`. It may be worth qualifying 
`consumer` and `consumers` in that sentence.




[jira] [Commented] (KAFKA-2526) Console Producer / Consumer's serde config is not working

2017-06-09 Thread huxihx (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045296#comment-16045296
 ] 

huxihx commented on KAFKA-2526:
---

[~guozhang] No, I am thinking [~mgharat] is working on this.

> Console Producer / Consumer's serde config is not working
> -
>
> Key: KAFKA-2526
> URL: https://issues.apache.org/jira/browse/KAFKA-2526
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>Assignee: Mayuresh Gharat
>  Labels: newbie
>
> Although in the console producer one can specify the key and value serializers,
> they are actually not used since 1) it always serializes the input string via
> String.getBytes (hence always pre-assuming the string serializer) and 2) the
> setting is actually only passed into the old producer. The same issues exist in
> the console consumer.
> In addition, the configs in the console producer are messy: we have 1) some
> config values exposed as command-line parameters, 2) some config values in
> --producer-property, and 3) some in --property.
> It would be great to clean the configs up in both the console producer and
> consumer, put them into a single --property parameter which could possibly
> take a file for reading in property values as well, and only leave
> --new-producer as the other command-line parameter.





[jira] [Created] (KAFKA-5424) KafkaConsumer.listTopics() throws Exception when unauthorized topics exist in cluster

2017-06-09 Thread Mike Fagan (JIRA)
Mike Fagan created KAFKA-5424:
-

 Summary: KafkaConsumer.listTopics() throws Exception when 
unauthorized topics exist in cluster
 Key: KAFKA-5424
 URL: https://issues.apache.org/jira/browse/KAFKA-5424
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: Mike Fagan


KafkaConsumer.listTopics() internally calls Fetcher.getAllTopicMetadata(timeout),
and this method will throw a TopicAuthorizationException when there exists an
unauthorized topic in the cluster.

This behavior runs counter to the API docs and makes listTopics() unusable
except in the case where the consumer is authorized for every single topic in the
cluster.

A potentially better approach is to have Fetcher implement a new method
getAuthorizedTopicMetadata(timeout) and have KafkaConsumer call this method
instead of getAllTopicMetadata(timeout) from within KafkaConsumer.listTopics().
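To make the failure mode concrete, a hedged sketch of a caller hitting it. The broker address, group id, and topic set are made up, and whether the exception actually surfaces depends on the broker version, per the comment on this issue earlier in this digest:

{code}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.errors.TopicAuthorizationException;

public class ListTopicsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "inventory");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            try {
                Map<String, List<PartitionInfo>> topics = consumer.listTopics();
                System.out.println("Visible topics: " + topics.keySet());
            } catch (TopicAuthorizationException e) {
                // Thrown even though the caller never asked for the unauthorized topics.
                System.err.println("Not authorized for: " + e.unauthorizedTopics());
            }
        }
    }
}
{code}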







[GitHub] kafka pull request #3288: KAFKA-4661: Improve test coverage UsePreviousTimeO...

2017-06-09 Thread jeyhunkarimov
GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/3288

KAFKA-4661: Improve test coverage UsePreviousTimeOnInvalidTimestamp



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-4661

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3288


commit 2f407df88b1eeb9c91df235cd3dc07b4e6cdf3ed
Author: Jeyhun Karimov 
Date:   2017-06-10T00:55:23Z

Exception branch tested on UsePreviousTimeOnInvalidTimestamp






[jira] [Commented] (KAFKA-4661) Improve test coverage UsePreviousTimeOnInvalidTimestamp

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045287#comment-16045287
 ] 

ASF GitHub Bot commented on KAFKA-4661:
---

GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/3288

KAFKA-4661: Improve test coverage UsePreviousTimeOnInvalidTimestamp



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-4661

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3288.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3288


commit 2f407df88b1eeb9c91df235cd3dc07b4e6cdf3ed
Author: Jeyhun Karimov 
Date:   2017-06-10T00:55:23Z

Exception branch tested on UsePreviousTimeOnInvalidTimestamp




> Improve test coverage UsePreviousTimeOnInvalidTimestamp
> ---
>
> Key: KAFKA-4661
> URL: https://issues.apache.org/jira/browse/KAFKA-4661
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Jeyhun Karimov
>Priority: Minor
>
> Exception branch not tested





[jira] [Assigned] (KAFKA-4661) Improve test coverage UsePreviousTimeOnInvalidTimestamp

2017-06-09 Thread Jeyhun Karimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeyhun Karimov reassigned KAFKA-4661:
-

Assignee: Jeyhun Karimov

> Improve test coverage UsePreviousTimeOnInvalidTimestamp
> ---
>
> Key: KAFKA-4661
> URL: https://issues.apache.org/jira/browse/KAFKA-4661
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Damian Guy
>Assignee: Jeyhun Karimov
>Priority: Minor
>
> Exception branch not tested





Build failed in Jenkins: kafka-trunk-jdk7 #2379

2017-06-09 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-5415; Remove timestamp check in completeTransitionTo

--
[...truncated 22.13 MB...]
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:154)
... 53 more

unit.kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 FAILED
java.lang.RuntimeException: Failed to create a temp dir
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:174)
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:148)
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:158)
at kafka.utils.TestUtils$.tempDir(TestUtils.scala:87)
at kafka.utils.TestUtils$.createBrokerConfig(TestUtils.scala:230)
at 
unit.kafka.server.KafkaApisTest.createKafkaApis(KafkaApisTest.scala:76)
at 
unit.kafka.server.KafkaApisTest.shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired(KafkaApisTest.scala:135)

Caused by:
java.nio.file.FileSystemException: /tmp/kafka-7879391240810558071: No 
space left on device
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:383)
at java.nio.file.Files.createDirectory(Files.java:630)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:136)
at 
java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:173)
at java.nio.file.Files.createTempDirectory(Files.java:944)
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:171)
... 6 more

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 FAILED
java.lang.Exception: Unexpected exception, 
expected but 
was

Caused by:
java.lang.RuntimeException: Failed to create a temp dir
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:174)
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:148)
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:158)
at kafka.utils.TestUtils$.tempDir(TestUtils.scala:87)
at kafka.utils.TestUtils$.createBrokerConfig(TestUtils.scala:230)
at 
unit.kafka.server.KafkaApisTest.createKafkaApis(KafkaApisTest.scala:76)
at 
unit.kafka.server.KafkaApisTest.shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported(KafkaApisTest.scala:110)

Caused by:
java.nio.file.FileSystemException: /tmp/kafka-5774244997723750119: 
No space left on device
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.createDirectory(UnixFileSystemProvider.java:383)
at java.nio.file.Files.createDirectory(Files.java:630)
at java.nio.file.TempFileHelper.create(TempFileHelper.java:136)
at 
java.nio.file.TempFileHelper.createTempDirectory(TempFileHelper.java:173)
at java.nio.file.Files.createTempDirectory(Files.java:944)
at 
org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:171)
... 6 more

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 STARTED

unit.kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 FAILED
java.lang.Exception: Unexpected exception, 
expected but 
was

Caused by:
java.lang.RuntimeException: Failed to create a temp dir
at org.apache.kafka.test.TestUtils.tempDirectory(TestUtils.java:174)
at 

Jenkins build is back to normal : kafka-trunk-jdk7 #2378

2017-06-09 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045252#comment-16045252
 ] 

ASF GitHub Bot commented on KAFKA-5415:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3286


> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.





[GitHub] kafka pull request #3286: KAFKA-5415: Remove timestamp check in completeTran...

2017-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3286




[jira] [Updated] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-5415:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 3286
[https://github.com/apache/kafka/pull/3286]

> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.





Re: [Vote] KIP-150 - Kafka-Streams Cogroup

2017-06-09 Thread Sriram Subramanian
+1

On Fri, Jun 9, 2017 at 2:24 PM, Jay Kreps  wrote:

> +1
>
> -Jay
>
> On Thu, Jun 8, 2017 at 11:16 AM, Guozhang Wang  wrote:
>
> > I think we can continue on this voting thread.
> >
> > Currently we have one binding vote and 2 non-binging votes. I would like
> to
> > call out for other people especially committers to also take a look at
> this
> > proposal and vote.
> >
> >
> > Guozhang
> >
> >
> > On Wed, Jun 7, 2017 at 6:37 PM, Kyle Winkelman  >
> > wrote:
> >
> > > Just bringing people's attention to the vote thread for my KIP. I
> started
> > > it before another round of discussion happened. Not sure the protocol
> so
> > > someone let me know if I am supposed to restart the vote.
> > > Thanks,
> > > Kyle
> > >
> > > On May 24, 2017 8:49 AM, "Bill Bejeck"  wrote:
> > >
> > > > +1  for the KIP and +1 what Xavier said as well.
> > > >
> > > > On Wed, May 24, 2017 at 3:57 AM, Damian Guy 
> > > wrote:
> > > >
> > > > > Also, +1 for the KIP
> > > > >
> > > > > On Wed, 24 May 2017 at 08:57 Damian Guy 
> > wrote:
> > > > >
> > > > > > +1 to what Xavier said
> > > > > >
> > > > > > On Wed, 24 May 2017 at 06:45 Xavier Léauté 
> > > > wrote:
> > > > > >
> > > > > >> I don't think we should wait for entries from each stream, since
> > > that
> > > > > >> might
> > > > > >> limit the usefulness of the cogroup operator. There are
> instances
> > > > where
> > > > > it
> > > > > >> can be useful to compute something based on data from one or
> more
> > > > > stream,
> > > > > >> without having to wait for all the streams to produce something
> > for
> > > > the
> > > > > >> group. In the example I gave in the discussion, it is possible
> to
> > > > > compute
> > > > > >> impression/auction statistics without having to wait for click
> > data,
> > > > > which
> > > > > >> can typically arrive several minutes late.
> > > > > >>
> > > > > >> We could have a separate discussion around adding inner / outer
> > > > > modifiers
> > > > > >> to each of the streams to decide which fields are optional /
> > > required
> > > > > >> before sending updates if we think that might be useful.
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On Tue, May 23, 2017 at 6:28 PM Guozhang Wang <
> wangg...@gmail.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > The proposal LGTM, +1
> > > > > >> >
> > > > > >> > One question I have is about when to send the record to the
> > > resulted
> > > > > >> KTable
> > > > > >> > changelog. For example in your code snippet in the wiki page,
> > > before
> > > > > you
> > > > > >> > see the end result of
> > > > > >> >
> > > > > >> > 1L, Customer[
> > > > > >> >
> > > > > >> >   cart:{Item[no:01], Item[no:03],
> > > Item[no:04]},
> > > > > >> >   purchases:{Item[no:07], Item[no:08]},
> > > > > >> >   wishList:{Item[no:11]}
> > > > > >> >   ]
> > > > > >> >
> > > > > >> >
> > > > > >> > You will firs see
> > > > > >> >
> > > > > >> > 1L, Customer[
> > > > > >> >
> > > > > >> >   cart:{Item[no:01]},
> > > > > >> >   purchases:{},
> > > > > >> >   wishList:{}
> > > > > >> >   ]
> > > > > >> >
> > > > > >> > 1L, Customer[
> > > > > >> >
> > > > > >> >   cart:{Item[no:01]},
> > > > > >> >   purchases:{Item[no:07],Item[no:08]},
> > > > > >> >
> > > > > >> >   wishList:{}
> > > > > >> >   ]
> > > > > >> >
> > > > > >> > 1L, Customer[
> > > > > >> >
> > > > > >> >   cart:{Item[no:01]},
> > > > > >> >   purchases:{Item[no:07],Item[no:08]},
> > > > > >> >
> > > > > >> >   wishList:{}
> > > > > >> >   ]
> > > > > >> >
> > > > > >> > ...
> > > > > >> >
> > > > > >> >
> > > > > >> > I'm wondering if it makes more sense to only start sending the
> > > > update
> > > > > if
> > > > > >> > the corresponding agg-key has seen at least one input from
> each
> > of
> > > > the
> > > > > >> > input stream? Maybe it is out of the scope of this KIP and we
> > can
> > > > make
> > > > > >> it a
> > > > > >> > more general discussion in a separate one.
> > > > > >> >
> > > > > >> >
> > > > > >> > Guozhang
> > > > > >> >
> > > > > >> >
> > > > > >> > On Fri, May 19, 2017 at 8:37 AM, Xavier Léauté <
> > > xav...@confluent.io
> > > > >
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> > > Hi Kyle, I left a few more comments in the discussion
> thread,
> > if
> > > > you
> > > > > >> > > wouldn't mind taking a look
> > > > > >> > >
> > > > > >> > > On Fri, May 19, 2017 at 5:31 AM Kyle Winkelman <
> > > > > >> winkelman.k...@gmail.com
> > > > > >> > >
> > > > > >> > > wrote:
> > > > > >> > >
> > > > > >> > > > Hello all,
> > > > > >> > > >
> > > > > >> > > > I would like to start the vote on KIP-150.
> > > > > >> > > >

[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044885#comment-16044885
 ] 

Guozhang Wang edited comment on KAFKA-5415 at 6/9/17 11:28 PM:
---

[~apurva] [~hachikuji] The root cause for KAFKA-5416 (not this JIRA, but I'd 
still just keep it here for the record) is twofold:

1) First, in {{TransactionStateManager#appendTransactionToLog}}, when we 
receive an error while trying to append to the log (in our case it is 
`NotEnoughReplicas`), we reset the pending state: {{metadata.pendingState = 
None}}. This behavior is correct for all of its callers EXCEPT the one from 
{{TransactionMarkerChannelManager#tryAppendToLog}}, whose {{appendCallback}} 
retries the append, but by then the pending state has already been reset.

So when the retry finally succeeds and calls 
{{TransactionMetadata#completeTransitionTo}}, we throw an exception since the 
pending state is `None`.
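
To make the interaction concrete, here is a minimal sketch (simplified, 
hypothetical names; not the actual coordinator code) of how the cleared pending 
state breaks the retried append:

{code}
// Illustration only -- not the actual TransactionMetadata code.
class TxnMetadataSketch {
  var pendingState: Option[String] = Some("PrepareCommit")

  def completeTransitionTo(state: String): Unit = pendingState match {
    case Some(_) => pendingState = None   // normal path: the prepared transition is completed
    case None    => throw new IllegalStateException(s"completing transition to $state with no pending state")
  }
}

val meta = new TxnMetadataSketch
meta.pendingState = None                      // appendTransactionToLog error path resets the pending state
// ... TransactionMarkerChannelManager#tryAppendToLog retries, and the retry finally succeeds ...
meta.completeTransitionTo("CompleteCommit")   // throws IllegalStateException, which is later swallowed
{code}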

2) This exception is thrown all the way up to {{KafkaApi#handleFetchRequest}}, 
because that is the thread that handles the follower fetch request, increments 
the HW, and then calls {{tryCompleteDelayedRequests}}, which throws the 
exception. The exception is handled in {{handleError}}, where we do print an 
error log entry, but it is usually not visible since the {{KafkaApis}} class is 
commented out in {{log4j.properties}}; we then simply return an UNKNOWN error 
to the follower fetcher. Hence from the logs it looks like a mystery that the 
callback was never called; in fact it did get called but threw an exception 
that was silently swallowed.

The key log entries that disclose this are in worker7: 

{code}
[2017-06-09 01:16:54,132] DEBUG Partition [__transaction_state,37] on broker 1: 
High watermark for partition [__transaction_state,37] updated to 86 [0 : 14667] 
(kafka.cluster.Partition)
[2017-06-09 01:16:54,132] DEBUG [Replica Manager on Broker 1]: Request key 
__transaction_state-37 unblocked 0 fetch requests. (kafka.server.ReplicaManager)
{code}

Note it does not have any "unblocked x produce requests" after "unblocked 0 
fetch requests", while in {{tryCompleteDelayedRequests}} we always try to 
complete fetch, then produce, then delete requests; and in worker2:

{code}
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,22] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,8] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,21] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,4] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,7] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,9] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,46] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,25] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,35] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,41] to broker 

Build failed in Jenkins: kafka-trunk-jdk8 #1685

2017-06-09 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-5422; Handle multiple transitions to ABORTABLE_ERROR correctly

--
[...truncated 911.44 KB...]
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy70.onOutput(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.results.TestListenerAdapter.output(TestListenerAdapter.java:56)
at sun.reflect.GeneratedMethodAccessor281.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.event.AbstractBroadcastDispatch.dispatch(AbstractBroadcastDispatch.java:42)
at 
org.gradle.internal.event.BroadcastDispatch$SingletonDispatch.dispatch(BroadcastDispatch.java:221)
at 
org.gradle.internal.event.BroadcastDispatch$SingletonDispatch.dispatch(BroadcastDispatch.java:145)
at 
org.gradle.internal.event.AbstractBroadcastDispatch.dispatch(AbstractBroadcastDispatch.java:58)
at 
org.gradle.internal.event.BroadcastDispatch$CompositeDispatch.dispatch(BroadcastDispatch.java:315)
at 
org.gradle.internal.event.BroadcastDispatch$CompositeDispatch.dispatch(BroadcastDispatch.java:225)
at 
org.gradle.internal.event.ListenerBroadcast.dispatch(ListenerBroadcast.java:138)
at 
org.gradle.internal.event.ListenerBroadcast.dispatch(ListenerBroadcast.java:35)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy71.output(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.results.StateTrackingTestResultProcessor.output(StateTrackingTestResultProcessor.java:87)
at 
org.gradle.api.internal.tasks.testing.results.AttachParentTestResultProcessor.output(AttachParentTestResultProcessor.java:48)
at sun.reflect.GeneratedMethodAccessor280.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:46)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:154)
... 52 more
java.io.IOException: No space left on device
com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on 
device
at com.esotericsoftware.kryo.io.Output.flush(Output.java:156)
at com.esotericsoftware.kryo.io.Output.require(Output.java:134)
at com.esotericsoftware.kryo.io.Output.writeBoolean(Output.java:578)
at 
org.gradle.internal.serialize.kryo.KryoBackedEncoder.writeBoolean(KryoBackedEncoder.java:63)
at 
org.gradle.api.internal.tasks.testing.junit.result.TestOutputStore$Writer.onOutput(TestOutputStore.java:99)
at 
org.gradle.api.internal.tasks.testing.junit.result.TestReportDataCollector.onOutput(TestReportDataCollector.java:141)
at sun.reflect.GeneratedMethodAccessor278.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 

[jira] [Commented] (KAFKA-5416) TransactionCoordinator doesn't complete transition to CompleteCommit

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045195#comment-16045195
 ] 

ASF GitHub Bot commented on KAFKA-5416:
---

GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3287

KAFKA-5416: Re-prepare transition to CompleteCommit/Abort upon retrying 
append to log

In `TransactionStateManager`, we reset the pending state if an error 
occurred while appending to the log. This is correct for all callers except the 
`TransactionMarkerChannelManager`: it retries the append, and if the retry 
eventually succeeds, completing the transaction metadata's transition throws an 
IllegalStateException since the pending state is None; this is thrown all the 
way up to `KafkaApis` and swallowed.

1. When re-enqueueing to the retry append queue, re-prepare the transition to 
set its pending state (see the sketch below).
2. A bunch of log4j improvements based on the debugging experience. The main 
principle is to make sure all error codes that are about to be sent to the 
client are logged, and that unnecessary log4j entries are removed.
3. Also moved some log entries in ReplicationUtils.scala to `trace`: this 
is rather orthogonal to this PR, but I found them rather annoying while 
debugging the logs.
4. A couple of unrelated bug fixes as pointed out by @hachikuji and @apurvam.
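
A minimal sketch of the idea in item 1 (hypothetical names; not the PR's 
actual code):

{code}
// Illustration only. Before retrying the failed append, restore the pending transition so
// that a later completeTransitionTo() finds a pending state again instead of throwing.
case class MetadataSketch(var pendingState: Option[String])

def reEnqueueForRetry(meta: MetadataSketch, preparedState: String)(retryAppend: () => Unit): Unit = {
  meta.pendingState = Some(preparedState)   // re-prepare the transition that the error path cleared
  retryAppend()                             // then retry appending to the transaction log
}
{code}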

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
KHotfix-transaction-coordinator-append-callback

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3287


commit 755f01201774f6fb5ddcdff87caaa78634847ebe
Author: Guozhang Wang 
Date:   2017-06-09T23:02:44Z

re-prepare transition to completeXX upon retrying append to log




> TransactionCoordinator doesn't complete transition to CompleteCommit
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from 

[GitHub] kafka pull request #3287: KAFKA-5416: Re-prepare transition to CompleteCommi...

2017-06-09 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/3287

KAFKA-5416: Re-prepare transition to CompleteCommit/Abort upon retrying 
append to log

In `TransactionStateManager`, we reset the pending state if an error 
occurred while appending to the log. This is correct for all callers except the 
`TransactionMarkerChannelManager`: it retries the append, and if the retry 
eventually succeeds, completing the transaction metadata's transition throws an 
IllegalStateException since the pending state is None; this is thrown all the 
way up to `KafkaApis` and swallowed.

1. When re-enqueueing to the retry append queue, re-prepare the transition to 
set its pending state.
2. A bunch of log4j improvements based on the debugging experience. The main 
principle is to make sure all error codes that are about to be sent to the 
client are logged, and that unnecessary log4j entries are removed.
3. Also moved some log entries in ReplicationUtils.scala to `trace`: this 
is rather orthogonal to this PR, but I found them rather annoying while 
debugging the logs.
4. A couple of unrelated bug fixes as pointed out by @hachikuji and @apurvam.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
KHotfix-transaction-coordinator-append-callback

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3287


commit 755f01201774f6fb5ddcdff87caaa78634847ebe
Author: Guozhang Wang 
Date:   2017-06-09T23:02:44Z

re-prepare transition to completeXX upon retrying append to log




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044885#comment-16044885
 ] 

Guozhang Wang edited comment on KAFKA-5415 at 6/9/17 10:59 PM:
---

[~apurva] [~hachikuji] The root cause for KAFKA-5416 (not this JIRA, but I'd 
still just keep it here for the record) is twofold:

1) First, in {{TransactionStateManager#appendTransactionToLog}}, when we 
receive an error while trying to append to the log (in our case it is 
`NotEnoughReplicas`), we reset the pending state: {{metadata.pendingState = 
None}}. This behavior is correct for all of its callers EXCEPT the one from 
{{TransactionMarkerChannelManager#tryAppendToLog}}, whose {{appendCallback}} 
retries the append, but by then the pending state has already been reset.

So when the retry finally succeeds and calls 
{{TransactionMetadata#completeTransitionTo}}, we throw an exception since the 
pending state is `None`.

2) This exception is thrown all the way up to {{KafkaApi#handleFetchRequest}}, 
because that is the thread that handles the follower fetch request, increments 
the HW, and then calls {{tryCompleteDelayedRequests}}, which throws the 
exception. The exception is handled in {{handleError}}, where we do not print 
ANY logging but simply return an error to the follower fetcher. Hence from the 
logs it looks like a mystery that the callback was never called; in fact it did 
get called but threw an exception that was silently swallowed.

The key log entries that disclose this are in worker7: 

{code}
[2017-06-09 01:16:54,132] DEBUG Partition [__transaction_state,37] on broker 1: 
High watermark for partition [__transaction_state,37] updated to 86 [0 : 14667] 
(kafka.cluster.Partition)
[2017-06-09 01:16:54,132] DEBUG [Replica Manager on Broker 1]: Request key 
__transaction_state-37 unblocked 0 fetch requests. (kafka.server.ReplicaManager)
{code}

Note it does not have any "unblocked x produce requests" after "unblocked 0 
fetch requests", while in {{tryCompleteDelayedRequests}} we always try to 
complete fetch, then produce, then delete requests; and in worker2:

{code}
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,22] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,8] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,21] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,4] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,7] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,9] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,46] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,25] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,35] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,41] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when 

[jira] [Updated] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Status: Patch Available  (was: Open)

> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045176#comment-16045176
 ] 

ASF GitHub Bot commented on KAFKA-5422:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3285


> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
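
For illustration, a minimal sketch of the behaviour the ticket asks for 
(hypothetical; not the actual TransactionManager code):

{code}
// Illustration only: a repeated transition into ABORTABLE_ERROR is tolerated as a no-op,
// so each expired batch can report its failure without killing the sender thread.
sealed trait State
case object CommittingTransaction extends State
case object AbortableError extends State

def transitionToAbortableError(current: State): State = current match {
  case AbortableError => current          // already in the error state: keep it, do not throw
  case _              => AbortableError   // any other state may move to ABORTABLE_ERROR
}
{code}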



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-5422:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Issue resolved by pull request 3285
[https://github.com/apache/kafka/pull/3285]

> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3285: KAFKA-5422: Handle multiple transitions to ABORTAB...

2017-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3285


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045170#comment-16045170
 ] 

ASF GitHub Bot commented on KAFKA-5415:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3286

KAFKA-5415: Remove timestamp check in completeTransitionTo

This assertion is hard to get right because the system time can roll 
backward on a host due to NTP (as shown in the ticket), and also because a 
transaction can start on one host and complete on another. Getting precise 
clock times across hosts is virtually impossible, and this check makes things 
fragile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-5415-avoid-timestamp-check-in-completeTransition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3286


commit ccf5217d5a5985e7e88b2794c5fe43ff5b1d8a58
Author: Apurva Mehta 
Date:   2017-06-09T22:51:31Z

Remove timestamp check in completeTransitionTo




> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3286: KAFKA-5415: Remove timestamp check in completeTran...

2017-06-09 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3286

KAFKA-5415: Remove timestamp check in completeTransitionTo

This assertion is hard to get right because the system time can roll 
backward on a host due to NTP (as shown in the ticket), and also because a 
transaction can start on one host and complete on another. Getting precise 
clock times across hosts is virtually impossible, and this check makes things 
fragile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-5415-avoid-timestamp-check-in-completeTransition

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3286


commit ccf5217d5a5985e7e88b2794c5fe43ff5b1d8a58
Author: Apurva Mehta 
Date:   2017-06-09T22:51:31Z

Remove timestamp check in completeTransitionTo




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Neha Narkhede
Well deserved. Congratulations Damian!

On Fri, Jun 9, 2017 at 1:34 PM Guozhang Wang  wrote:

> Hello all,
>
>
> The PMC of Apache Kafka is pleased to announce that we have invited Damian
> Guy as a committer to the project.
>
> Damian has made tremendous contributions to Kafka. He has not only
> contributed a lot to the Streams API, but has also been involved in many
> other areas like the producer and consumer clients, broker-side
> coordinators (group coordinator and the ongoing transaction coordinator).
> He has contributed more than 100 patches so far, and has been driving 6
> KIP contributions.
>
> More importantly, Damian has been a very prolific reviewer on open PRs and
> has been actively participating on community activities such as email lists
> and slack overflow questions. Through his code contributions and reviews,
> Damian has demonstrated good judgement on system design and code qualities,
> especially on thorough unit test coverages. We believe he will make a great
> addition to the committers of the community.
>
>
> Thank you for your contributions, Damian!
>
>
> -- Guozhang, on behalf of the Apache Kafka PMC
>
-- 
Thanks,
Neha


[jira] [Comment Edited] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045162#comment-16045162
 ] 

Apurva Mehta edited comment on KAFKA-5415 at 6/9/17 10:48 PM:
--

The last successful metadata update was the following. The update timestamp was 
1496957141444.

{noformat}
[2017-06-08 21:25:41,449] DEBUG TransactionalId my-first-transactional-id 
complete transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=Ongoing, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141444) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

then the system clock rolled back by a couple of hundred milliseconds, and the 
'prepare transition' to 'PrepareCommit' had this transition metadata, with an 
update time of 1496957141285

{noformat}
[2017-06-08 21:25:41,285] DEBUG TransactionalId my-first-transactional-id 
prepare transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=PrepareCommit, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141285) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

So when it came time to complete the transition, the timestamp check would fail 
because the new update timestamp was older than the previous one. We would 
throw an IllegalStateException, which would be caught and swallowed in the 
delayed fetch operation, hence leaving the transaction hanging with a 
pendingState of PrepareCommit.
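
For illustration, a minimal sketch of the kind of update-timestamp check 
described above (hypothetical; not the actual TransactionMetadata code):

{code}
// Illustration only: if the wall clock rolls back between preparing and completing a
// transition, the new update timestamp can be older than the last one and the check throws.
case class TxnTransitSketch(lastUpdateTimestamp: Long)

def checkCompleteTransition(current: TxnTransitSketch, newUpdateTimestamp: Long): Unit =
  if (newUpdateTimestamp < current.lastUpdateTimestamp)
    throw new IllegalStateException(
      s"new update timestamp $newUpdateTimestamp is older than ${current.lastUpdateTimestamp}")

// With the values from the logs above: 1496957141285 < 1496957141444, so the check throws.
checkCompleteTransition(TxnTransitSketch(1496957141444L), 1496957141285L)
{code}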




was (Author: apurva):
The last successful metadata update was the following. The update timestamp was 
1496957141444.

{noformat}
[2017-06-08 21:25:41,449] DEBUG TransactionalId my-first-transactional-id 
complete transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=Ongoing, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141444) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

then the system clock rolled back by a couple of hundred milliseconds, and the 
'prepare transition' to 'PrepareCommit' had this transition metadata 

{noformat}
[2017-06-08 21:25:41,285] DEBUG TransactionalId my-first-transactional-id 
prepare transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=PrepareCommit, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141285) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

So when it came time to complete the transition, the timestamp check would fail 
because the new update timestamp was older than the previous one. We would 
throw an IllegalStateException, which would be caught and swallowed in the 
delayed fetch operation, hence leaving the transaction hanging with a 
pendingState of PrepareCommit.



> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045162#comment-16045162
 ] 

Apurva Mehta commented on KAFKA-5415:
-

The last successful metadata update was the following. The update timestamp was 
1496957141444.

{noformat}
[2017-06-08 21:25:41,449] DEBUG TransactionalId my-first-transactional-id 
complete transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=Ongoing, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141444) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

then the system clock rolled back by a couple of hundred milliseconds, and the 
'prepare transition' to 'PrepareCommit' had this transition metadata 

{noformat}
[2017-06-08 21:25:41,285] DEBUG TransactionalId my-first-transactional-id 
prepare transition from Ongoing to TxnTransitMetadata(producerId=2000, 
producerEpoch=0, txnTimeoutMs=6, txnState=PrepareCommit, 
topicPartitions=Set(output-topic-2, __consumer_offsets-47, output-topic-0, 
output-topic-1), txnStartTimestamp=1496957141430, 
txnLastUpdateTimestamp=1496957141285) 
(kafka.coordinator.transaction.TransactionMetadata)
{noformat}

So when it came time to complete the transition, the timestamp check would fail 
because the new update timestamp was older than the previous one. We would 
throw an IllegalStateException, which would be caught and swallowed in the 
delayed fetch operation, hence leaving the transaction hanging with a 
pendingState of PrepareCommit.



> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045151#comment-16045151
 ] 

Apurva Mehta commented on KAFKA-5415:
-

The problem here seems to be this check: 

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/transaction/TransactionMetadata.scala#L288

I noticed that the system clock time rolled back slightly during the processing 
of this transaction, and [~hachikuji] noticed this check.

What this amounts to is that the operation to complete the state transition 
will fail in the DelayedFetch operation, and the exception would be swallowed. 
The Pending state would not be cleared, and future EndTxnRequests would fail 
with a CONCURRENT_TRANSACTIONS exception.

> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Nicholas Ngorok (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Ngorok updated KAFKA-5413:
---
Attachment: 002147422683.log

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
> Attachments: .log, 002147422683.log
>
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason, this check was added in 0.10.2?
> E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of 
> "key 1" following, wouldn't we run into this situation whenever the log 
> cleaner runs?
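
For reference, a quick sketch of the arithmetic behind the failed requirement, 
using the offsets from the dump above and assuming the cleaned data is grouped 
into a segment with base offset 0 as the log lines suggest (illustration only; 
not the actual LogSegment code):

{code}
// The relative offset stored in the index is an Int, so (largestOffset - baseOffset)
// must fit in 32 bits; with these values it does not.
val baseOffset    = 0L              // the cleaned segment starts at offset 0
val largestOffset = 2147539884L     // offset observed in the second segment's dump
val delta         = largestOffset - baseOffset
val fitsInInt     = delta <= Int.MaxValue   // false: 2147539884 > 2147483647, hence the require(...) fails
{code}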



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Nicholas Ngorok (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Ngorok updated KAFKA-5413:
---
Attachment: (was: 002148098942.log)

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
> Attachments: .log, 002147422683.log
>
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason, this check was added in 0.10.2?
> E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of 
> "key 1" following, wouldn't we run into this situation whenever the log 
> cleaner runs?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Nicholas Ngorok (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Ngorok updated KAFKA-5413:
---
Attachment: (was: 002148775326.index)

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
> Attachments: .log, 002147422683.log
>
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason this check was added in 0.10.2?
> E.g. if the first offset has key "key 0" and it is then followed by MAXINT + 1 
> messages with key "key 1", wouldn't we run into this situation whenever the log 
> cleaner runs?
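
To make the failure mode concrete, below is a minimal, self-contained illustration (plain Java, not the actual Kafka source) of the invariant that the failing require(...) enforces: offsets within a segment are stored as an Int delta from the segment's base offset, so largestOffset - baseOffset must not exceed Integer.MAX_VALUE. The offsets used are taken from the DumpLogSegments output quoted above.

{noformat}
public class RelativeOffsetCheck {

    // True if every offset up to largestOffset can be stored as an Int delta
    // from the segment's base offset.
    static boolean fitsInSegment(long baseOffset, long largestOffset) {
        return largestOffset - baseOffset <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // The first segment on its own is fine.
        System.out.println(fitsInSegment(0L, 1810054758L));          // true
        // Cleaning the second segment "into 0": the delta 2147539884 - 0
        // exceeds Integer.MAX_VALUE (2147483647), so the append is rejected.
        System.out.println(fitsInSegment(0L, 2147539884L));          // false
        // The same record is fine relative to its own segment's base offset.
        System.out.println(fitsInSegment(2147343575L, 2147539884L)); // true
    }
}
{noformat}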



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Viktor Somogyi
Congrats Damian! :)

On Fri, Jun 9, 2017 at 3:06 PM, Matthias J. Sax 
wrote:

> Congrats Damian!
>
> On 6/9/17 2:25 PM, Eno Thereska wrote:
> > Congrats Damian!
> >
> > Eno
> >> On 9 Jun 2017, at 22:04, Ismael Juma  wrote:
> >>
> >> Congratulations Damian! :)
> >>
> >> Ismael
> >>
> >> On Fri, Jun 9, 2017 at 9:34 PM, Guozhang Wang 
> wrote:
> >>
> >>> Hello all,
> >>>
> >>>
> >>> The PMC of Apache Kafka is pleased to announce that we have invited
> Damian
> >>> Guy as a committer to the project.
> >>>
> >>> Damian has made tremendous contributions to Kafka. He has not only
> >>> contributed a lot to the Streams API, but has also been involved in many
> >>> other areas like the producer and consumer clients, broker-side
> >>> coordinators (group coordinator and the ongoing transaction coordinator).
> >>> He has contributed more than 100 patches so far, and has been driving
> >>> 6 KIP contributions.
> >>>
> >>> More importantly, Damian has been a very prolific reviewer on open PRs
> >>> and has been actively participating in community activities such as email
> >>> lists and Stack Overflow questions. Through his code contributions and
> >>> reviews, Damian has demonstrated good judgement on system design and code
> >>> quality, especially on thorough unit test coverage. We believe he will
> >>> make a great addition to the committers of the community.
> >>>
> >>>
> >>> Thank you for your contributions, Damian!
> >>>
> >>>
> >>> -- Guozhang, on behalf of the Apache Kafka PMC
> >>>
> >
>
>


[jira] [Updated] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Nicholas Ngorok (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicholas Ngorok updated KAFKA-5413:
---
Attachment: .log
002148775326.index
002148098942.log

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
> Attachments: .log, 002148098942.log, 
> 002148775326.index
>
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
> in the Kafka source, where the difference is actually larger than MAXINT since both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631].
> These were the outputs of dumping the first two log segments:
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo
> adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason this check was added in 0.10.2?
> E.g. if the first offset has key "key 0" and it is then followed by MAXINT + 1 
> messages with key "key 1", wouldn't we run into this situation whenever the log 
> cleaner runs?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5423) Broker fails with TopicExistException with auto topic creation enabled

2017-06-09 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-5423:
--

 Summary: Broker fails with TopicExistException with auto topic 
creation enabled
 Key: KAFKA-5423
 URL: https://issues.apache.org/jira/browse/KAFKA-5423
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.11.0.0
Reporter: Matthias J. Sax
Priority: Minor


During a test that creates a topic in a 3-broker embedded cluster, it can 
happen that a broker does not refresh its metadata quickly enough. If the test 
starts to write to the topic (or even just sends a metadata request for it) and auto 
topic creation is enabled, the broker tries to create the topic. As the topic was 
already created, it fails with {{TopicExistsException}}.

For this case, the broker should swallow the exception and resume without failing.
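
A minimal sketch of the suggested handling is below (the createTopic/ensureTopicExists helpers are hypothetical stand-ins, not the broker's actual code path): a TopicExistsException raised while auto-creating a topic simply means the topic was created concurrently, so it can be swallowed and the request can proceed. The sketch only assumes the kafka-clients exception class on the classpath.

{noformat}
import org.apache.kafka.common.errors.TopicExistsException;

public class AutoCreateSketch {

    // Stand-in for whatever performs the creation; throws if the topic already
    // exists, which is exactly the race described above.
    static void createTopic(String topic) {
        throw new TopicExistsException("Topic '" + topic + "' already exists.");
    }

    static void ensureTopicExists(String topic) {
        try {
            createTopic(topic);
        } catch (TopicExistsException e) {
            // Created concurrently (e.g. by the test itself): swallow the
            // exception and continue instead of failing the request.
        }
    }

    public static void main(String[] args) {
        ensureTopicExists("test-topic"); // completes normally despite the race
    }
}
{noformat}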



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Matthias J. Sax
+1

On 6/9/17 3:01 PM, Bill Bejeck wrote:
> +1
> 
> Thanks,
> Bill
> 
> On Fri, Jun 9, 2017 at 4:56 PM, Gwen Shapira  wrote:
> 
>> +1, thank you
>>
>> On Fri, Jun 9, 2017 at 1:37 PM Jeff Widman  wrote:
>>
>>> +1
>>>
>>> Thanks for driving this
>>>
>>> On Jun 9, 2017 1:22 PM, "Guozhang Wang"  wrote:
>>>
 I have just confirmed with Jun that he cannot change the notification
>>> edits
 himself as well. So I'll talk to Apache Infra anyways in order to make
>>> that
 change.

 So I'd update my proposal to:

 1. create an "iss...@kafka.apache.org" which will receive all the JIRA
 events.
 2. remove "dev@kafka.apache.org" from all the JIRA event notifications
 except "Issue Created", "Issue Resolved" and "Issue Reopened".


 Everyone's feedback is more than welcome! I will wait for a few days
>> and
 if there is an agreement create a ticket to Apache JIRA.

 Guozhang


 On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang 
>>> wrote:

> Yeah I have thought about that but that would involve Apache Infra.
>> The
> current proposal is based on the assumption that removing the
 notification
> can be done by ourselves.
>
>
> Guozhang
>
> On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax <
>> matth...@confluent.io

> wrote:
>
>> Have you considered starting a new mailing list "issues@" that gets
>>> the
>> full load of all emails?
>>
>> -Matthias
>>
>> On 6/6/17 7:26 PM, Jeff Widman wrote:
>>> What about also adding "Issue Resolved" to the events that hit
>> dev?
>>>
>>> While I don't generally care about updates on issue progress, I do
 want
>> to
>>> know when issues are resolved in case undocumented behavior that I
>>> may
>> have
>>> accidentally been relying on may have changed.
>>>
>>> With or without this suggested tweak, I am +1 (non-binding) on
>> this.
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang >>
>> wrote:
>>>
 (Changing the subject title to reflect on the proposal)

 Hey guys,

 In order to not drop the ball on the floor I'd like to kick off
>>> some
 proposals according to people's feedback on this issue. Feel
>> free
>>> to
 brainstorm on different ideas:

 We change JIRA notifications to remove `Single Email Address (
 dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5
>> -
>>> 8
 events per day). Currently `dev@kafka.apache.org` is notified on
>>> all
 events (~180 events per day).


 Though as a PMC member I can view these notification schemes on
>> the
>> JIRA
 admin page, I cannot edit them. Maybe Jun can check if he has
>>> the
 privilege to do so after we have agreed on some proposal;
>> otherwise
 we
>> need
 to talk to Apache INFRA for it.


 Guozhang



 On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
 michal.borowie...@openbet.com> wrote:

> +1 agree with Jeff,
>
> Michał
>
> On 31/05/17 06:25, Jeff Widman wrote:
>
> I'm hugely in favor of this change as well...
>
> Although I actually find the Github pull request emails less
>>> useful
>> than
> the jirabot ones since Jira typically has more info when I'm
>>> trying
 to
> figure out if the issue is relevant to me or not...
>
> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang <
>>> wangg...@gmail.com>
 <
>> wangg...@gmail.com> wrote:
>
>
> I actually do not know.. Maybe Jun knows better than me?
>
>
> Guozhang
>
> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira <
>> g...@confluent.io

 <
>> g...@confluent.io> wrote:
>
>
> I agree.
>
> Guozhang, do you know how to implement the suggestion? JIRA to
 Apache
> Infra? Or is this something we can do ourselves somehow?
>
> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang <
>> wangg...@gmail.com

 <
>> wangg...@gmail.com>
>
> wrote:
>
> I share your pains. Right now I use filters on my email accounts
>>> and
>> it
>
> has
>
> been down to about 25 per day.
>
> I think setting up a separate mailing list for jirabot and jenkins
> auto-generated emails is a good idea.
> generated emails is a good idea.
>
>
> Guozhang
>
>
> On Mon, May 29, 2017 at 12:58 AM,  <
>> marc.schle...@sdv-it.de> wrote:
>
>
> Hello 

Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Matthias J. Sax
Congrats Damian!

On 6/9/17 2:25 PM, Eno Thereska wrote:
> Congrats Damian! 
> 
> Eno
>> On 9 Jun 2017, at 22:04, Ismael Juma  wrote:
>>
>> Congratulations Damian! :)
>>
>> Ismael
>>
>> On Fri, Jun 9, 2017 at 9:34 PM, Guozhang Wang  wrote:
>>
>>> Hello all,
>>>
>>>
>>> The PMC of Apache Kafka is pleased to announce that we have invited Damian
>>> Guy as a committer to the project.
>>>
>>> Damian has made tremendous contributions to Kafka. He has not only
>>> contributed a lot to the Streams API, but has also been involved in many
>>> other areas like the producer and consumer clients, broker-side
>>> coordinators (group coordinator and the ongoing transaction coordinator).
>>> He has contributed more than 100 patches so far, and has been driving 6
>>> KIP contributions.
>>>
>>> More importantly, Damian has been a very prolific reviewer on open PRs and
>>> has been actively participating in community activities such as email lists
>>> and Stack Overflow questions. Through his code contributions and reviews,
>>> Damian has demonstrated good judgement on system design and code quality,
>>> especially on thorough unit test coverage. We believe he will make a great
>>> addition to the committers of the community.
>>>
>>>
>>> Thank you for your contributions, Damian!
>>>
>>>
>>> -- Guozhang, on behalf of the Apache Kafka PMC
>>>
> 



signature.asc
Description: OpenPGP digital signature


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Bill Bejeck
+1

Thanks,
Bill

On Fri, Jun 9, 2017 at 4:56 PM, Gwen Shapira  wrote:

> +1, thank you
>
> On Fri, Jun 9, 2017 at 1:37 PM Jeff Widman  wrote:
>
> > +1
> >
> > Thanks for driving this
> >
> > On Jun 9, 2017 1:22 PM, "Guozhang Wang"  wrote:
> >
> > > I have just confirmed with Jun that he cannot change the notification
> > edits
> > > himself as well. So I'll talk to Apache Infra anyways in order to make
> > that
> > > change.
> > >
> > > So I'd update my proposal to:
> > >
> > > 1. create an "iss...@kafka.apache.org" which will receive all the JIRA
> > > events.
> > > 2. remove "dev@kafka.apache.org" from all the JIRA event notifications
> > > except "Issue Created", "Issue Resolved" and "Issue Reopened".
> > >
> > >
> > > Everyone's feedback is more than welcome! I will wait for a few days
> and
> > > if there is an agreement create a ticket to Apache JIRA.
> > >
> > > Guozhang
> > >
> > >
> > > On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Yeah I have thought about that but that would involve Apache Infra.
> The
> > > > current proposal is based on the assumption that removing the
> > > notification
> > > > can be done by ourselves.
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax <
> matth...@confluent.io
> > >
> > > > wrote:
> > > >
> > > >> Have you considered starting a new mailing list "issues@" that gets
> > the
> > > >> full load of all emails?
> > > >>
> > > >> -Matthias
> > > >>
> > > >> On 6/6/17 7:26 PM, Jeff Widman wrote:
> > > >> > What about also adding "Issue Resolved" to the events that hit
> dev?
> > > >> >
> > > >> > While I don't generally care about updates on issue progress, I do
> > > want
> > > >> to
> > > >> > know when issues are resolved in case undocumented behavior that I
> > may
> > > >> have
> > > >> > accidentally been relying on may have changed.
> > > >> >
> > > >> > With or without this suggested tweak, I am +1 (non-binding) on
> this.
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang  >
> > > >> wrote:
> > > >> >
> > > >> >> (Changing the subject title to reflect on the proposal)
> > > >> >>
> > > >> >> Hey guys,
> > > >> >>
> > > >> >> In order to not drop the ball on the floor I'd like to kick off
> > some
> > > >> >> proposals according to people's feedback on this issue. Feel
> free
> > to
> > > >> >> brainstorm on different ideas:
> > > >> >>
> > > >> >> We change JIRA notifications to remove `Single Email Address (
> > > >> >> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5
> -
> > 8
> > > >> >> events per day). Currently `dev@kafka.apache.org` is notified on
> > all
> > > >> >> events (~180 events per day).
> > > >> >>
> > > >> >>
> > > >> >> Though as a PMC member I can view these notification schemes on
> the
> > > >> JIRA
> > > >> >> admin page, I cannot edit them. Maybe Jun can check if he has
> > the
> > > >> >> privilege to do so after we have agreed on some proposal;
> otherwise
> > > we
> > > >> need
> > > >> >> to talk to Apache INFRA for it.
> > > >> >>
> > > >> >>
> > > >> >> Guozhang
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
> > > >> >> michal.borowie...@openbet.com> wrote:
> > > >> >>
> > > >> >>> +1 agree with Jeff,
> > > >> >>>
> > > >> >>> Michał
> > > >> >>>
> > > >> >>> On 31/05/17 06:25, Jeff Widman wrote:
> > > >> >>>
> > > >> >>> I'm hugely in favor of this change as well...
> > > >> >>>
> > > >> >>> Although I actually find the Github pull request emails less
> > useful
> > > >> than
> > > >> >>> the jirabot ones since Jira typically has more info when I'm
> > trying
> > > to
> > > >> >>> figure out if the issue is relevant to me or not...
> > > >> >>>
> > > >> >>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang <
> > wangg...@gmail.com>
> > > <
> > > >> wangg...@gmail.com> wrote:
> > > >> >>>
> > > >> >>>
> > > >> >>> I actually do not know.. Maybe Jun knows better than me?
> > > >> >>>
> > > >> >>>
> > > >> >>> Guozhang
> > > >> >>>
> > > >> >>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira <
> g...@confluent.io
> > >
> > > <
> > > >> g...@confluent.io> wrote:
> > > >> >>>
> > > >> >>>
> > > >> >>> I agree.
> > > >> >>>
> > > >> >>> Guozhang, do you know how to implement the suggestion? JIRA to
> > > Apache
> > > >> >>> Infra? Or is this something we can do ourselves somehow?
> > > >> >>>
> > > >> >>> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang <
> wangg...@gmail.com
> > >
> > > <
> > > >> wangg...@gmail.com>
> > > >> >>>
> > > >> >>> wrote:
> > > >> >>>
> > > >> >>> I share your pains. Right now I use filters on my email accounts
> > and
> > > >> it
> > > >> >>>
> > > >> >>> has
> > > >> >>>
> > > >> >>> been down to about 25 per day.
> > > >> >>>
> > > >> >>> I think setting up a separate mailing list for jirabot and jenkins
> 

[jira] [Updated] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5422:

Status: Patch Available  (was: Open)

> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
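
The fix described above can be illustrated with a small, self-contained state machine (simplified names, not the real TransactionManager): an ABORTABLE_ERROR -> ABORTABLE_ERROR transition is treated as a legal no-op, so a second failed produce request no longer raises an exception in the sender thread.

{noformat}
public class TransitionSketch {

    enum State { READY, IN_TRANSACTION, COMMITTING_TRANSACTION, ABORTABLE_ERROR }

    private State current = State.COMMITTING_TRANSACTION;

    synchronized void transitionToAbortableError() {
        // Allowing the self-transition makes repeated failure reports idempotent
        // instead of raising "Invalid transition attempted from state
        // ABORTABLE_ERROR to state ABORTABLE_ERROR".
        if (current == State.ABORTABLE_ERROR) {
            return;
        }
        current = State.ABORTABLE_ERROR;
    }

    public static void main(String[] args) {
        TransitionSketch manager = new TransitionSketch();
        manager.transitionToAbortableError(); // first expired batch
        manager.transitionToAbortableError(); // second expired batch: no exception
        System.out.println("state = " + manager.current);
    }
}
{noformat}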



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045069#comment-16045069
 ] 

ASF GitHub Bot commented on KAFKA-5422:
---

GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3285

KAFKA-5422: Handle multiple transitions to ABORTABLE_ERROR correctly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-5422-allow-multiple-transitions-to-abortable-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3285


commit a3d0d923a76269d55541294447967167c35baebb
Author: Apurva Mehta 
Date:   2017-06-09T21:39:49Z

Handle multiple transitions to ABORTABLE_ERROR correctly




> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3285: KAFKA-5422: Handle multiple transitions to ABORTAB...

2017-06-09 Thread apurvam
GitHub user apurvam opened a pull request:

https://github.com/apache/kafka/pull/3285

KAFKA-5422: Handle multiple transitions to ABORTABLE_ERROR correctly



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apurvam/kafka 
KAFKA-5422-allow-multiple-transitions-to-abortable-error

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3285.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3285


commit a3d0d923a76269d55541294447967167c35baebb
Author: Apurva Mehta 
Date:   2017-06-09T21:39:49Z

Handle multiple transitions to ABORTABLE_ERROR correctly




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Eno Thereska
Congrats Damian! 

Eno
> On 9 Jun 2017, at 22:04, Ismael Juma  wrote:
> 
> Congratulations Damian! :)
> 
> Ismael
> 
> On Fri, Jun 9, 2017 at 9:34 PM, Guozhang Wang  wrote:
> 
>> Hello all,
>> 
>> 
>> The PMC of Apache Kafka is pleased to announce that we have invited Damian
>> Guy as a committer to the project.
>> 
>> Damian has made tremendous contributions to Kafka. He has not only
>> contributed a lot to the Streams API, but has also been involved in many
>> other areas like the producer and consumer clients, broker-side
>> coordinators (group coordinator and the ongoing transaction coordinator).
>> He has contributed more than 100 patches so far, and has been driving 6
>> KIP contributions.
>> 
>> More importantly, Damian has been a very prolific reviewer on open PRs and
>> has been actively participating in community activities such as email lists
>> and Stack Overflow questions. Through his code contributions and reviews,
>> Damian has demonstrated good judgement on system design and code quality,
>> especially on thorough unit test coverage. We believe he will make a great
>> addition to the committers of the community.
>> 
>> 
>> Thank you for your contributions, Damian!
>> 
>> 
>> -- Guozhang, on behalf of the Apache Kafka PMC
>> 



[jira] [Updated] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5422:

Fix Version/s: 0.11.0.0

> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [Vote] KIP-150 - Kafka-Streams Cogroup

2017-06-09 Thread Jay Kreps
+1

-Jay

On Thu, Jun 8, 2017 at 11:16 AM, Guozhang Wang  wrote:

> I think we can continue on this voting thread.
>
> Currently we have one binding vote and 2 non-binding votes. I would like to
> call out for other people especially committers to also take a look at this
> proposal and vote.
>
>
> Guozhang
>
>
> On Wed, Jun 7, 2017 at 6:37 PM, Kyle Winkelman 
> wrote:
>
> > Just bringing people's attention to the vote thread for my KIP. I started
> > it before another round of discussion happened. Not sure of the protocol, so
> > someone let me know if I am supposed to restart the vote.
> > Thanks,
> > Kyle
> >
> > On May 24, 2017 8:49 AM, "Bill Bejeck"  wrote:
> >
> > > +1  for the KIP and +1 what Xavier said as well.
> > >
> > > On Wed, May 24, 2017 at 3:57 AM, Damian Guy 
> > wrote:
> > >
> > > > Also, +1 for the KIP
> > > >
> > > > On Wed, 24 May 2017 at 08:57 Damian Guy 
> wrote:
> > > >
> > > > > +1 to what Xavier said
> > > > >
> > > > > On Wed, 24 May 2017 at 06:45 Xavier Léauté 
> > > wrote:
> > > > >
> > > > >> I don't think we should wait for entries from each stream, since
> > that
> > > > >> might
> > > > >> limit the usefulness of the cogroup operator. There are instances
> > > where
> > > > it
> > > > >> can be useful to compute something based on data from one or more
> > > > >> streams,
> > > > >> without having to wait for all the streams to produce something
> for
> > > the
> > > > >> group. In the example I gave in the discussion, it is possible to
> > > > compute
> > > > >> impression/auction statistics without having to wait for click
> data,
> > > > which
> > > > >> can typically arrive several minutes late.
> > > > >>
> > > > >> We could have a separate discussion around adding inner / outer
> > > > modifiers
> > > > >> to each of the streams to decide which fields are optional /
> > required
> > > > >> before sending updates if we think that might be useful.
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Tue, May 23, 2017 at 6:28 PM Guozhang Wang  >
> > > > wrote:
> > > > >>
> > > > >> > The proposal LGTM, +1
> > > > >> >
> > > > >> > One question I have is about when to send the record to the
> > > > >> resulting
> > > > >> KTable
> > > > >> > changelog. For example in your code snippet in the wiki page,
> > before
> > > > you
> > > > >> > see the end result of
> > > > >> >
> > > > >> > 1L, Customer[
> > > > >> >
> > > > >> >   cart:{Item[no:01], Item[no:03],
> > Item[no:04]},
> > > > >> >   purchases:{Item[no:07], Item[no:08]},
> > > > >> >   wishList:{Item[no:11]}
> > > > >> >   ]
> > > > >> >
> > > > >> >
> > > > >> > You will first see
> > > > >> >
> > > > >> > 1L, Customer[
> > > > >> >
> > > > >> >   cart:{Item[no:01]},
> > > > >> >   purchases:{},
> > > > >> >   wishList:{}
> > > > >> >   ]
> > > > >> >
> > > > >> > 1L, Customer[
> > > > >> >
> > > > >> >   cart:{Item[no:01]},
> > > > >> >   purchases:{Item[no:07],Item[no:08]},
> > > > >> >
> > > > >> >   wishList:{}
> > > > >> >   ]
> > > > >> >
> > > > >> > 1L, Customer[
> > > > >> >
> > > > >> >   cart:{Item[no:01]},
> > > > >> >   purchases:{Item[no:07],Item[no:08]},
> > > > >> >
> > > > >> >   wishList:{}
> > > > >> >   ]
> > > > >> >
> > > > >> > ...
> > > > >> >
> > > > >> >
> > > > >> > I'm wondering if it makes more sense to only start sending the
> > > update
> > > > if
> > > > >> > the corresponding agg-key has seen at least one input from each
> of
> > > the
> > > > >> > input stream? Maybe it is out of the scope of this KIP and we
> can
> > > make
> > > > >> it a
> > > > >> > more general discussion in a separate one.
> > > > >> >
> > > > >> >
> > > > >> > Guozhang
> > > > >> >
> > > > >> >
> > > > >> > On Fri, May 19, 2017 at 8:37 AM, Xavier Léauté <
> > xav...@confluent.io
> > > >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Kyle, I left a few more comments in the discussion thread,
> if
> > > you
> > > > >> > > wouldn't mind taking a look
> > > > >> > >
> > > > >> > > On Fri, May 19, 2017 at 5:31 AM Kyle Winkelman <
> > > > >> winkelman.k...@gmail.com
> > > > >> > >
> > > > >> > > wrote:
> > > > >> > >
> > > > >> > > > Hello all,
> > > > >> > > >
> > > > >> > > > I would like to start the vote on KIP-150.
> > > > >> > > >
> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 150+-+
> > > > >> > > Kafka-Streams+Cogroup
> > > > >> > > >
> > > > >> > > > Thanks,
> > > > >> > > > Kyle
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --
> > > > >> > -- Guozhang
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Updated] (KAFKA-5415) TransactionCoordinator doesn't complete transition to PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5415:

Summary: TransactionCoordinator doesn't complete transition to 
PrepareCommit state  (was: TransactionCoordinator gets stuck in PrepareCommit 
state)

> TransactionCoordinator doesn't complete transition to PrepareCommit state
> -
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.
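
To see why this leaves the client hanging rather than failing fast, here is a self-contained sketch (hypothetical names, not the producer's actual code) of an EndTxn retry loop: if the coordinator never leaves its PrepareXX state, every retry is answered with CONCURRENT_TRANSACTIONS, which the client keeps treating as a transient condition, so no progress is ever made.

{noformat}
public class EndTxnRetrySketch {

    enum EndTxnResult { NONE, CONCURRENT_TRANSACTIONS }

    // Stand-in for a coordinator stuck in PrepareCommit: every new EndTxn
    // request is answered with CONCURRENT_TRANSACTIONS.
    static EndTxnResult sendEndTxn() {
        return EndTxnResult.CONCURRENT_TRANSACTIONS;
    }

    public static void main(String[] args) throws InterruptedException {
        int attempts = 0;
        while (attempts < 5) { // capped here only so the sketch terminates
            if (sendEndTxn() == EndTxnResult.NONE) {
                System.out.println("transaction completed");
                return;
            }
            attempts++;
            Thread.sleep(100); // transient-looking error: back off and retry
        }
        System.out.println("still not completed after " + attempts + " attempts");
    }
}
{noformat}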



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045051#comment-16045051
 ] 

Apurva Mehta commented on KAFKA-5415:
-

[~guozhang]'s explanation above actually applies to KAFKA-5416.

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta reassigned KAFKA-5415:
---

Assignee: Apurva Mehta  (was: Guozhang Wang)

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5416) TransactionCoordinator doesn't complete transition to CompleteCommit

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Fix Version/s: 0.11.0.0

> TransactionCoordinator doesn't complete transition to CompleteCommit
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> 

[jira] [Reopened] (KAFKA-5416) TransactionCoordinator doesn't complete transition to CompleteCommit

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta reopened KAFKA-5416:
-
  Assignee: Guozhang Wang

This is not a dup of KAFKA-5415. Instead, the cause of KAFKA-5415 explained by 
[~guozhang] in 
https://issues.apache.org/jira/browse/KAFKA-5415?focusedCommentId=16044885=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16044885

actually applies to this ticket. 

> TransactionCoordinator doesn't complete transition to CompleteCommit
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 

[jira] [Assigned] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta reassigned KAFKA-5422:
---

Assignee: Apurva Mehta

> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Apurva Mehta
>  Labels: exactly-once
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5422) Multiple produce request failures causes invalid state transition in TransactionManager

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5422:

Summary: Multiple produce request failures causes invalid state transition 
in TransactionManager  (was: Multiple expired batches causes invalid state 
transition in TransactionManager)

> Multiple produce request failures causes invalid state transition in 
> TransactionManager
> ---
>
> Key: KAFKA-5422
> URL: https://issues.apache.org/jira/browse/KAFKA-5422
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>  Labels: exactly-once
>
> When multiple produce requests fail (for instance when all inflight batches 
> are expired), each will try to transition to ABORTABLE_ERROR. 
> However, only the first transition will succeed, the rest will fail with the 
> following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 
> This will be caught in the sender thread and things will continue. However, 
> the correct thing to do is to allow multiple transitions to 
> ABORTABLE_ERROR.
> {noformat}
> [2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
> Broker may not be available. (org.apache.kafka.clients.NetworkClient)
> [2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
> (org.apache.kafka.clients.producer.internals.Sender)
> [2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
> Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
> (org.apache.kafka.clients.producer.internals.TransactionManager)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
> output-topic-0 with base offset offset -1 and error: {}. 
> (org.apache.kafka.clients.producer.internals.ProducerBatch)
> org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
> output-topic-0: 30099 ms has passed since batch creation plus linger time
> [2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
> (org.apache.kafka.clients.producer.internals.Sender)
> org.apache.kafka.common.KafkaException: Invalid transition attempted from 
> state ABORTABLE_ERROR to state ABORTABLE_ERROR
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
> at 
> org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
> at 
> org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
> at 
> org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-5422) Multiple expired batches causes invalid state transition in TransactionManager

2017-06-09 Thread Apurva Mehta (JIRA)
Apurva Mehta created KAFKA-5422:
---

 Summary: Multiple expired batches causes invalid state transition 
in TransactionManager
 Key: KAFKA-5422
 URL: https://issues.apache.org/jira/browse/KAFKA-5422
 Project: Kafka
  Issue Type: Bug
Reporter: Apurva Mehta


When multiple produce requests fail (for instance when all inflight batches are 
expired), each will try to transition to ABORTABLE_ERROR. 

However, only the first transition will succeed; the rest will fail with the 
following 'invalid transition from ABORTABLE_ERROR to ABORTABLE_ERROR'. 

This will be caught in the sender thread and things will continue. However, the 
correct thing to do is to allow multiple transitions to ABORTABLE_ERROR.

{noformat}

[2017-06-09 01:22:39,327] WARN Connection to node 3 could not be established. 
Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2017-06-09 01:22:39,958] TRACE Expired 2 batches in accumulator 
(org.apache.kafka.clients.producer.internals.Sender)
[2017-06-09 01:22:39,958] DEBUG [TransactionalId my-first-transactional-id] 
Transition from state COMMITTING_TRANSACTION to error state ABORTABLE_ERROR 
(org.apache.kafka.clients.producer.internals.TransactionManager)
org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
output-topic-0: 30099 ms has passed since batch creation plus linger time
[2017-06-09 01:22:39,960] TRACE Produced messages to topic-partition 
output-topic-0 with base offset offset -1 and error: {}. 
(org.apache.kafka.clients.producer.internals.ProducerBatch)
org.apache.kafka.common.errors.TimeoutException: Expiring 250 record(s) for 
output-topic-0: 30099 ms has passed since batch creation plus linger time
[2017-06-09 01:22:39,960] ERROR Uncaught error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
org.apache.kafka.common.KafkaException: Invalid transition attempted from state 
ABORTABLE_ERROR to state ABORTABLE_ERROR
at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionTo(TransactionManager.java:475)
at 
org.apache.kafka.clients.producer.internals.TransactionManager.transitionToAbortableError(TransactionManager.java:288)
at 
org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:602)
at 
org.apache.kafka.clients.producer.internals.Sender.sendProducerData(Sender.java:271)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:221)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:162)
at java.lang.Thread.run(Thread.java:745)
{noformat}
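
For illustration, here is a minimal sketch of the suggested fix, i.e. treating a 
repeated transition into ABORTABLE_ERROR as a no-op. The names below are 
hypothetical and do not mirror the actual TransactionManager code:

{code}
// Hypothetical sketch of the proposed fix: a second (or later) batch failure that
// tries to move the state machine into ABORTABLE_ERROR is accepted as a no-op
// instead of being rejected as an invalid transition.
enum TxnState { READY, IN_TRANSACTION, COMMITTING_TRANSACTION, ABORTABLE_ERROR }

class StateMachineSketch {
    private TxnState current = TxnState.COMMITTING_TRANSACTION;
    private RuntimeException abortCause;

    synchronized void transitionToAbortableError(RuntimeException cause) {
        if (current == TxnState.ABORTABLE_ERROR) {
            return;                 // already aborting: keep the first cause, do nothing
        }
        current = TxnState.ABORTABLE_ERROR;
        abortCause = cause;         // surfaced to the user on the next commit/abort call
    }
}
{code}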



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Ismael Juma
Congratulations Damian! :)

Ismael

On Fri, Jun 9, 2017 at 9:34 PM, Guozhang Wang  wrote:

> Hello all,
>
>
> The PMC of Apache Kafka is pleased to announce that we have invited Damian
> Guy as a committer to the project.
>
> Damian has made tremendous contributions to Kafka. He has not only
> contributed a lot to the Streams API, but has also been involved in many
> other areas such as the producer and consumer clients and the broker-side
> coordinators (the group coordinator and the ongoing transaction coordinator).
> He has contributed more than 100 patches so far and has been driving 6
> KIP contributions.
>
> More importantly, Damian has been a very prolific reviewer on open PRs and
> has been actively participating in community activities such as the mailing
> lists and Stack Overflow questions. Through his code contributions and
> reviews, Damian has demonstrated good judgement on system design and code
> quality, especially thorough unit test coverage. We believe he will make a
> great addition to the committers of the community.
>
>
> Thank you for your contributions, Damian!
>
>
> -- Guozhang, on behalf of the Apache Kafka PMC
>


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Gwen Shapira
+1, thank you

On Fri, Jun 9, 2017 at 1:37 PM Jeff Widman  wrote:

> +1
>
> Thanks for driving this
>
> On Jun 9, 2017 1:22 PM, "Guozhang Wang"  wrote:
>
> > I have just confirmed with Jun that he cannot change the notification
> edits
> > himself as well. So I'll talk to Apache Infra anyways in order to make
> that
> > change.
> >
> > So I'd update my proposal to:
> >
> > 1. create an "iss...@kafka.apache.org" which will receive all the JIRA
> > events.
> > 2. remove "dev@kafka.apache.org" from all the JIRA event notifications
> > except "Issue Created", "Issue Resolved" and "Issue Reopened".
> >
> >
> > Everyone's feedback is more than welcome! I will wait for a few days and
> > if there is an agreement create a ticket to Apache JIRA.
> >
> > Guozhang
> >
> >
> > On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang 
> wrote:
> >
> > > Yeah I have thought about that but that would involve Apache Infra. The
> > > current proposal is based on the assumption that removing the
> > notification
> > > can be done by ourselves.
> > >
> > >
> > > Guozhang
> > >
> > > On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax  >
> > > wrote:
> > >
> > >> Have you considered starting a new mailing list "issues@" that gets
> the
> > >> full load of all emails?
> > >>
> > >> -Matthias
> > >>
> > >> On 6/6/17 7:26 PM, Jeff Widman wrote:
> > >> > What about also adding "Issue Resolved" to the events that hit dev?
> > >> >
> > >> > While I don't generally care about updates on issue progress, I do
> > want
> > >> to
> > >> > know when issues are resolved in case undocumented behavior that I
> may
> > >> have
> > >> > accidentally been relying on may have changed.
> > >> >
> > >> > With or without this suggested tweak, I am +1 (non-binding) on this.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang 
> > >> wrote:
> > >> >
> > >> >> (Changing the subject title to reflect on the proposal)
> > >> >>
> > >> >> Hey guys,
> > >> >>
> > >> >> In order to not drop the ball on the floor I'd like to kick off
> some
> > >> >> proposals according to people's feedbacks on this issue. Feel free
> to
> > >> >> brainstorm on different ideas:
> > >> >>
> > >> >> We change JIRA notifications to remove `Single Email Address (
> > >> >> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5 -
> 8
> > >> >> events per day). Currently `dev@kafka.apache.org` is notified on
> all
> > >> >> events (~180 events per day).
> > >> >>
> > >> >>
> > >> >> Though as a PMC member I can view these notification schemes on the
> > >> JIRA
> > >> >> admin page, I cannot edit on them. Maybe Jun can check if he has
> the
> > >> >> privilege to do so after we have agreed on some proposal; otherwise
> > we
> > >> need
> > >> >> to talk to Apache INFRA for it.
> > >> >>
> > >> >>
> > >> >> Guozhang
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
> > >> >> michal.borowie...@openbet.com> wrote:
> > >> >>
> > >> >>> +1 agree with Jeff,
> > >> >>>
> > >> >>> Michał
> > >> >>>
> > >> >>> On 31/05/17 06:25, Jeff Widman wrote:
> > >> >>>
> > >> >>> I'm hugely in favor of this change as well...
> > >> >>>
> > >> >>> Although I actually find the Github pull request emails less
> useful
> > >> than
> > >> >>> the jirabot ones since Jira typically has more info when I'm
> trying
> > to
> > >> >>> figure out if the issue is relevant to me or not...
> > >> >>>
> > >> >>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang <
> wangg...@gmail.com>
> > <
> > >> wangg...@gmail.com> wrote:
> > >> >>>
> > >> >>>
> > >> >>> I actually do not know.. Maybe Jun knows better than me?
> > >> >>>
> > >> >>>
> > >> >>> Guozhang
> > >> >>>
> > >> >>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira  >
> > <
> > >> g...@confluent.io> wrote:
> > >> >>>
> > >> >>>
> > >> >>> I agree.
> > >> >>>
> > >> >>> Guozhang, do you know how to implement the suggestion? JIRA to
> > Apache
> > >> >>> Infra? Or is this something we can do ourselves somehow?
> > >> >>>
> > >> >>> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang  >
> > <
> > >> wangg...@gmail.com>
> > >> >>>
> > >> >>> wrote:
> > >> >>>
> > >> >>> I share your pains. Right now I use filters on my email accounts
> and
> > >> it
> > >> >>>
> > >> >>> has
> > >> >>>
> > >> >>> been down to about 25 per day.
> > >> >>>
> > >> >>> I think setup a separate mailing list for jirabot and jenkins auto
> > >> >>> generated emails is a good idea.
> > >> >>>
> > >> >>>
> > >> >>> Guozhang
> > >> >>>
> > >> >>>
> > >> >>> On Mon, May 29, 2017 at 12:58 AM,  <
> > >> marc.schle...@sdv-it.de> wrote:
> > >> >>>
> > >> >>>
> > >> >>> Hello everyone
> > >> >>>
> > >> >>> I find it hard to follow this mailinglist due to all the mails
> > >> >>>
> > >> >>> generated
> > >> >>>
> > >> >>> by Jira. Just over this weekend 

Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Gwen Shapira
Congratulations :)

On Fri, Jun 9, 2017 at 1:49 PM Vahid S Hashemian 
wrote:

> Great news.
>
> Congrats Damian!
>
> --Vahid
>
>
>
> From:   Guozhang Wang 
> To: "dev@kafka.apache.org" ,
> "us...@kafka.apache.org" ,
> "priv...@kafka.apache.org" 
> Date:   06/09/2017 01:34 PM
> Subject:[ANNOUNCE] New committer: Damian Guy
>
>
>
> Hello all,
>
>
> The PMC of Apache Kafka is pleased to announce that we have invited Damian
> Guy as a committer to the project.
>
> Damian has made tremendous contributions to Kafka. He has not only
> contributed a lot to the Streams API, but has also been involved in many
> other areas such as the producer and consumer clients and the broker-side
> coordinators (the group coordinator and the ongoing transaction coordinator).
> He has contributed more than 100 patches so far and has been driving 6
> KIP contributions.
>
> More importantly, Damian has been a very prolific reviewer on open PRs and
> has been actively participating in community activities such as the mailing
> lists and Stack Overflow questions. Through his code contributions and
> reviews, Damian has demonstrated good judgement on system design and code
> quality, especially thorough unit test coverage. We believe he will make a
> great addition to the committers of the community.
>
>
> Thank you for your contributions, Damian!
>
>
> -- Guozhang, on behalf of the Apache Kafka PMC
>
>
>
>
>


Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Vahid S Hashemian
Great news.

Congrats Damian!

--Vahid



From:   Guozhang Wang 
To: "dev@kafka.apache.org" , 
"us...@kafka.apache.org" , 
"priv...@kafka.apache.org" 
Date:   06/09/2017 01:34 PM
Subject:[ANNOUNCE] New committer: Damian Guy



Hello all,


The PMC of Apache Kafka is pleased to announce that we have invited Damian
Guy as a committer to the project.

Damian has made tremendous contributions to Kafka. He has not only
contributed a lot to the Streams API, but has also been involved in many
other areas such as the producer and consumer clients and the broker-side
coordinators (the group coordinator and the ongoing transaction coordinator).
He has contributed more than 100 patches so far and has been driving 6
KIP contributions.

More importantly, Damian has been a very prolific reviewer on open PRs and
has been actively participating in community activities such as the mailing
lists and Stack Overflow questions. Through his code contributions and
reviews, Damian has demonstrated good judgement on system design and code
quality, especially thorough unit test coverage. We believe he will make a
great addition to the committers of the community.


Thank you for your contributions, Damian!


-- Guozhang, on behalf of the Apache Kafka PMC






Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread James Cheng
Congrats Damian!

-James

> On Jun 9, 2017, at 1:34 PM, Guozhang Wang  wrote:
> 
> Hello all,
> 
> 
> The PMC of Apache Kafka is pleased to announce that we have invited Damian
> Guy as a committer to the project.
> 
> Damian has made tremendous contributions to Kafka. He has not only
> contributed a lot to the Streams API, but has also been involved in many
> other areas such as the producer and consumer clients and the broker-side
> coordinators (the group coordinator and the ongoing transaction coordinator).
> He has contributed more than 100 patches so far and has been driving 6
> KIP contributions.
>
> More importantly, Damian has been a very prolific reviewer on open PRs and
> has been actively participating in community activities such as the mailing
> lists and Stack Overflow questions. Through his code contributions and
> reviews, Damian has demonstrated good judgement on system design and code
> quality, especially thorough unit test coverage. We believe he will make a
> great addition to the committers of the community.
> 
> 
> Thank you for your contributions, Damian!
> 
> 
> -- Guozhang, on behalf of the Apache Kafka PMC



Re: [ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Bill Bejeck
Well deserved.  Congrats Damian!

On Fri, Jun 9, 2017 at 4:34 PM, Guozhang Wang  wrote:

> Hello all,
>
>
> The PMC of Apache Kafka is pleased to announce that we have invited Damian
> Guy as a committer to the project.
>
> Damian has made tremendous contributions to Kafka. He has not only
> contributed a lot to the Streams API, but has also been involved in many
> other areas such as the producer and consumer clients and the broker-side
> coordinators (the group coordinator and the ongoing transaction coordinator).
> He has contributed more than 100 patches so far and has been driving 6
> KIP contributions.
>
> More importantly, Damian has been a very prolific reviewer on open PRs and
> has been actively participating in community activities such as the mailing
> lists and Stack Overflow questions. Through his code contributions and
> reviews, Damian has demonstrated good judgement on system design and code
> quality, especially thorough unit test coverage. We believe he will make a
> great addition to the committers of the community.
>
>
> Thank you for your contributions, Damian!
>
>
> -- Guozhang, on behalf of the Apache Kafka PMC
>


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Jeff Widman
+1

Thanks for driving this

On Jun 9, 2017 1:22 PM, "Guozhang Wang"  wrote:

> I have just confirmed with Jun that he cannot change the notification edits
> himself as well. So I'll talk to Apache Infra anyways in order to make that
> change.
>
> So I'd update my proposal to:
>
> 1. create an "iss...@kafka.apache.org" which will receive all the JIRA
> events.
> 2. remove "dev@kafka.apache.org" from all the JIRA event notifications
> except "Issue Created", "Issue Resolved" and "Issue Reopened".
>
>
> Everyone's feedback is more than welcome! I will wait for a few days and
> if there is an agreement create a ticket to Apache JIRA.
>
> Guozhang
>
>
> On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang  wrote:
>
> > Yeah I have thought about that but that would involve Apache Infra. The
> > current proposal is based on the assumption that removing the
> notification
> > can be done by ourselves.
> >
> >
> > Guozhang
> >
> > On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax 
> > wrote:
> >
> >> Have you considered starting a new mailing list "issues@" that gets the
> >> full load of all emails?
> >>
> >> -Matthias
> >>
> >> On 6/6/17 7:26 PM, Jeff Widman wrote:
> >> > What about also adding "Issue Resolved" to the events that hit dev?
> >> >
> >> > While I don't generally care about updates on issue progress, I do
> want
> >> to
> >> > know when issues are resolved in case undocumented behavior that I may
> >> have
> >> > accidentally been relying on may have changed.
> >> >
> >> > With or without this suggested tweak, I am +1 (non-binding) on this.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang 
> >> wrote:
> >> >
> >> >> (Changing the subject title to reflect on the proposal)
> >> >>
> >> >> Hey guys,
> >> >>
> >> >> In order to not drop the ball on the floor I'd like to kick off some
> >> >> proposals according to people's feedbacks on this issue. Feel free to
> >> >> brainstorm on different ideas:
> >> >>
> >> >> We change JIRA notifications to remove `Single Email Address (
> >> >> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5 - 8
> >> >> events per day). Currently `dev@kafka.apache.org` is notified on all
> >> >> events (~180 events per day).
> >> >>
> >> >>
> >> >> Though as a PMC member I can view these notification schemes on the
> >> JIRA
> >> >> admin page, I cannot edit on them. Maybe Jun can check if he has the
> >> >> privilege to do so after we have agreed on some proposal; otherwise
> we
> >> need
> >> >> to talk to Apache INFRA for it.
> >> >>
> >> >>
> >> >> Guozhang
> >> >>
> >> >>
> >> >>
> >> >> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
> >> >> michal.borowie...@openbet.com> wrote:
> >> >>
> >> >>> +1 agree with Jeff,
> >> >>>
> >> >>> Michał
> >> >>>
> >> >>> On 31/05/17 06:25, Jeff Widman wrote:
> >> >>>
> >> >>> I'm hugely in favor of this change as well...
> >> >>>
> >> >>> Although I actually find the Github pull request emails less useful
> >> than
> >> >>> the jirabot ones since Jira typically has more info when I'm trying
> to
> >> >>> figure out if the issue is relevant to me or not...
> >> >>>
> >> >>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang 
> <
> >> wangg...@gmail.com> wrote:
> >> >>>
> >> >>>
> >> >>> I actually do not know.. Maybe Jun knows better than me?
> >> >>>
> >> >>>
> >> >>> Guozhang
> >> >>>
> >> >>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira 
> <
> >> g...@confluent.io> wrote:
> >> >>>
> >> >>>
> >> >>> I agree.
> >> >>>
> >> >>> Guozhang, do you know how to implement the suggestion? JIRA to
> Apache
> >> >>> Infra? Or is this something we can do ourselves somehow?
> >> >>>
> >> >>> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang 
> <
> >> wangg...@gmail.com>
> >> >>>
> >> >>> wrote:
> >> >>>
> >> >>> I share your pains. Right now I use filters on my email accounts and
> >> it
> >> >>>
> >> >>> has
> >> >>>
> >> >>> been down to about 25 per day.
> >> >>>
> >> >>> I think setup a separate mailing list for jirabot and jenkins auto
> >> >>> generated emails is a good idea.
> >> >>>
> >> >>>
> >> >>> Guozhang
> >> >>>
> >> >>>
> >> >>> On Mon, May 29, 2017 at 12:58 AM,  <
> >> marc.schle...@sdv-it.de> wrote:
> >> >>>
> >> >>>
> >> >>> Hello everyone
> >> >>>
> >> >>> I find it hard to follow this mailinglist due to all the mails
> >> >>>
> >> >>> generated
> >> >>>
> >> >>> by Jira. Just over this weekend there are 240 new mails.
> >> >>> Would it be possible to setup something like j...@kafka.apache.org
> >> >>>
> >> >>> where
> >> >>>
> >> >>> everyone can subscribe interested in those Jira mails?
> >> >>>
> >> >>> Right now I am going to setup a filter which just deletes the
> >> >>>
> >> >>> jira-tagged
> >> >>>
> >> >>> mails, but I think the current setup also makes it hard to read
> >> >>>
> >> >>> through
> >> >>>
> >> >>> 

[ANNOUNCE] New committer: Damian Guy

2017-06-09 Thread Guozhang Wang
Hello all,


The PMC of Apache Kafka is pleased to announce that we have invited Damian
Guy as a committer to the project.

Damian has made tremendous contributions to Kafka. He has not only
contributed a lot to the Streams API, but has also been involved in many
other areas such as the producer and consumer clients and the broker-side
coordinators (the group coordinator and the ongoing transaction coordinator).
He has contributed more than 100 patches so far and has been driving 6
KIP contributions.

More importantly, Damian has been a very prolific reviewer on open PRs and
has been actively participating in community activities such as the mailing
lists and Stack Overflow questions. Through his code contributions and
reviews, Damian has demonstrated good judgement on system design and code
quality, especially thorough unit test coverage. We believe he will make a
great addition to the committers of the community.


Thank you for your contributions, Damian!


-- Guozhang, on behalf of the Apache Kafka PMC


Re: Changing JIRA notifications to remove Single Email Address (dev@kafka.apache.org)

2017-06-09 Thread Guozhang Wang
I have just confirmed with Jun that he cannot change the notification edits
himself as well. So I'll talk to Apache Infra anyways in order to make that
change.

So I'd update my proposal to:

1. create an "iss...@kafka.apache.org" which will receive all the JIRA
events.
2. remove "dev@kafka.apache.org" from all the JIRA event notifications
except "Issue Created", "Issue Resolved" and "Issue Reopened".


Everyone's feedback is more than welcome! I will wait for a few days and
if there is an agreement create a ticket to Apache JIRA.

Guozhang


On Wed, Jun 7, 2017 at 2:43 PM, Guozhang Wang  wrote:

> Yeah I have thought about that but that would involve Apache Infra. The
> current proposal is based on the assumption that removing the notification
> can be done by ourselves.
>
>
> Guozhang
>
> On Wed, Jun 7, 2017 at 1:24 PM, Matthias J. Sax 
> wrote:
>
>> Have you considered starting a new mailing list "issues@" that gets the
>> full load of all emails?
>>
>> -Matthias
>>
>> On 6/6/17 7:26 PM, Jeff Widman wrote:
>> > What about also adding "Issue Resolved" to the events that hit dev?
>> >
>> > While I don't generally care about updates on issue progress, I do want
>> to
>> > know when issues are resolved in case undocumented behavior that I may
>> have
>> > accidentally been relying on may have changed.
>> >
>> > With or without this suggested tweak, I am +1 (non-binding) on this.
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jun 6, 2017 at 3:49 PM, Guozhang Wang 
>> wrote:
>> >
>> >> (Changing the subject title to reflect on the proposal)
>> >>
>> >> Hey guys,
>> >>
>> >> In order to not drop the ball on the floor I'd like to kick off some
>> >> proposals according to people's feedbacks on this issue. Feel free to
>> >> brainstorm on different ideas:
>> >>
>> >> We change JIRA notifications to remove `Single Email Address (
>> >> dev@kafka.apache.org)` for all events EXCEPT "Issue Created" (5 - 8
>> >> events per day). Currently `dev@kafka.apache.org` is notified on all
>> >> events (~180 events per day).
>> >>
>> >>
>> >> Though as a PMC member I can view these notification schemes on the
>> JIRA
>> >> admin page, I cannot edit on them. Maybe Jun can check if he has the
>> >> privilege to do so after we have agreed on some proposal; otherwise we
>> need
>> >> to talk to Apache INFRA for it.
>> >>
>> >>
>> >> Guozhang
>> >>
>> >>
>> >>
>> >> On Tue, May 30, 2017 at 11:57 PM, Michal Borowiecki <
>> >> michal.borowie...@openbet.com> wrote:
>> >>
>> >>> +1 agree with Jeff,
>> >>>
>> >>> Michał
>> >>>
>> >>> On 31/05/17 06:25, Jeff Widman wrote:
>> >>>
>> >>> I'm hugely in favor of this change as well...
>> >>>
>> >>> Although I actually find the Github pull request emails less useful
>> than
>> >>> the jirabot ones since Jira typically has more info when I'm trying to
>> >>> figure out if the issue is relevant to me or not...
>> >>>
>> >>> On Tue, May 30, 2017 at 2:28 PM, Guozhang Wang  <
>> wangg...@gmail.com> wrote:
>> >>>
>> >>>
>> >>> I actually do not know.. Maybe Jun knows better than me?
>> >>>
>> >>>
>> >>> Guozhang
>> >>>
>> >>> On Mon, May 29, 2017 at 12:58 PM, Gwen Shapira  <
>> g...@confluent.io> wrote:
>> >>>
>> >>>
>> >>> I agree.
>> >>>
>> >>> Guozhang, do you know how to implement the suggestion? JIRA to Apache
>> >>> Infra? Or is this something we can do ourselves somehow?
>> >>>
>> >>> On Mon, May 29, 2017 at 9:33 PM Guozhang Wang  <
>> wangg...@gmail.com>
>> >>>
>> >>> wrote:
>> >>>
>> >>> I share your pains. Right now I use filters on my email accounts and
>> it
>> >>>
>> >>> has
>> >>>
>> >>> been down to about 25 per day.
>> >>>
>> >>> I think setup a separate mailing list for jirabot and jenkins auto
>> >>> generated emails is a good idea.
>> >>>
>> >>>
>> >>> Guozhang
>> >>>
>> >>>
>> >>> On Mon, May 29, 2017 at 12:58 AM,  <
>> marc.schle...@sdv-it.de> wrote:
>> >>>
>> >>>
>> >>> Hello everyone
>> >>>
>> >>> I find it hard to follow this mailinglist due to all the mails
>> >>>
>> >>> generated
>> >>>
>> >>> by Jira. Just over this weekend there are 240 new mails.
>> >>> Would it be possible to setup something like j...@kafka.apache.org
>> >>>
>> >>> where
>> >>>
>> >>> everyone can subscribe interested in those Jira mails?
>> >>>
>> >>> Right now I am going to setup a filter which just deletes the
>> >>>
>> >>> jira-tagged
>> >>>
>> >>> mails, but I think the current setup also makes it hard to read
>> >>>
>> >>> through
>> >>>
>> >>> the archives.
>> >>>
>> >>> regards
>> >>> Marc
>> >>>
>> >>> --
>> >>> -- Guozhang
>> >>>
>> >>>
>> >>> --
>> >>> -- Guozhang
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>>  Michal Borowiecki
>> >>> Senior Software Engineer L4
>> >>> T: +44 208 742 1600 <+44%2020%208742%201600>
>> >>>
>> >>>
>> >>> +44 203 249 8448 <+44%2020%203249%208448>
>> >>>
>> >>>
>> >>>
>> >>> E: michal.borowie...@openbet.com
>> 

Build failed in Jenkins: kafka-trunk-jdk7 #2377

2017-06-09 Thread Apache Jenkins Server
See 


Changes:

[jason] HOTFIX: Use atomic boolean for inject errors in streams eos integration

--
[...truncated 954.84 KB...]

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] STARTED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[0] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[0] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[0] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[0] STARTED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[0] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[1] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[1] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[1] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[1] STARTED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[1] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[2] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[2] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[2] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[2] STARTED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[2] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[3] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleansCombinedCompactAndDeleteTopic[3] PASSED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[3] STARTED

kafka.log.LogCleanerIntegrationTest > 
testCleaningNestedMessagesWithMultipleVersions[3] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[3] STARTED

kafka.log.LogCleanerIntegrationTest > testCleanerWithMessageFormatV0[3] PASSED

unit.kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsShiftByHigherThanLatest STARTED

unit.kafka.admin.ResetConsumerGroupOffsetTest > 
testResetOffsetsShiftByHigherThanLatest PASSED

unit.kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsShiftMinus 
STARTED

unit.kafka.admin.ResetConsumerGroupOffsetTest > testResetOffsetsShiftMinus 
PASSED


[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-09 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044885#comment-16044885
 ] 

Guozhang Wang commented on KAFKA-5415:
--

[~apurva] [~hachikuji] The root cause is twofold:

1) First, in {{TransactionStateManager#appendTransactionToLog}}, when we 
receive an error while trying to append to the log (in our case it is 
`NotEnoughReplicas`), we reset the pending state: {{metadata.pendingState = 
None}}. This behavior is correct for all of its callers EXCEPT the one from 
{{TransactionMarkerChannelManager#tryAppendToLog}}, in which {{appendCallback}} 
will retry the append, but by then the pending state has already been reset 
(see the sketch below).

So when the retry finally succeeds and calls 
{{TransactionMetadata#completeTransitionTo}}, we throw an exception since 
the pending state is `None`.

2) This exception is thrown all the way up to {{KafkaApis#handleFetchRequest}}, 
because that is the thread that handles the follower fetch request, then 
increments the HW, and then calls {{tryCompleteDelayedRequests}}, which throws 
the exception. The exception is handled in {{handleError}}, where we do not 
print ANY logging but simply return an error to the follower fetcher. Hence, 
from the logs it looks like a mystery that the callback was never called; in 
fact it did get called but threw an exception that was silently swallowed.
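
To make point 1) concrete, here is a minimal, hypothetical sketch of the hazard 
(plain Java rather than the actual Scala code, with made-up names):

{code}
// Hypothetical sketch (not the real broker code): the append error handler clears
// pendingState unconditionally, so a caller that retries the append can later
// succeed and then fail to complete the transition because pendingState is gone.
import java.util.Optional;

class TxnMetadataSketch {
    private Optional<String> pendingState = Optional.of("PrepareCommit");

    // Called when a log append fails; retryingCaller is true for the
    // marker-channel-manager path, which re-attempts the append itself.
    void onAppendError(boolean retryingCaller) {
        if (!retryingCaller) {
            pendingState = Optional.empty();   // safe: nobody will retry this append
        }
        // If the caller retries, pendingState must be kept so that the eventual
        // successful append can still complete the transition.
    }

    // Called when the (possibly retried) append finally succeeds.
    void completeTransition(String newState) {
        if (!pendingState.isPresent()) {
            // This is the failure described above: the pending state was already cleared.
            throw new IllegalStateException("No pending state to complete to " + newState);
        }
        pendingState = Optional.empty();
    }
}
{code}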

The key log entries that disclose this are in worker7: 

{code}
[2017-06-09 01:16:54,132] DEBUG Partition [__transaction_state,37] on broker 1: 
High watermark for partition [__transaction_state,37] updated to 86 [0 : 14667] 
(kafka.cluster.Partition)
[2017-06-09 01:16:54,132] DEBUG [Replica Manager on Broker 1]: Request key 
__transaction_state-37 unblocked 0 fetch requests. (kafka.server.ReplicaManager)
{code}

Note it does not have any "unblocked x produce requests" after "unblocked 0 
fetch requests", while in {{tryCompleteDelayedRequests}} we always try to 
complete fetch, then produce, then delete requests; and in worker2:

{code}
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,22] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,136] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,8] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,21] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,4] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,7] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,9] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,46] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,25] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,35] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for partition 
[__consumer_offsets,41] to broker 
1:org.apache.kafka.common.errors.UnknownServerException: The server experienced 
an unexpected error when processing the request 
(kafka.server.ReplicaFetcherThread)
[2017-06-09 01:16:54,137] ERROR [ReplicaFetcherThread-0-1]: Error for 

[jira] [Updated] (KAFKA-5416) TransactionCoordinator doesn't complete transition to CompleteCommit

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta updated KAFKA-5416:

Summary: TransactionCoordinator doesn't complete transition to 
CompleteCommit  (was: TransactionCoordinator: TransactionMarkerChannelManager 
seems not to retry failed writes.)

> TransactionCoordinator doesn't complete transition to CompleteCommit
> 
>
> Key: KAFKA-5416
> URL: https://issues.apache.org/jira/browse/KAFKA-5416
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Priority: Blocker
>  Labels: exactly-once
>
> In regard to this system test: 
> http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2017-06-09--001.1496974430--apurvam--MINOR-add-logging-to-transaction-coordinator-in-all-failure-cases--02b3816/TransactionsTest/test_transactions/failure_mode=clean_bounce.bounce_target=brokers/4.tgz
> Here are the ownership changes for __transaction_state-37: 
> {noformat}
> ./worker1/debug/server.log:3623:[2017-06-09 01:16:36,551] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16174:[2017-06-09 01:16:39,963] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:16194:[2017-06-09 01:16:39,978] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 15 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:35633:[2017-06-09 01:16:45,308] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40353:[2017-06-09 01:16:52,218] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:40664:[2017-06-09 01:16:52,683] INFO [Transaction 
> Log Manager 2]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146044:[2017-06-09 01:19:08,844] INFO [Transaction 
> Log Manager 2]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:146048:[2017-06-09 01:19:08,850] INFO [Transaction 
> Log Manager 2]: Finished loading 1 transaction metadata from 
> __transaction_state-37 in 6 milliseconds 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker1/debug/server.log:150147:[2017-06-09 01:19:10,995] INFO [Transaction 
> Log Manager 2]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:3413:[2017-06-09 01:16:36,109] INFO [Transaction 
> Log Manager 1]: Loading transaction metadata from __transaction_state-37 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:20889:[2017-06-09 01:16:39,991] INFO [Transaction 
> Log Manager 1]: Removed 1 cached transaction metadata for 
> __transaction_state-37 on follower transition 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26084:[2017-06-09 01:16:46,682] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> (kafka.coordinator.transaction.TransactionStateManager)
> ./worker2/debug/server.log:26322:[2017-06-09 01:16:47,153] INFO [Transaction 
> Log Manager 1]: Trying to remove cached transaction metadata for 
> __transaction_state-37 on follower transition but there is no entries 
> remaining; it is likely that another process for removing the cached entries 
> has just executed earlier before 
> 

[jira] [Assigned] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apurva Mehta reassigned KAFKA-5415:
---

Assignee: Guozhang Wang  (was: Apurva Mehta)

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5415) TransactionCoordinator gets stuck in PrepareCommit state

2017-06-09 Thread Apurva Mehta (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044861#comment-16044861
 ] 

Apurva Mehta commented on KAFKA-5415:
-

[~guozhang] has found the bug and will file the PR. Assigning to him.

> TransactionCoordinator gets stuck in PrepareCommit state
> 
>
> Key: KAFKA-5415
> URL: https://issues.apache.org/jira/browse/KAFKA-5415
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Guozhang Wang
>Priority: Blocker
>  Labels: exactly-once
> Fix For: 0.11.0.0
>
> Attachments: 6.tgz
>
>
> This has been revealed by the system test failures on jenkins. 
> The transaction coordinator seems to get into a path during the handling of 
> the EndTxnRequest where it returns an error (possibly a NOT_COORDINATOR or 
> COORDINATOR_NOT_AVAILABLE error, to be revealed by 
> https://github.com/apache/kafka/pull/3278) to the client. However, due to 
> network instability, the producer is disconnected before it receives this 
> error.
> As a result, the transaction remains in a `PrepareXX` state, and future 
> `EndTxn` requests sent by the client after reconnecting result in a 
> `CONCURRENT_TRANSACTION` error code. Hence the client gets stuck and the 
> transaction never finishes, as expiration isn't done from a PrepareXX state.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (KAFKA-1044) change log4j to slf4j

2017-06-09 Thread Viktor Somogyi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044842#comment-16044842
 ] 

Viktor Somogyi edited comment on KAFKA-1044 at 6/9/17 6:53 PM:
---

[~ewencp], thank you, started working on it.
I found two incompatibilities and I'm asking your advice on how to resolve them.
 * Fatal logging level: slf4j doesn't support it
 * kafka.utils.Log4jController: there are some log4j-specific features used 
here that can't be replaced with a generic solution

I have two possible solutions for these:
 # we could simply use the log4j-over-slf4j package. This will redirect all 
Fatal logs to the Error level and also provides some of the functionality 
required by Log4jController (but not all).
* Pros: a few lines of changes only, quite straightforward
* Cons: loss of features. Fatal can't be worked around, so we have to 
accept that, but we also lose Log4jController.getLoggers, as there is no way 
of collecting the current loggers by relying solely on slf4j. Other methods' 
behaviour will also change (as far as I'm aware, there is no way to detect 
existing loggers in slf4j)
# we could separate Log4jController off into its own module and have users 
put it on the classpath explicitly if they use log4j for logging. From the 
Logging class we can instantiate it by reflection (see the sketch below).
* Pros: except for the Fatal logging level we keep all the features
* Cons: more complicated and breaking; also requires documentation so users 
will be aware of this

I'll try to create PRs of both solutions (they are rather works in progress; 
I just want to show approximate solutions), if that is fine.
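
As an illustration of option 2, here is a hypothetical sketch; only the 
{{kafka.utils.Log4jController}} class name comes from the existing code, the 
loader helper itself is made up:

{code}
// Hypothetical sketch for option 2: core code depends only on slf4j, and the
// log4j-specific controller is loaded reflectively only when its (hypothetical)
// optional module is on the classpath.
class Log4jControllerLoader {
    static Object tryLoad() {
        try {
            Class<?> clazz = Class.forName("kafka.utils.Log4jController");
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException e) {
            return null;  // optional log4j module not present: plain slf4j only
        } catch (ReflectiveOperationException e) {
            return null;  // present but not instantiable: degrade gracefully
        }
    }
}
{code}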


was (Author: viktorsomogyi):
[~ewencp], thank you, started working on it.
I found two incompatibilities and I'm asking your advice in the resolution.
 * Fatal logging level: slf4j doesn't support it
 * kafka.utils.Log4jController: there are some log4j specific features used 
here that can't be replaced with a generic solution

I have two solutions for these:
 # we could simply use the log4j-over-slf4j package. This will redirect all 
Fatal logs to Error level and also provides some of the functionalities 
required by Log4jController (but not all).
* Pros: a few lines of changes only, quite straightforward
* Cons: loss of features. However Fatal can't be worked around so we have 
to accept that, but we also lose Log4jController.getLoggers as there is no way 
of collecting the current loggers by simply relying on slf4j. Also other 
methods' behaviour will change (no way to detect existing loggers in slf4j as 
far as I'm aware of it)
# we could separate off Log4jController into a separate module and have the 
users to put it on the classpath explicitly if they use log4j for logging. From 
Logging class we can instantiate it by reflection.
* Pros: except Fatal logging level we keep all the features
* Cons: more complicated and breaking, also requires documentation so users 
will be aware of this

I'm attaching patches of both solutions (and they are rather work in progress, 
I just want to show approximate solutions) just in case

> change log4j to slf4j 
> --
>
> Key: KAFKA-1044
> URL: https://issues.apache.org/jira/browse/KAFKA-1044
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0
>Reporter: shijinkui
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: newbie
>
> can u chanage the log4j to slf4j, in my project, i use logback, it's conflict 
> with log4j.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-1044) change log4j to slf4j

2017-06-09 Thread Viktor Somogyi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044842#comment-16044842
 ] 

Viktor Somogyi commented on KAFKA-1044:
---

[~ewencp], thank you, started working on it.
I found two incompatibilities and I'm asking your advice in the resolution.
 * Fatal logging level: slf4j doesn't support it
 * kafka.utils.Log4jController: there are some log4j specific features used 
here that can't be replaced with a generic solution

I have two solutions for these:
 # we could simply use the log4j-over-slf4j package. This will redirect all 
Fatal logs to Error level and also provides some of the functionalities 
required by Log4jController (but not all).
* Pros: a few lines of changes only, quite straightforward
* Cons: loss of features. However Fatal can't be worked around so we have 
to accept that, but we also lose Log4jController.getLoggers as there is no way 
of collecting the current loggers by simply relying on slf4j. Also other 
methods' behaviour will change (no way to detect existing loggers in slf4j as 
far as I'm aware of it)
# we could separate off Log4jController into a separate module and have the 
users to put it on the classpath explicitly if they use log4j for logging. From 
Logging class we can instantiate it by reflection.
* Pros: except Fatal logging level we keep all the features
* Cons: more complicated and breaking, also requires documentation so users 
will be aware of this

I'm attaching patches of both solutions (and they are rather work in progress, 
I just want to show approximate solutions) just in case

> change log4j to slf4j 
> --
>
> Key: KAFKA-1044
> URL: https://issues.apache.org/jira/browse/KAFKA-1044
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.8.0
>Reporter: shijinkui
>Assignee: Viktor Somogyi
>Priority: Minor
>  Labels: newbie
>
> can u chanage the log4j to slf4j, in my project, i use logback, it's conflict 
> with log4j.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #3275: HOTFIX: use atomic boolean for inject errors in st...

2017-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3275


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #3284: MINOR: A few cleanups of usage of Either in TC

2017-06-09 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/3284

MINOR: A few cleanups of usage of Either in TC



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka minor-either-usage-cleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3284


commit cc8f8e75afed382e0eb77b3238aa3dff30a2c100
Author: Jason Gustafson 
Date:   2017-06-09T18:30:36Z

MINOR: A few cleanups of usage of Either in TC




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5412:
-
Description: 
With the latest 0.11.1.0-SNAPSHOT, the Kafka Connect example using 
connect-console-sink/source.properties no longer works because the required 
"file" property isn't found. This is because the underlying FileStreamSink/Source 
connector and task define a ConfigDef with "file" as a mandatory parameter, 
whereas in the console example we want "file" to be unset so that stdin and 
stdout are used.

One possible workaround is to set "file=" inside the provided 
connect-console-sink/source.properties. The other option would be to modify the 
FileStreamSink/Source source code to remove the "file" definition from 
the ConfigDef.
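
For illustration, the workaround amounts to a one-line addition to the provided 
properties file (a sketch; only the "file" line is the actual change):

{noformat}
# connect-console-sink.properties (workaround sketch): declare the mandatory
# "file" key with an empty value so the sink keeps writing to stdout.
file=
{noformat}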

  was:
Hi,
with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
using connect-console-sink/source.properties doesn't work anymore because the 
needed "file" property isn't found.
This is because the underlying used FileStreamSink/Source connector and task 
has defined a ConfigDef with "file" as mandatory parameter. In the case of 
console example we want to have file=null so that stdin and stdout are used.
One possible solution and workaround is set "file=" inside the provided 
connect-console-sink/source.properties.
The other one could be modify the FileStreamSink/Source source code in order to 
remove the "file" definition from the ConfigDef.


> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.1
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> With the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found. This is because the underlying used 
> FileStreamSink/Source connector and task has defined a ConfigDef with "file" 
> as mandatory parameter. In the case of console example we want to have 
> file=null so that stdin and stdout are used. 
> One possible solution and workaround is set "file=" inside the provided 
> connect-console-sink/source.properties. The other one could be modify the 
> FileStreamSink/Source source code in order to remove the "file" definition 
> from the ConfigDef.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5412:
-
Description: 
Hi,
with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
using connect-console-sink/source.properties doesn't work anymore because the 
needed "file" property isn't found.
This is because the underlying used FileStreamSink/Source connector and task 
has defined a ConfigDef with "file" as mandatory parameter. In the case of 
console example we want to have file=null so that stdin and stdout are used.
One possible solution and workaround is set "file=" inside the provided 
connect-console-sink/source.properties.
The other one could be modify the FileStreamSink/Source source code in order to 
remove the "file" definition from the ConfigDef.

  was:
Hi,
with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
using connect-console-sink/source.properties doesn't work anymore because the 
needed "file" property isn't found.
This is because the underlying used FileStreamSink/Source connector and task 
has defined a ConfigDef with "file" as mandatory parameter. In the case of 
console example we want to have file=null so that stdin and stdout are used.
One possible solution is set "file=" inside the provided 
connect-console-sink/source.properties.
The other one could be modify the FileStreamSink/Source source code in order to 
remove the "file" definition from the ConfigDef.
What do you think ?
I can provide a PR for that.

Thanks,
Paolo.




> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.1
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying used FileStreamSink/Source connector and task 
> has defined a ConfigDef with "file" as mandatory parameter. In the case of 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution and workaround is set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be modify the FileStreamSink/Source source code in order 
> to remove the "file" definition from the ConfigDef.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5421) Getting InvalidRecordException

2017-06-09 Thread Rishi Reddy Bokka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishi Reddy Bokka updated KAFKA-5421:
-
Description: 
In my application, incoming data is queued through Kafka and saved to disk, and 
a consumer reads this data from Kafka and processes it. When the consumer tries 
to read data from Kafka, I get the exceptions below:

2017-06-09 10:57:24,733 ERROR NetworkClient Uncaught error in request 
completion:
org.apache.kafka.common.KafkaException: Error deserializing key/value for 
partition TcpMessage-1 at offset 155884487
at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:628)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:566)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69) 
~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
 ~[kafka-clients-0.9.0.1.jar:?]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
 [kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
 [kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
 [kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
 [kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
[kafka-clients-0.9.0.1.jar:?]
at 
com.affirmed.mediation.edr.kafka.tcpMessage.TcpMessageConsumer.doWork(TcpMessageConsumer.java:190)
 [EdrServer.jar:?]
at 
com.affirmed.mediation.edr.kafka.tcpMessage.TcpMessageConsumer.run(TcpMessageConsumer.java:248)
 [EdrServer.jar:?]
Caused by: org.apache.kafka.common.record.InvalidRecordException: Record is 
corrupt (stored crc = 2016852547, computed crc = 1399853379)
at org.apache.kafka.common.record.Record.ensureValid(Record.java:226) 
~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:617)
 ~[kafka-clients-0.9.0.1.jar:?]
... 15 more

Could anyone please help me with this? I am stuck and unable to figure out the 
root cause.

When this occurs, is there any way to catch this exception and move the offset? 
Currently the consumer keeps polling the same range of records in the next poll 
and as a result never moves forward.

  was:
In my application, I get data which gets queued using kafka and saved on the 
disk and the consumer which gets this data from kafka and does the processing. 
But When my consumer is trying to read data from kafka I am getting below 
exceptions :

2017-06-09 10:57:24,733 ERROR NetworkClient Uncaught error in request 
completion:
org.apache.kafka.common.KafkaException: Error deserializing key/value for 
partition TcpMessage-1 at offset 155884487
at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:628)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:566)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69) 
~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
 ~[kafka-clients-0.9.0.1.jar:?]
at 
org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
 ~[kafka-clients-0.9.0.1.jar:?]
at 

[GitHub] kafka pull request #2928: HOTFIX [WIP]: Check on not owned partitions

2017-06-09 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka/pull/2928


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044669#comment-16044669
 ] 

Jun Rao commented on KAFKA-5413:


[~ny2ko], [~Federico Giraud], thanks for reporting this issue. In 
LogCleaner.groupSegmentsBySize(), we try to group segments with the following 
check.
   segs.head.index.lastOffset - group.last.index.baseOffset <= Int.MaxValue

The intention is that if two segments' offsets differ by more than Int.MaxValue, 
they will be put into different groups and therefore won't be written to the 
same output segment. So I am not sure why the grouping didn't happen.

Do you think you could get the value of index.lastOffset and index.baseOffset 
of these 2 segments in the debugger? Alternatively, if you could upload the 2 
log segments (with both the log and the index files) with base offsets 0 and 
2147343575 to the jira, we can take a look at them locally.
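For reference, a simplified, illustrative sketch of that grouping rule (the 
real LogCleaner.groupSegmentsBySize also caps groups by log and index size; 
the names below are hypothetical, not the Kafka code):

{code}
import java.util.ArrayList;
import java.util.List;

public class SegmentGroupingSketch {
    // Minimal stand-in for a log segment's index offsets.
    static class Segment {
        final long baseOffset;
        final long lastOffset;
        Segment(long baseOffset, long lastOffset) {
            this.baseOffset = baseOffset;
            this.lastOffset = lastOffset;
        }
    }

    // Group consecutive segments so that the offset span within a group never
    // exceeds Int.MaxValue, mirroring the check quoted above.
    static List<List<Segment>> group(List<Segment> segments) {
        List<List<Segment>> groups = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        for (Segment seg : segments) {
            if (!current.isEmpty()
                    && seg.lastOffset - current.get(0).baseOffset > Integer.MAX_VALUE) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(seg);
        }
        if (!current.isEmpty())
            groups.add(current);
        return groups;
    }
}
{code}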

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo
> adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason this check was added in 0.10.2?
> E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of 
> "key 1" following, wouldn't we run into this 

[jira] [Updated] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5413:
---
Labels: reliability  (was: )

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>  Labels: reliability
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo
> adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason this check was added in 0.10.2?
> E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of 
> "key 1" following, wouldn't we run into this situation whenever the log 
> cleaner runs?
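For background on the failing requirement above, here is an illustrative sketch 
(not the Kafka source) of why an offset more than Int.MaxValue above the 
segment's base offset cannot be appended: the offset index stores offsets 
relative to the base offset as 4-byte integers.

{code}
public class RelativeOffsetSketch {
    // The offset index stores each offset relative to the segment's base
    // offset as a 4-byte integer, so the difference must fit in an int.
    static int toRelative(long offset, long baseOffset) {
        long relative = offset - baseOffset;
        if (relative > Integer.MAX_VALUE || relative < 0)
            throw new IllegalArgumentException(
                "largest offset in message set can not be safely converted to relative offset");
        return (int) relative;
    }
}
{code}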



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5413) Log cleaner fails due to large offset in segment file

2017-06-09 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044637#comment-16044637
 ] 

Ismael Juma commented on KAFKA-5413:


cc [~junrao]

> Log cleaner fails due to large offset in segment file
> -
>
> Key: KAFKA-5413
> URL: https://issues.apache.org/jira/browse/KAFKA-5413
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: Ubuntu 14.04 LTS, Oracle Java 8u92, kafka_2.11-0.10.2.0
>Reporter: Nicholas Ngorok
>
> The log cleaner thread in our brokers is failing with the trace below
> {noformat}
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 0 in log __consumer_offsets-12 (largest timestamp Thu Jun 08 
> 15:48:59 PDT 2017) into 0, retaining deletes. (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,822] INFO {kafka-log-cleaner-thread-0} Cleaner 0: 
> Cleaning segment 2147343575 in log __consumer_offsets-12 (largest timestamp 
> Thu Jun 08 15:49:06 PDT 2017) into 0, retaining deletes. 
> (kafka.log.LogCleaner)
> [2017-06-08 15:49:54,834] ERROR {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Error due to  (kafka.log.LogCleaner)
> java.lang.IllegalArgumentException: requirement failed: largest offset in 
> message set can not be safely converted to relative offset.
> at scala.Predef$.require(Predef.scala:224)
> at kafka.log.LogSegment.append(LogSegment.scala:109)
> at kafka.log.Cleaner.cleanInto(LogCleaner.scala:478)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:405)
> at 
> kafka.log.Cleaner$$anonfun$cleanSegments$1.apply(LogCleaner.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:401)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:363)
> at kafka.log.Cleaner$$anonfun$clean$4.apply(LogCleaner.scala:362)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at kafka.log.Cleaner.clean(LogCleaner.scala:362)
> at 
> kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:241)
> at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:220)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> [2017-06-08 15:49:54,835] INFO {kafka-log-cleaner-thread-0} 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> {noformat}
> This seems to point at the specific line [here| 
> https://github.com/apache/kafka/blob/0.11.0/core/src/main/scala/kafka/log/LogSegment.scala#L92]
>  in the kafka src where the difference is actually larger than MAXINT as both 
> baseOffset and offset are of type long. It was introduced in this [pr| 
> https://github.com/apache/kafka/pull/2210/files/56d1f8196b77a47b176b7bbd1e4220a3be827631]
> These were the outputs of dumping the first two log segments
> {noformat}
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0.log
> Dumping /kafka-logs/__consumer_offsets-12/.log
> Starting offset: 0
> offset: 1810054758 position: 0 NoTimestampType: -1 isvalid: true payloadsize: 
> -1 magic: 0 compresscodec: NONE crc: 3127861909 keysize: 34
> :~$ /usr/bin/kafka-run-class kafka.tools.DumpLogSegments --deep-iteration 
> --files /kafka-logs/__consumer_offsets-12/000
> 0002147343575.log
> Dumping /kafka-logs/__consumer_offsets-12/002147343575.log
> Starting offset: 2147343575
> offset: 2147539884 position: 0 NoTimestampType: -1 isvalid: true paylo
> adsize: -1 magic: 0 compresscodec: NONE crc: 2282192097 keysize: 34
> {noformat}
> My guess is that since 2147539884 is larger than MAXINT, we are hitting this 
> exception. Was there a specific reason this check was added in 0.10.2?
> E.g. if the first offset is a key = "key 0" and then we have MAXINT + 1 of 
> "key 1" following, wouldn't we run into this situation whenever the log 
> cleaner runs?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno updated KAFKA-5412:
--
Status: Patch Available  (was: Open)

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.1
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the case of the 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4743) Add a tool to Reset Consumer Group Offsets

2017-06-09 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044620#comment-16044620
 ] 

Matthias J. Sax commented on KAFKA-4743:


[~jeqo] As this is done, are you interested in extending 
`bin/kafka-streams-application-reset.sh` with a similar feature? It would 
require a KIP, too (there is also no JIRA yet). The "problem" right now is that 
users need to use two tools to reset a Streams application and set an arbitrary 
start offset. It would be nice if a single tool could do this :)

> Add a tool to Reset Consumer Group Offsets
> --
>
> Key: KAFKA-4743
> URL: https://issues.apache.org/jira/browse/KAFKA-4743
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer, core, tools
>Reporter: Jorge Quilcate
>Assignee: Jorge Quilcate
>  Labels: kip
> Fix For: 0.11.0.0
>
>
> Add an external tool to reset Consumer Group offsets, and achieve rewind over 
> the topics, without changing client-side code.
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-122%3A+Add+Reset+Consumer+Group+Offsets+tooling



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044608#comment-16044608
 ] 

Randall Hauch commented on KAFKA-5412:
--

[~ppatierno], you might want to use the "Submit Patch" button to set the 
correct status. Not sure why that was not done automatically.

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.1
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the case of the 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5412:
-
Affects Version/s: (was: 0.11.0.0)
   0.10.2.1

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.1
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the case of the 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044606#comment-16044606
 ] 

Randall Hauch commented on KAFKA-5412:
--

I've initially targeted this to 0.11.1.0, since I think the first release 
candidate for 0.11.0.0 was already cut and this likely isn't an essential fix. 
Since this is a problem with the examples, it'd be great to backport.

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the case of the 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-5412) Using connect-console-sink/source.properties raises an exception related to "file" property not found

2017-06-09 Thread Randall Hauch (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch updated KAFKA-5412:
-
Fix Version/s: 0.11.1.0

> Using connect-console-sink/source.properties raises an exception related to 
> "file" property not found
> -
>
> Key: KAFKA-5412
> URL: https://issues.apache.org/jira/browse/KAFKA-5412
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Paolo Patierno
> Fix For: 0.11.1.0
>
>
> Hi,
> with the latest 0.11.1.0-SNAPSHOT it happens that the Kafka Connect example 
> using connect-console-sink/source.properties doesn't work anymore because the 
> needed "file" property isn't found.
> This is because the underlying FileStreamSink/Source connector and task 
> define a ConfigDef with "file" as a mandatory parameter. In the case of the 
> console example we want to have file=null so that stdin and stdout are used.
> One possible solution is to set "file=" inside the provided 
> connect-console-sink/source.properties.
> The other one could be to modify the FileStreamSink/Source source code in 
> order to remove the "file" definition from the ConfigDef.
> What do you think?
> I can provide a PR for that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5409) Providing a custom client-id to the ConsoleProducer tool

2017-06-09 Thread Paolo Patierno (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044592#comment-16044592
 ] 

Paolo Patierno commented on KAFKA-5409:
---

I think we can do without the --client-id option and instead use the same 
approach as the ConsoleConsumer does for group.id: let the user specify it 
with the --property option and avoid overwriting it. If client.id is not 
specified, a random one is generated.
I'll provide a PR soon.
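A rough sketch of the intended behaviour (illustrative only; the names and the 
exact plumbing in ConsoleProducer may differ):

{code}
import java.util.Properties;
import java.util.UUID;

public class ClientIdDefaultSketch {
    // If the user supplied client.id via --property, keep it; otherwise fall
    // back to a generated one, similar to the ConsoleConsumer's handling of
    // group.id.
    static Properties withClientId(Properties userProps) {
        Properties props = new Properties();
        props.putAll(userProps);
        props.putIfAbsent("client.id", "console-producer-" + UUID.randomUUID());
        return props;
    }
}
{code}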

> Providing a custom client-id to the ConsoleProducer tool
> 
>
> Key: KAFKA-5409
> URL: https://issues.apache.org/jira/browse/KAFKA-5409
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Paolo Patierno
>Priority: Minor
>
> Hi,
> I see that the client-id property for the ConsoleProducer tool is always 
> "console-producer". It could be useful to have it as a parameter on the 
> command line or to generate a random one, as happens for the ConsoleConsumer.
> If it makes sense to you, I can work on that.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-2170) 10 LogTest cases failed for file.renameTo failed under windows

2017-06-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044582#comment-16044582
 ] 

ASF GitHub Bot commented on KAFKA-2170:
---

GitHub user nxmbriggs404 opened a pull request:

https://github.com/apache/kafka/pull/3283

KAFKA-2170: Updated Fixes For Windows Platform

During stress testing of kafka 0.10.2.1 on a Windows platform, our group 
has encountered some issues that appear to be known to the community but not 
fully addressed by kafka.  Using:

https://github.com/apache/kafka/pull/154

as a guide, we have made derived changes to the source code and automated 
tests such that the "clients" and "core" tests pass for us on Windows and Linux 
platforms.  Our stress tests succeed as well.

This pull request adapts those changes to merge and build with kafka/trunk. 
 The "clients" and "core" tests from kafka/trunk pass on Linux for us with 
these changes in place, and all tests pass on Windows except:

ConsumerBounceTest (intermittent failures)
TransactionsTest
DeleteTopicTest.testDeleteTopicWithCleaner
EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards

Our intention is to help efforts to further kafka support for the Windows 
platform.  Our changes are the work of engineers from Nexidia building upon the 
work found in the aforementioned pull request link, and they are contributed to 
the community per kafka's open source license.

We welcome all feedback and look forward to working with the kafka 
community.

Matt Briggs
Principal Software Engineer
Nexidia, a NICE Analytics Company
www.nexidia.com

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nxmbriggs404/kafka nx-windows-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3283


commit 6ee3c167c6e2daa8ce4564d98f9f63967a0efece
Author: Matt Briggs 
Date:   2017-06-06T15:10:58Z

Handle log file deletion and renaming on Windows

Special treatment is needed for the deletion and renaming of log and
index files on Windows, due to the latter's general inability to
perform those operations while a file is opened or memory mapped.

The changes in this commit are essentially adapted from:

https://github.com/apache/kafka/pull/154

More detailed background information on the issues can also be found
via that link.

commit a0cd773a8d89d7df90fc75ce55a46fd8bb93d368
Author: Matt Briggs 
Date:   2017-06-06T15:21:23Z

Colliding log filenames cause test failures on Windows

This commit addresses an edge case with compaction and asynchronous
deletion of log files initially encountered when debugging:

LogCleanerTest.testRecoveryAfterCrash

failures on Windows.  It appears that troubles arise when compaction
of logs results in two segments having the same base address, hence
the same file names, and the segments get scheduled for background
deletion.  If one segment's files are pending deletion at the time the
other segment's files are scheduled for deletion, the file rename
attempted during the latter will fail on Windows (due to the typical
Windows issues with open/memory-mapped files).  It doesn't appear like
we can simply close out the former files, as it seems that kafka
intends to have them open for concurrent readers until the file
deletion interval has fully passed.

The fix in this commit basically sidesteps the issue by ensuring files
scheduled for background delete are renamed uniquely (by injecting a
UUID into the filename).  Essentially this follows the approach taken
with LogManager.asyncDelete and Log.DeleteDirSuffix.

Collision related errors were also observed when running a custom
stress test on Windows against a standalone kafka server.  The test
code caused extremely frequent compaction of the __consumer_offsets
topic partitions, which resulted in collisions of the latter's log
files when they were scheduled for deletion.  Use of the UUID was
successful in avoiding collision related issues in this context.
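A minimal, illustrative sketch of that renaming idea (not the actual patch; 
names and the ".deleted" suffix here are hypothetical):

{code}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.UUID;

public class UniqueDeleteRenameSketch {
    // Rename a log/index file to a unique name before handing it to the
    // background deletion task, so that two segments with the same base
    // offset cannot collide while both are pending deletion.
    static File markForDeletion(File file) throws IOException {
        File target = new File(file.getParentFile(),
                file.getName() + "." + UUID.randomUUID() + ".deleted");
        Files.move(file.toPath(), target.toPath());
        return target;
    }
}
{code}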

commit 3633d493bc3c0de3f177eecd11e374be74d4ac32
Author: Matt Briggs 
Date:   2017-06-06T15:27:59Z

Fixing log recovery crash on Windows

When a sanity check failure was detected by log recovery code, an
attempt to delete index files that were memory-mapped would lead to
a crash on Windows.  This commit adjusts the code to unmap, delete,
recreate, and remap index files so that recovery can continue.


[GitHub] kafka pull request #3283: KAFKA-2170: Updated Fixes For Windows Platform

2017-06-09 Thread nxmbriggs404
GitHub user nxmbriggs404 opened a pull request:

https://github.com/apache/kafka/pull/3283

KAFKA-2170: Updated Fixes For Windows Platform

During stress testing of kafka 0.10.2.1 on a Windows platform, our group 
has encountered some issues that appear to be known to the community but not 
fully addressed by kafka.  Using:

https://github.com/apache/kafka/pull/154

as a guide, we have made derived changes to the source code and automated 
tests such that the "clients" and "core" tests pass for us on Windows and Linux 
platforms.  Our stress tests succeed as well.

This pull request adapts those changes to merge and build with kafka/trunk. 
 The "clients" and "core" tests from kafka/trunk pass on Linux for us with 
these changes in place, and all tests pass on Windows except:

ConsumerBounceTest (intermittent failures)
TransactionsTest
DeleteTopicTest.testDeleteTopicWithCleaner
EpochDrivenReplicationProtocolAcceptanceTest.offsetsShouldNotGoBackwards

Our intention is to help efforts to further kafka support for the Windows 
platform.  Our changes are the work of engineers from Nexidia building upon the 
work found in the aforementioned pull request link, and they are contributed to 
the community per kafka's open source license.

We welcome all feedback and look forward to working with the kafka 
community.

Matt Briggs
Principal Software Engineer
Nexidia, a NICE Analytics Company
www.nexidia.com

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nxmbriggs404/kafka nx-windows-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/3283.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3283


commit 6ee3c167c6e2daa8ce4564d98f9f63967a0efece
Author: Matt Briggs 
Date:   2017-06-06T15:10:58Z

Handle log file deletion and renaming on Windows

Special treatment is needed for the deletion and renaming of log and
index files on Windows, due to the latter's general inability to
perform those operations while a file is opened or memory mapped.

The changes in this commit are essentially adapted from:

https://github.com/apache/kafka/pull/154

More detailed background information on the issues can also be found
via that link.

commit a0cd773a8d89d7df90fc75ce55a46fd8bb93d368
Author: Matt Briggs 
Date:   2017-06-06T15:21:23Z

Colliding log filenames cause test failures on Windows

This commit addresses an edge case with compaction and asynchronous
deletion of log files initially encountered when debugging:

LogCleanerTest.testRecoveryAfterCrash

failures on Windows.  It appears that troubles arise when compaction
of logs results in two segments having the same base address, hence
the same file names, and the segments get scheduled for background
deletion.  If one segment's files are pending deletion at the time the
other segment's files are scheduled for deletion, the file rename
attempted during the latter will fail on Windows (due to the typical
Windows issues with open/memory-mapped files).  It doesn't appear like
we can simply close out the former files, as it seems that kafka
intends to have them open for concurrent readers until the file
deletion interval has fully passed.

The fix in this commit basically sidesteps the issue by ensuring files
scheduled for background delete are renamed uniquely (by injecting a
UUID into the filename).  Essentially this follows the approach taken
with LogManager.asyncDelete and Log.DeleteDirSuffix.

Collision related errors were also observed when running a custom
stress test on Windows against a standalone kafka server.  The test
code caused extremely frequent compaction of the __consumer_offsets
topic partitions, which resulted in collisions of the latter's log
files when they were scheduled for deletion.  Use of the UUID was
successful in avoiding collision related issues in this context.

commit 3633d493bc3c0de3f177eecd11e374be74d4ac32
Author: Matt Briggs 
Date:   2017-06-06T15:27:59Z

Fixing log recovery crash on Windows

When a sanity check failure was detected by log recovery code, an
attempt to delete index files that were memory-mapped would lead to
a crash on Windows.  This commit adjusts the code to unmap, delete,
recreate, and remap index files so that recovery can continue.

Issue was found via the LogTest.testCorruptLog test

commit 6b8debd2b906d8691dba73fbbc917f10febf4959
Author: Matt Briggs 
Date:   2017-06-06T19:11:31Z

Windows-driven resource cleanup in tests

Fix 

[jira] [Updated] (KAFKA-5421) Getting InvalidRecordException

2017-06-09 Thread Rishi Reddy Bokka (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rishi Reddy Bokka updated KAFKA-5421:
-
Affects Version/s: 0.9.0.1

> Getting InvalidRecordException
> --
>
> Key: KAFKA-5421
> URL: https://issues.apache.org/jira/browse/KAFKA-5421
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Rishi Reddy Bokka
>Priority: Blocker
>
> In my application, data gets queued using Kafka and saved on disk, and a 
> consumer fetches this data from Kafka and processes it. But when my consumer 
> tries to read data from Kafka, I get the exceptions below:
> 2017-06-09 10:57:24,733 ERROR NetworkClient Uncaught error in request 
> completion:
> org.apache.kafka.common.KafkaException: Error deserializing key/value for 
> partition TcpMessage-1 at offset 155884487
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:628)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.handleFetchResponse(Fetcher.java:566)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.access$000(Fetcher.java:69)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:139)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher$1.onSuccess(Fetcher.java:136)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
>  ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274) 
> [kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
>  [kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
>  [kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
>  [kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:908)
>  [kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:853) 
> [kafka-clients-0.9.0.1.jar:?]
> at 
> com.affirmed.mediation.edr.kafka.tcpMessage.TcpMessageConsumer.doWork(TcpMessageConsumer.java:190)
>  [EdrServer.jar:?]
> at 
> com.affirmed.mediation.edr.kafka.tcpMessage.TcpMessageConsumer.run(TcpMessageConsumer.java:248)
>  [EdrServer.jar:?]
> Caused by: org.apache.kafka.common.record.InvalidRecordException: Record is 
> corrupt (stored crc = 2016852547, computed crc = 1399853379)
> at org.apache.kafka.common.record.Record.ensureValid(Record.java:226) 
> ~[kafka-clients-0.9.0.1.jar:?]
> at 
> org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:617)
>  ~[kafka-clients-0.9.0.1.jar:?]
> ... 15 more
> Could anyone please help me with this? I am stuck and unable to figure out 
> the root cause.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-5420) Console producer --key-serializer and --value-serializer are always overwritten by ByteArraySerializer

2017-06-09 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno resolved KAFKA-5420.
---
Resolution: Duplicate

Duplicate of KAFKA-2526

> Console producer --key-serializer and --value-serializer are always 
> overwritten by ByteArraySerializer
> --
>
> Key: KAFKA-5420
> URL: https://issues.apache.org/jira/browse/KAFKA-5420
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Paolo Patierno
>Priority: Minor
>
> Hi,
> the --key-serializer and --value-serializer options passed to the command 
> line are always overwritten here :
> {code}
> props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.ByteArraySerializer")
> props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.ByteArraySerializer")
> {code}
> in the getNewProducerProps() method.
> Thanks,
> Paolo.
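For illustration only (the issue above was resolved as a duplicate of 
KAFKA-2526, and this is not the actual fix): respecting user-supplied 
serializers would mean applying the ByteArraySerializer default only when 
nothing was provided, e.g.:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerPropsSketch {
    // Keep any --key-serializer / --value-serializer given on the command
    // line and only fall back to ByteArraySerializer when nothing was given.
    static Properties withDefaults(Properties userProps) {
        Properties props = new Properties();
        props.putAll(userProps);
        props.putIfAbsent(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                ByteArraySerializer.class.getName());
        props.putIfAbsent(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                ByteArraySerializer.class.getName());
        return props;
    }
}
{code}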



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Resolved] (KAFKA-5419) Console consumer --key-deserializer and --value-deserializer are always overwritten by ByteArrayDeserializer

2017-06-09 Thread Paolo Patierno (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paolo Patierno resolved KAFKA-5419.
---
Resolution: Duplicate

Duplicate of KAFKA-2526

> Console consumer --key-deserializer and --value-deserializer are always 
> overwritten by ByteArrayDeserializer
> 
>
> Key: KAFKA-5419
> URL: https://issues.apache.org/jira/browse/KAFKA-5419
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Paolo Patierno
>Priority: Minor
>
> Hi,
> the --key-deserializer and  --value-deserializer options passed to the 
> command line are always overwritten here :
> {code}
> props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer")
> props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.ByteArrayDeserializer")
> {code}
> in the getNewConsumerProps() method.
> Thanks,
> Paolo.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

