[jira] [Commented] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Naga (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780189#comment-16780189
 ] 

Naga commented on KAFKA-8008:
-

I'm also observing a similar issue.

> Clients unable to connect and replicas are not able to connect to each other
> 
>
> Key: KAFKA-8008
> URL: https://issues.apache.org/jira/browse/KAFKA-8008
> Project: Kafka
>  Issue Type: Bug
>  Components: controller, core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11
>Reporter: Abhi
>Priority: Critical
>
> Hi,
> I upgraded to Kafka v2.1.1 recently and am seeing the below exceptions in all 
> the servers. The kafka-network-thread-1-ListenerName threads are all consuming 
> full CPU cycles. Lots of TCP connections are in CLOSE_WAIT state.
> My broker setup is using Kerberos authentication with 
> -Dsun.security.jgss.native=true.
> I am not sure how to handle this. Will increasing the kafka-network thread 
> count help, if that is possible?
> Does this seem like a bug? I am happy to help in any way I can, as this issue 
> is blocking our production usage and I would like to get it resolved as early 
> as possible.
> *server.log snippet from one of the servers:*
> [2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 
> 2 with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
> (id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
> SEND_APIVERSIONS_REQUEST 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
> client=kafka/mwkafka-prod-01.nyc.foo@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
>  (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
> SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
> [2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
> RECEIVE_APIVERSIONS_RESPONSE 
> (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
> [2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
> fetcherId=3] Completed connection to node 2. Ready. 
> (org.apache.kafka.clients.NetworkClient)
> [2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 
> 5 with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
> to node 5: java.net.SocketTimeoutException: Failed to connect within 3 
> ms. (org.apache.kafka.clients.FetchSessionHandler)
> [2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
> fetcherId=1] Error in response for fetch request (type=FetchRequest, 
> replicaId=1, maxWait=1, minBytes=1, maxBytes=10485760, 
> fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-newtp-5-64-0=(fetchOffset=8936, logStartOffset=8936, 
> maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
> reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> reddyvel-tp3-58-0=(fetchOffset=264495, logStartOffset=264495, 
> maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
> fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
> currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
> logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, logStartOffset=2, 
> maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
> reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-988-0=(fetchOffset=3173185, logStartOffset=3173185, 
> maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
> reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, 
> 

[jira] [Resolved] (KAFKA-7962) StickyAssignor: throws NullPointerException during assignments if topic is deleted

2019-02-27 Thread Vahid Hashemian (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian resolved KAFKA-7962.

   Resolution: Fixed
 Reviewer: Vahid Hashemian
Fix Version/s: 2.3.0

> StickyAssignor: throws NullPointerException during assignments if topic is 
> deleted
> --
>
> Key: KAFKA-7962
> URL: https://issues.apache.org/jira/browse/KAFKA-7962
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.0
> Environment: 1. MacOS, com.salesforce.kafka.test.KafkaTestUtils (kind 
> of embedded kafka integration tests)
> 2. Linux, dockerised kafka and our service
>Reporter: Oleg Smirnov
>Assignee: huxihx
>Priority: Major
> Fix For: 2.3.0
>
> Attachments: NPE-StickyAssignor-issues.apache.log
>
>
> Integration tests with com.salesforce.kafka.test.KafkaTestUtils, local 
> setup, StickyAssignor used; local topics are created/removed; one topic is 
> created at the beginning of the test and, without unsubscribing from it, 
> deleted. The same happens in a real environment.
>  
>  # have a single "topic" with 1 partition
>  # a single consumer subscribed to this "topic" (StickyAssignor)
>  # delete "topic"
> =>
>  * rebalance starts, the topic's partition(s) are revoked
>  * on assignment, StickyAssignor throws an exception (line 223), because 
> partitionsPerTopic.get("topic") returns null in the for loop (the topic is 
> deleted, so no partitions are present)
>  
> In the provided log part, tearDown() causes topic deletion while the consumer 
> is still running and tries to poll data from the topic.
> RangeAssignor works fine (revokes the partition, assigns an empty set).
> The problem doesn't have a workaround (like handling it in 
> onPartitionsAssigned and removing the unsubscribed topic), because everything 
> happens before the listener is called.
>  
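
For illustration, here is a minimal sketch (not the actual StickyAssignor code) 
of the kind of null guard that avoids the NPE described above when a subscribed 
topic has been deleted; the partitionsPerTopic map and the loop shape are 
assumptions based on the report:

{code:java}
import java.util.*;
import org.apache.kafka.common.TopicPartition;

class AssignorSketch {
    // partitionsPerTopic maps topic name -> partition count; a deleted topic
    // is simply absent from the map, so get() returns null and must be guarded.
    static List<TopicPartition> allPartitions(Map<String, Integer> partitionsPerTopic,
                                              List<String> subscribedTopics) {
        List<TopicPartition> all = new ArrayList<>();
        for (String topic : subscribedTopics) {
            Integer numPartitions = partitionsPerTopic.get(topic);
            if (numPartitions == null)
                continue; // topic deleted mid-rebalance: skip instead of throwing NPE
            for (int i = 0; i < numPartitions; i++)
                all.add(new TopicPartition(topic, i));
        }
        return all;
    }
}
{code}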



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7962) StickyAssignor: throws NullPointerException during assignments if topic is deleted

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780163#comment-16780163
 ] 

ASF GitHub Bot commented on KAFKA-7962:
---

vahidhashemian commented on pull request #6308: KAFKA-7962: Avoid NPE for 
StickyAssignor
URL: https://github.com/apache/kafka/pull/6308
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> StickyAssignor: throws NullPointerException during assignments if topic is 
> deleted
> --
>
> Key: KAFKA-7962
> URL: https://issues.apache.org/jira/browse/KAFKA-7962
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 2.1.0
> Environment: 1. MacOS, com.salesforce.kafka.test.KafkaTestUtils (kind 
> of embedded kafka integration tests)
> 2. Linux, dockerised kafka and our service
>Reporter: Oleg Smirnov
>Assignee: huxihx
>Priority: Major
> Attachments: NPE-StickyAssignor-issues.apache.log
>
>
> Integration tests with com.salesforce.kafka.test.KafkaTestUtils, local 
> setup, StickyAssignor used; local topics are created/removed; one topic is 
> created at the beginning of the test and, without unsubscribing from it, 
> deleted. The same happens in a real environment.
>  
>  # have a single "topic" with 1 partition
>  # a single consumer subscribed to this "topic" (StickyAssignor)
>  # delete "topic"
> =>
>  * rebalance starts, the topic's partition(s) are revoked
>  * on assignment, StickyAssignor throws an exception (line 223), because 
> partitionsPerTopic.get("topic") returns null in the for loop (the topic is 
> deleted, so no partitions are present)
>  
> In the provided log part, tearDown() causes topic deletion while the consumer 
> is still running and tries to poll data from the topic.
> RangeAssignor works fine (revokes the partition, assigns an empty set).
> The problem doesn't have a workaround (like handling it in 
> onPartitionsAssigned and removing the unsubscribed topic), because everything 
> happens before the listener is called.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8014) Extend Connect integration tests to add and remove workers dynamically

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780128#comment-16780128
 ] 

ASF GitHub Bot commented on KAFKA-8014:
---

kkonstantine commented on pull request #6342: KAFKA-8014: Extend Connect 
integration tests to add and remove workers dynamically
URL: https://github.com/apache/kafka/pull/6342
 
 
   Extend Connect's integration test harness to add the capability to add and 
remove workers dynamically as well as discover the set of active workers at any 
point during an integration test. 
   
   The current integration tests are extended to test the new capabilities. 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Extend Connect integration tests to add and remove workers dynamically
> --
>
> Key: KAFKA-8014
> URL: https://issues.apache.org/jira/browse/KAFKA-8014
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 2.3.0
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Major
>
>  
> To allow for even more integration tests that can focus on testing the Connect 
> framework itself, it seems necessary to add the ability to add and remove 
> workers from within a test case. 
> The suggestion is to extend Connect's integration test harness 
> {{EmbeddedConnectCluster}} to include methods to add and remove workers as 
> well as return the workers that are online at any given point.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8014) Extend Connect integration tests to add and remove workers dynamically

2019-02-27 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-8014:
-

 Summary: Extend Connect integration tests to add and remove 
workers dynamically
 Key: KAFKA-8014
 URL: https://issues.apache.org/jira/browse/KAFKA-8014
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.3.0
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis


 

To allow for even more integration tests that can focus on testing the Connect 
framework itself, it seems necessary to add the ability to add and remove 
workers from within a test case. 

The suggestion is to extend Connect's integration test harness 
{{EmbeddedConnectCluster}} to include methods to add and remove workers as well 
as return the workers that are online at any given point.
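
A rough sketch of how a test might use such capabilities; the addWorker(), 
removeWorker(), and activeWorkers() method names are illustrative assumptions, 
not the final API:

{code:java}
import static org.junit.Assert.assertEquals;
import org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster;

// Hypothetical usage inside an integration test; method names are assumptions.
EmbeddedConnectCluster connect = new EmbeddedConnectCluster.Builder()
        .name("connect-cluster")
        .numWorkers(3)
        .build();
connect.start();

connect.addWorker();                              // scale up mid-test
assertEquals(4, connect.activeWorkers().size());

connect.removeWorker();                           // scale down, observe rebalancing
assertEquals(3, connect.activeWorkers().size());
{code}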



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-02-27 Thread Jonathan Gordon (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16780115#comment-16780115
 ] 

Jonathan Gordon commented on KAFKA-7652:


{quote}1) when you profile on latest trunk did you see the same pattern as 
observed in [https://i.imgur.com/IHxC2cZ.png] as well as in the trace logging 
compared with 0.10.2.x?
{quote}
The image you linked is actually for 0.10.2.x, which is our current deployment. 
It shows us gated by RocksDB, but that's actually *faster* than what we saw in 
0.11.0.0, the recent trunk, or the test I just ran against 2.2.0-rc0:

[https://i.imgur.com/L6PWIEF.png]
{quote}2) practically the lookups in the caching layer is very cheap and hence 
even increased a lot it should not contribute to much overhead, whereas the 
fetches on the underlying store would be much more expensive. Could you confirm 
if the performance bottleneck is from the underlying rocksDB, or from the 
caching layer access?
{quote}
For 2.2.0-rc0, we're spending the bulk of our time trying to retrieve records 
from the NamedCache. See:

[^0.10.2.1-NamedCache.txt]

[^2.2.0-rc0_b-NamedCache.txt]

While I agree each individual retrieval should be cheap, as you can see from 
the latest logs it's the difference between 1,096,089 (2.2.0-rc0) and 19,245 
(0.10.2.1) hits per second to the cache. Those two orders of magnitude appear 
to outweigh whatever performance benefit we'd receive from the caching 
layer. 

This is just one of 8 tasks. During their respective runs, the services 
consumed 8.4M messages (0.10.2.1) with no lag vs 637K messages (2.2.0-rc0) with 
considerable lag. I'd be happy to run again with whatever custom logging or 
configuration you suggest to help further pinpoint the problem. 
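
For reference, a minimal sketch of the Streams configuration described in this 
ticket (commit.interval.ms = 30s, cache.max.bytes.buffering = 10485760); the 
application id and bootstrap servers are placeholders:

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "session-store-app");  // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 30 * 1000);
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 10485760);
{code}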

 

 

 

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
> Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> are experiencing a severe performance degradation. The largest share of CPU 
> time seems to be spent retrieving from the local cache. Here's an example 
> thread profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that it appears we're performing about 3 orders of 
> magnitude more lookups on the NamedCache over a comparable time period. I've 
> attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7652) Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0

2019-02-27 Thread Jonathan Gordon (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gordon updated KAFKA-7652:
---
Attachment: 0.10.2.1-NamedCache.txt
2.2.0-rc0_b-NamedCache.txt

> Kafka Streams Session store performance degradation from 0.10.2.2 to 0.11.0.0
> -
>
> Key: KAFKA-7652
> URL: https://issues.apache.org/jira/browse/KAFKA-7652
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.0, 0.11.0.1, 0.11.0.2, 0.11.0.3, 1.1.1, 2.0.0, 
> 2.0.1
>Reporter: Jonathan Gordon
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 2.2.0
>
> Attachments: 0.10.2.1-NamedCache.txt, 2.2.0-rc0_b-NamedCache.txt, 
> kafka_10_2_1_flushes.txt, kafka_11_0_3_flushes.txt
>
>
> I'm creating this issue in response to [~guozhang]'s request on the mailing 
> list:
> [https://lists.apache.org/thread.html/97d620f4fd76be070ca4e2c70e2fda53cafe051e8fc4505dbcca0321@%3Cusers.kafka.apache.org%3E]
> We are attempting to upgrade our Kafka Streams application from 0.10.2.1 but 
> are experiencing a severe performance degradation. The largest share of CPU 
> time seems to be spent retrieving from the local cache. Here's an example 
> thread profile with 0.11.0.0:
> [https://i.imgur.com/l5VEsC2.png]
> When things are running smoothly we're gated by retrieving from the state 
> store with acceptable performance. Here's an example thread profile with 
> 0.10.2.1:
> [https://i.imgur.com/IHxC2cZ.png]
> Some investigation reveals that it appears we're performing about 3 orders of 
> magnitude more lookups on the NamedCache over a comparable time period. I've 
> attached the NamedCache flush logs for 0.10.2.1 and 0.11.0.3.
> We're using session windows and have the app configured for 
> commit.interval.ms = 30 * 1000 and cache.max.bytes.buffering = 10485760
> I'm happy to share more details if they would be helpful. Also happy to run 
> tests on our data.
> I also found this issue, which seems like it may be related:
> https://issues.apache.org/jira/browse/KAFKA-4904
>  
> KIP-420: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-420%3A+Add+Single+Value+Fetch+in+Session+Stores]
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8013) Avoid buffer underflow when reading a Struct from a partially correct buffer

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779973#comment-16779973
 ] 

ASF GitHub Bot commented on KAFKA-8013:
---

kkonstantine commented on pull request #6340: KAFKA-8013: Avoid underflow when 
reading a Struct from a partially correct buffer
URL: https://github.com/apache/kafka/pull/6340
 
 
   Protocol compatibility can be facilitated if a Struct that has been defined 
as an extension of a previous Struct (by adding fields at the end of the older 
version) can read a message of an older version by ignoring the absence of the 
missing new fields. Reading the missing fields as absent must be allowed by the 
definition of these fields (they have to be nullable).
   
   * Tested by adding unit tests around Schema.read in both directions 
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Avoid buffer underflow when reading a Struct from a partially correct buffer
> 
>
> Key: KAFKA-8013
> URL: https://issues.apache.org/jira/browse/KAFKA-8013
> Project: Kafka
>  Issue Type: Bug
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Major
> Fix For: 2.3.0
>
>
> Protocol compatibility can be facilitated if a {{Struct}} that has been 
> defined as an extension of a previous {{Struct}} (by adding fields at the end 
> of the older version) can read an older version by ignoring the absence of 
> the missing new fields. Of course this has to be allowed by the definition of 
> these fields (they have to be {{nullable}}). 
> For example, this should work: 
> {code:java}
> Schema oldSchema = new Schema(new Field("field1", Type.NULLABLE_STRING));
> Schema newSchema = new Schema(new Field("field1", Type.NULLABLE_STRING), new 
> Field("field2" , Type.NULLABLE_STRING));
> String value = "foo bar baz";
> Struct oldFormat = new Struct(oldSchema).set("field1", value);
> ByteBuffer buffer = ByteBuffer.allocate(oldSchema.sizeOf(oldFormat));
> oldFormat.writeTo(buffer);
> buffer.flip();
> Struct newFormat = newSchema.read(buffer);
> assertEquals(value, newFormat.get("field1"));
> assertEquals(null, newFormat.get("field2"));
> {code}
> Currently it does not. 
> A fix to the above is considered safe, because depending on buffer underflow 
> to detect missing data at the end of a {{Struct}} is not an appropriate 
> check. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7936) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7936.

Resolution: Duplicate

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7936
> URL: https://issues.apache.org/jira/browse/KAFKA-7936
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, unit tests
>Reporter: Matthias J. Sax
>Priority: Major
>
> [https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3393/tests]
> {quote}java.util.concurrent.ExecutionException: Boxed Error
>  at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
>  at 
> scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
>  at 
> scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
>  at scala.concurrent.Promise.complete(Promise.scala:53)
>  at scala.concurrent.Promise.complete$(Promise.scala:52)
>  at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
>  at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>  at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.AssertionError: Received 0, expected at least 68
>  at org.junit.Assert.fail(Assert.java:89)
>  at org.junit.Assert.assertTrue(Assert.java:42)
>  at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:562)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$5(ConsumerBounceTest.scala:347)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
>  at scala.util.Success.$anonfun$map$1(Try.scala:255)
>  at scala.util.Success.map(Try.scala:213)
>  at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>  at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>  ... 9 more
> {quote}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8013) Avoid buffer underflow when reading a Struct from a partially correct buffer

2019-02-27 Thread Konstantine Karantasis (JIRA)
Konstantine Karantasis created KAFKA-8013:
-

 Summary: Avoid buffer underflow when reading a Struct from a 
partially correct buffer
 Key: KAFKA-8013
 URL: https://issues.apache.org/jira/browse/KAFKA-8013
 Project: Kafka
  Issue Type: Bug
Reporter: Konstantine Karantasis
Assignee: Konstantine Karantasis
 Fix For: 2.3.0


Protocol compatibility can be facilitated if a {{Struct}} that has been defined 
as an extension of a previous {{Struct}} (by adding fields at the end of the 
older version) can read an older version by ignoring the absence of the missing 
new fields. Of course this has to be allowed by the definition of these fields 
(they have to be {{nullable}}). 

For example, this should work: 
{code:java}
// Imports assumed (not shown in the original report):
// org.apache.kafka.common.protocol.types.{Schema, Field, Type, Struct},
// java.nio.ByteBuffer, and JUnit's assertEquals.
Schema oldSchema = new Schema(new Field("field1", Type.NULLABLE_STRING));
Schema newSchema = new Schema(new Field("field1", Type.NULLABLE_STRING),
        new Field("field2", Type.NULLABLE_STRING));
String value = "foo bar baz";
// Write a message in the old (one-field) format...
Struct oldFormat = new Struct(oldSchema).set("field1", value);
ByteBuffer buffer = ByteBuffer.allocate(oldSchema.sizeOf(oldFormat));
oldFormat.writeTo(buffer);
buffer.flip();
// ...and read it back with the new (two-field) schema.
Struct newFormat = newSchema.read(buffer);
assertEquals(value, newFormat.get("field1"));
assertEquals(null, newFormat.get("field2"));
{code}
Currently it does not. 

A fix to the above is considered safe, because depending on buffer underflow to 
detect missing data at the end of a {{Struct}} is not an appropriate check. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779959#comment-16779959
 ] 

Matthias J. Sax commented on KAFKA-7965:


Failed again: 
[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk11/detail/kafka-trunk-jdk11/323/tests]
{quote}java.util.concurrent.ExecutionException: Boxed Error
at scala.concurrent.impl.Promise$.resolver(Promise.scala:87)
at 
scala.concurrent.impl.Promise$.scala$concurrent$impl$Promise$$resolveTry(Promise.scala:79)
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:284)
at scala.concurrent.Promise.complete(Promise.scala:53)
at scala.concurrent.Promise.complete$(Promise.scala:52)
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:187)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.scalatest.junit.JUnitTestFailedError: Received more than one 
class org.apache.kafka.common.errors.GroupMaxSizeReachedException
at 
org.scalatest.junit.AssertionsForJUnit.newAssertionFailedException(AssertionsForJUnit.scala:100)
at 
org.scalatest.junit.AssertionsForJUnit.newAssertionFailedException$(AssertionsForJUnit.scala:99)
at 
org.scalatest.junit.JUnitSuite.newAssertionFailedException(JUnitSuite.scala:71)
at org.scalatest.Assertions.fail(Assertions.scala:1089)
at org.scalatest.Assertions.fail$(Assertions.scala:1085)
at org.scalatest.junit.JUnitSuite.fail(JUnitSuite.scala:71)
at 
kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$5(ConsumerBounceTest.scala:355)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:658)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
... 8 more{quote}

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for 
> all observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6582) Partitions get underreplicated, with a single ISR, and don't recover. Other brokers do not take over and we need to manually restart the broker.

2019-02-27 Thread Sebastian Schmitz (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779640#comment-16779640
 ] 

Sebastian Schmitz commented on KAFKA-6582:
--

2.1.1 fixed the problem for us and we'll now go ahead with rolling the upgrade 
out to Prod...

> Partitions get underreplicated, with a single ISR, and don't recover. Other 
> brokers do not take over and we need to manually restart the broker.
> --
>
> Key: KAFKA-6582
> URL: https://issues.apache.org/jira/browse/KAFKA-6582
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 1.0.0
> Environment: Ubuntu 16.04
> Linux kafka04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode) 
> but also tried with the latest JVM 8 before with the same result.
>Reporter: Jurriaan Pruis
>Priority: Major
> Attachments: Screenshot 2019-01-18 at 13.08.17.png, Screenshot 
> 2019-01-18 at 13.16.59.png
>
>
> Partitions get underreplicated, with a single ISR, and don't recover. Other 
> brokers do not take over and we need to manually restart the 'single ISR' 
> broker (if you describe the partitions of a replicated topic it is clear that 
> some partitions are only in sync on this broker).
> This bug resembles KAFKA-4477 a lot, but since that issue is marked as 
> resolved this is probably something else but similar.
> We have the same issue (or at least it looks pretty similar) on Kafka 1.0. 
> Since upgrading to Kafka 1.0 in November 2017 we've had these issues (we've 
> upgraded from Kafka 0.10.2.1).
> This happens almost every 24-48 hours on a random broker. This is why we 
> currently have a cronjob which restarts every broker every 24 hours. 
> During this issue the ISR shows the following server log: 
> {code:java}
> [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
> [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
> [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
> [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
> [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
> [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for 
> which there is no open connection, connection id 
> 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
> {code}
> Also on the ISR broker, the controller log shows this:
> {code:java}
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]: 
> Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending 
> state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]: 
> Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending 
> state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]: 
> Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending 
> state change requests (kafka.controller.RequestSendThread){code}
> And the non-ISR brokers show these kind of errors:
>  
> {code:java}
> 2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3, 
> fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest, 
> replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760, 
> fetchData={..}, isolationLevel=READ_UNCOMMITTED) 
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was 
> read
>  at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
>  at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
>  at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
>  at 
> 

[jira] [Commented] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Abhi (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779637#comment-16779637
 ] 

Abhi commented on KAFKA-8008:
-

One of the kafka network threads is stuck in the below state:

"kafka-network-thread-1-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2" #81 
prio=5 os_prio=0 cpu=4839838.11ms elapsed=33760.37s allocated=59G 
defined_classes=36 tid=0x000
07fb046d12800 nid=0x100fa runnable  [0x7faee8df8000]
   java.lang.Thread.State: RUNNABLE
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1928)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at java.util.HashMap$TreeNode.find(java.base@11.0.2/HashMap.java:1924)
at 
java.util.HashMap$TreeNode.putTreeVal(java.base@11.0.2/HashMap.java:2043)
at java.util.HashMap.putVal(java.base@11.0.2/HashMap.java:633)
at java.util.HashMap.put(java.base@11.0.2/HashMap.java:607)
at java.util.HashSet.add(java.base@11.0.2/HashSet.java:220)
at 
javax.security.auth.Subject$ClassSet.populateSet(java.base@11.0.2/Subject.java:1518)
at 
javax.security.auth.Subject$ClassSet.<init>(java.base@11.0.2/Subject.java:1472)
- locked <0x0006c8a9b970> (a java.util.Collections$SynchronizedSet)
at 
javax.security.auth.Subject.getPrivateCredentials(java.base@11.0.2/Subject.java:764)
at 
sun.security.jgss.GSSUtil$1.run(java.security.jgss@11.0.2/GSSUtil.java:336)
at 
sun.security.jgss.GSSUtil$1.run(java.security.jgss@11.0.2/GSSUtil.java:328)
at java.security.AccessController.doPrivileged(java.base@11.0.2/Native 
Method)
at 
sun.security.jgss.GSSUtil.searchSubject(java.security.jgss@11.0.2/GSSUtil.java:328)
at 
sun.security.jgss.wrapper.NativeGSSFactory.getCredFromSubject(java.security.jgss@11.0.2/NativeGSSFactory.java:53)
at 
sun.security.jgss.wrapper.NativeGSSFactory.getCredentialElement(java.security.jgss@11.0.2/NativeGSSFactory.java:116)
at 
sun.security.jgss.GSSManagerImpl.getCredentialElement(java.security.jgss@11.0.2/GSSManagerImpl.java:187)
at 
sun.security.jgss.GSSCredentialImpl.add(java.security.jgss@11.0.2/GSSCredentialImpl.java:439)
at 
sun.security.jgss.GSSCredentialImpl.<init>(java.security.jgss@11.0.2/GSSCredentialImpl.java:74)
at 
sun.security.jgss.GSSManagerImpl.createCredential(java.security.jgss@11.0.2/GSSManagerImpl.java:148)
at 
com.sun.security.sasl.gsskerb.GssKrb5Server.<init>(jdk.security.jgss@11.0.2/GssKrb5Server.java:108)
at 
com.sun.security.sasl.gsskerb.FactoryImpl.createSaslServer(jdk.security.jgss@11.0.2/FactoryImpl.java:85)
at 
javax.security.sasl.Sasl.createSaslServer(java.security.sasl@11.0.2/Sasl.java:537)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.lambda$createSaslKerberosServer$1(SaslServerAuthenticator.java:212)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator$$Lambda$970/0x0008006a7840.run(Unknown
 Source)
at java.security.AccessController.doPrivileged(java.base@11.0.2/Native 
Method)
at javax.security.auth.Subject.doAs(java.base@11.0.2/Subject.java:423)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.createSaslKerberosServer(SaslServerAuthenticator.java:211)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.createSaslServer(SaslServerAuthenticator.java:164)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.handleKafkaRequest(SaslServerAuthenticator.java:450)
at 
org.apache.kafka.common.security.authenticator.SaslServerAuthenticator.authenticate(SaslServerAuthenticator.java:248)
at 
org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:132)
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:532)
at org.apache.kafka.common.network.Selector.poll(Selector.java:467)
at kafka.network.Processor.poll(SocketServer.scala:689)
at kafka.network.Processor.run(SocketServer.scala:594)
at java.lang.Thread.run(java.base@11.0.2/Thread.java:834)
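
The frames above show the thread spinning in HashMap$TreeNode.find while 
populating a Subject's credential set. As a general illustration (not Kafka 
code), unsynchronized concurrent mutation of a plain HashMap can corrupt its 
internal structure so that later operations never terminate, which would match 
a RUNNABLE thread pinning a CPU core in find():

{code:java}
import java.util.HashMap;
import java.util.Map;

// Demonstration only: racing writers on an unsynchronized HashMap can corrupt
// its internal bins, after which puts or gets may loop forever or lose entries.
public class HashMapRace {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>(); // not thread-safe
        Runnable writer = () -> {
            for (int i = 0; i < 1_000_000; i++) map.put(i, i);
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // After the race, lookups may spin indefinitely inside the map internals.
        System.out.println(map.get(42));
    }
}
{code}

Note that in the trace the reader does hold the lock on the 
Collections$SynchronizedSet wrapper, so one plausible explanation is a writer 
elsewhere mutating the underlying set without taking the same lock.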



> Clients unable to connect and replicas are not able to connect to each other
> 
>
> Key: KAFKA-8008
> URL: https://issues.apache.org/jira/browse/KAFKA-8008
>

[jira] [Assigned] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-8011:
--

Assignee: Bill Bejeck

> Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> 
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bill Bejeck
>Assignee: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list, as does 
> TestUtils.waitForCondition() when verifying the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779605#comment-16779605
 ] 

ASF GitHub Bot commented on KAFKA-8011:
---

bbejeck commented on pull request #6338: KAFKA-8011: Fix for race condition 
causing ConcurrentModificationException
URL: https://github.com/apache/kafka/pull/6338
 
 
   In the `RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated()` and 
`RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted()` tests, a race 
condition exists where the `ConsumerRebalanceListener` in the test modifies the 
list of subscribed topics while the condition for test success is comparing the 
same list instance against expected values.
   
   This PR should fix the race condition by using a `CopyOnWriteArrayList`, 
which guarantees safe traversal of the list even when a concurrent modification 
is happening.  
   
   Using the `CopyOnWriteArrayList` should not impact performance negatively: 
the traversals caused by `ArrayList.equals()` (`TestUtils.waitForCondition()` 
checks for a successful result every `100ms`) far outnumber the possible 
modifications, as at most one topic name is added/removed during the test.
   
   For testing, I updated the `RegexSourceIntegrationTest` integration test and 
ran the suite of streams tests.
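   
   A minimal sketch of the pattern described above, with simplified names:
   
{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// The rebalance-listener thread mutates this list while the test thread
// compares it; CopyOnWriteArrayList iterators traverse a snapshot of the
// backing array, so traversal never throws ConcurrentModificationException.
List<String> assignedTopics = new CopyOnWriteArrayList<>();

// rebalance listener thread:
//     assignedTopics.clear();
//     assignedTopics.addAll(newAssignment);

// test thread, inside TestUtils.waitForCondition():
//     return assignedTopics.equals(expectedTopics);
{code}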
   
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> 
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list, as does 
> TestUtils.waitForCondition() when verifying the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7978) Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779552#comment-16779552
 ] 

Matthias J. Sax commented on KAFKA-7978:


Happened again on `trunk`: 
[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3421/tests]

> Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups
> ---
>
> Key: KAFKA-7978
> URL: https://issues.apache.org/jira/browse/KAFKA-7978
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for 
> all observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}java.lang.AssertionError: expected:<2> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.junit.Assert.assertEquals(Assert.java:631) at 
> kafka.api.AdminClientIntegrationTest.testConsumerGroups(AdminClientIntegrationTest.scala:1157)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8012) NullPointerException while truncating at high watermark can crash replica fetcher thread

2019-02-27 Thread Colin Hicks (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Hicks updated KAFKA-8012:
---
Description: 
An NPE can occur when the replica fetcher manager simultaneously calls 
`removeFetcherForPartitions`, removing the corresponding partitionStates, while 
a replica fetcher thread attempts to truncate the same partition(s) in 
`truncateToHighWatermark`.

Stack trace for failure case:

{{java.lang.NullPointerException}}
{{at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
{{at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
{{at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
{{at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
{{at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
{{at 
kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
{{at 
kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
{{at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
{{at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}

 

  was:
An NPE can occur when the replica fetcher manager simultaneously calls 
`removeFetcherForPartitions`, removing the corresponding partitionStates, while 
a replica fetcher thread attempts to truncate the same partition(s) in 
`truncateToHighWatermark`.

Stack trace for failure case:

{{java.lang.NullPointerException}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
{{ at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
{{ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
{{ at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
{{ at 
kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
{{ at 
kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
{{ at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
{{ at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}

 


> NullPointerException while truncating at high watermark can crash replica 
> fetcher thread
> 
>
> Key: KAFKA-8012
> URL: https://issues.apache.org/jira/browse/KAFKA-8012
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Colin Hicks
>Priority: Blocker
> Fix For: 2.2.0
>
>
> An NPE can occur when the replica fetcher manager simultaneously calls 
> `removeFetcherForPartitions`, removing the corresponding partitionStates, 
> while a replica fetcher thread attempts to truncate the same partition(s) in 
> `truncateToHighWatermark`.
> Stack trace for failure case:
> {{java.lang.NullPointerException}}
> {{at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
> {{at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
> {{at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
> {{at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
> {{at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
> {{at 
> kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
> {{at 
> kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
> {{at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
> {{at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7978) Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7978:
---
Affects Version/s: 2.3.0

> Flaky Test SaslSslAdminClientIntegrationTest#testConsumerGroups
> ---
>
> Key: KAFKA-7978
> URL: https://issues.apache.org/jira/browse/KAFKA-7978
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for 
> all observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/25/]
> {quote}java.lang.AssertionError: expected:<2> but was:<0> at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.failNotEquals(Assert.java:834) at 
> org.junit.Assert.assertEquals(Assert.java:645) at 
> org.junit.Assert.assertEquals(Assert.java:631) at 
> kafka.api.AdminClientIntegrationTest.testConsumerGroups(AdminClientIntegrationTest.scala:1157)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8012) NullPointerException while truncating at high watermark can crash replica fetcher thread

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779550#comment-16779550
 ] 

Matthias J. Sax commented on KAFKA-8012:


Marking this with "Fix Version" 2.2 to make clear we need to fix this before 
releasing 2.2.

> NullPointerException while truncating at high watermark can crash replica 
> fetcher thread
> 
>
> Key: KAFKA-8012
> URL: https://issues.apache.org/jira/browse/KAFKA-8012
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Colin Hicks
>Priority: Blocker
> Fix For: 2.2.0
>
>
> An NPE can occur when the replica fetcher manager simultaneously calls 
> `removeFetcherForPartitions`, removing the corresponding partitionStates, 
> while a replica fetcher thread attempts to truncate the same partition(s) in 
> `truncateToHighWatermark`.
> Stack trace for failure case:
> {{java.lang.NullPointerException}}
> {{ at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
> {{ at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
> {{ at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
> {{ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
> {{ at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
> {{ at 
> kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
> {{ at 
> kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
> {{ at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
> {{ at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8012) NullPointerException while truncating at high watermark can crash replica fetcher thread

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8012:
---
Fix Version/s: 2.2.0

> NullPointerException while truncating at high watermark can crash replica 
> fetcher thread
> 
>
> Key: KAFKA-8012
> URL: https://issues.apache.org/jira/browse/KAFKA-8012
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.2.0, 2.1.1
>Reporter: Colin Hicks
>Priority: Blocker
> Fix For: 2.2.0
>
>
> An NPE can occur when the replica fetcher manager simultaneously calls 
> `removeFetcherForPartitions`, removing the corresponding partitionStates, 
> while a replica fetcher thread attempts to truncate the same partition(s) in 
> `truncateToHighWatermark`.
> Stack trace for failure case:
> {{java.lang.NullPointerException}}
> {{ at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
> {{ at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
> {{ at 
> kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
> {{ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
> {{ at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
> {{ at 
> kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
> {{ at 
> kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
> {{ at 
> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
> {{ at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8012) NullPointerException while truncating at high watermark can crash replica fetcher thread

2019-02-27 Thread Colin Hicks (JIRA)
Colin Hicks created KAFKA-8012:
--

 Summary: NullPointerException while truncating at high watermark 
can crash replica fetcher thread
 Key: KAFKA-8012
 URL: https://issues.apache.org/jira/browse/KAFKA-8012
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.1.1, 2.2.0
Reporter: Colin Hicks


An NPE can occur when the replica fetcher manager simultaneously calls 
`removeFetcherForPartitions`, removing the corresponding partitionStates, while 
a replica fetcher thread attempts to truncate the same partition(s) in 
`truncateToHighWatermark`.

Stack trace for failure case:

{{java.lang.NullPointerException}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$2(AbstractFetcherThread.scala:213)}}
{{ at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)}}
{{ at 
kafka.server.AbstractFetcherThread.$anonfun$truncateToHighWatermark$1(AbstractFetcherThread.scala:211)}}
{{ at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)}}
{{ at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)}}
{{ at 
kafka.server.AbstractFetcherThread.truncateToHighWatermark(AbstractFetcherThread.scala:207)}}
{{ at 
kafka.server.AbstractFetcherThread.maybeTruncate(AbstractFetcherThread.scala:173)}}
{{ at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)}}
{{ at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)}}
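
As a general illustration of the race (not the actual Scala fix), iterating 
partition states while another thread removes entries concurrently requires a 
null check on every lookup; the names below are assumptions:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: `partitionStates` can lose entries concurrently
// (removeFetcherForPartitions), so each lookup must tolerate null.
class TruncateSketch {
    final Map<String, Long> partitionStates = new ConcurrentHashMap<>();

    void truncateToHighWatermark(Iterable<String> partitions) {
        for (String tp : partitions) {
            Long highWatermark = partitionStates.get(tp);
            if (highWatermark == null)
                continue; // partition removed mid-iteration: skip, don't NPE
            // ... truncate the log for tp to highWatermark ...
        }
    }
}
{code}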

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6755) MaskField SMT should optionally take a literal value to use instead of using null

2019-02-27 Thread Randall Hauch (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779506#comment-16779506
 ] 

Randall Hauch commented on KAFKA-6755:
--

[~nimfadora], you should be able to create a KIP page. Thanks!

> MaskField SMT should optionally take a literal value to use instead of using 
> null
> -
>
> Key: KAFKA-6755
> URL: https://issues.apache.org/jira/browse/KAFKA-6755
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Valeria Vasylieva
>Priority: Major
>  Labels: needs-kip, newbie
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The existing {{org.apache.kafka.connect.transforms.MaskField}} SMT always 
> uses the null value for the type of field. It'd be nice to *optionally* be 
> able to specify a literal value for the type, where the SMT would convert the 
> literal string value in the configuration to the desired type (using the new 
> {{Values}} methods).
> Use cases: mask out the IP address, or SSN, or other personally identifiable 
> information (PII).
> Since this changes the API, it will require a KIP.
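A rough, hypothetical sketch of the proposed behavior (illustrative names only; a real SMT would convert the configured string via the {{Values}} methods and respect the record schema):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: masks a field with a configured literal
// instead of null. The config key and conversion step are illustrative.
class MaskWithLiteralSketch {
    private final String maskedField;
    private final Object replacement; // parsed from a string config value

    MaskWithLiteralSketch(String maskedField, Object replacement) {
        this.maskedField = maskedField;
        this.replacement = replacement;
    }

    Map<String, Object> apply(Map<String, Object> record) {
        Map<String, Object> masked = new HashMap<>(record);
        if (masked.containsKey(maskedField)) {
            // Old behavior: masked.put(maskedField, null);
            // Proposed behavior: use the configured literal instead.
            masked.put(maskedField, replacement);
        }
        return masked;
    }
}
{code}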



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7969) Flaky Test DescribeConsumerGroupTest#testDescribeOffsetsOfExistingGroupWithNoMembers

2019-02-27 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779487#comment-16779487
 ] 

Stanislav Kozlovski commented on KAFKA-7969:


Still no luck... I found that we get this error when the broker 
returns an empty map for the `OffsetFetchRequest`.
This is the code that gets executed when we fetch all the offsets: 
[https://github.com/apache/kafka/blob/a8f2307164ce0f1f47c458eee8f54173f7218a16/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L461].
I then started digging into when we might remove these `offsets`, but that seems 
to happen only on expiry (which definitely doesn't happen in this test) and on 
`deleteGroup`/`deletePartition` calls.

> Flaky Test 
> DescribeConsumerGroupTest#testDescribeOffsetsOfExistingGroupWithNoMembers
> 
>
> Key: KAFKA-7969
> URL: https://issues.apache.org/jira/browse/KAFKA-7969
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/24/]
> {quote}java.lang.AssertionError: Expected no active member in describe group 
> results, state: Some(Empty), assignments: Some(List()) at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.admin.DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers(DescribeConsumerGroupTest.scala:278){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779472#comment-16779472
 ] 

Bill Bejeck edited comment on KAFKA-8011 at 2/27/19 4:17 PM:
-

Here's the link to the build 

[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2730/]

 
{noformat}
java.util.ConcurrentModificationException
at 
java.base/java.util.ArrayList.checkForComodification(ArrayList.java:604)
at java.base/java.util.ArrayList.equals(ArrayList.java:563)
at 
org.apache.kafka.streams.integration.RegexSourceIntegrationTest.lambda$testRegexMatchesTopicsAWhenDeleted$3(RegexSourceIntegrationTest.java:213)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:355)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
at 
org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:213)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Commented] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Bill Bejeck (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779472#comment-16779472
 ] 

Bill Bejeck commented on KAFKA-8011:


Here's the link to the build 

[https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/2730/]

 

Stacktrace

{noformat}
java.util.ConcurrentModificationException
at 
java.base/java.util.ArrayList.checkForComodification(ArrayList.java:604)
at java.base/java.util.ArrayList.equals(ArrayList.java:563)
at 
org.apache.kafka.streams.integration.RegexSourceIntegrationTest.lambda$testRegexMatchesTopicsAWhenDeleted$3(RegexSourceIntegrationTest.java:213)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:355)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:325)
at 
org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:213)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 

[jira] [Updated] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8011:
---
Summary: Flaky Test 
RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated  (was: Flaky Test 
RegexSourceIntegrationTest.)

> Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> 
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 
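A minimal sketch of the fix idea, under the assumption that the rebalance listener writes the assignment while the wait condition reads it (hypothetical helper class, not the actual test code): synchronize access to the shared list and compare a snapshot copy instead of the live list.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the test-side fix idea: writer and reader share one list,
// so the reader compares a snapshot taken under the same lock.
class AssignedTopicsSketch {
    private final List<String> assignedTopics =
            Collections.synchronizedList(new ArrayList<>());

    // Invoked from the rebalance listener (writer side).
    void onAssignment(List<String> topics) {
        synchronized (assignedTopics) {
            assignedTopics.clear();
            assignedTopics.addAll(topics);
        }
    }

    // Invoked from the wait-condition poll (reader side). Copying under
    // the lock prevents ArrayList.equals() from iterating a list that is
    // being mutated, which is what throws ConcurrentModificationException.
    boolean matches(List<String> expected) {
        List<String> snapshot;
        synchronized (assignedTopics) {
            snapshot = new ArrayList<>(assignedTopics);
        }
        return snapshot.equals(expected);
    }
}
{code}

Copying under the lock is cheap for the handful of topics involved and removes the shared iteration entirely.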



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779448#comment-16779448
 ] 

Matthias J. Sax edited comment on KAFKA-8011 at 2/27/19 3:51 PM:
-

Did this fail on `trunk`? Please provide a link to the build and also copy the 
stack trace into the ticket (the build will be deleted eventually).


was (Author: mjsax):
Did this fail on `trunk`?

> Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> 
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8011:
---
Issue Type: Bug  (was: Improvement)

> Flaky Test RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> 
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest.

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8011:
---
Labels: flaky-test newbie  (was: newbie)

> Flaky Test RegexSourceIntegrationTest.
> --
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: flaky-test, newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest.

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-8011:
---
Priority: Critical  (was: Major)

> Flaky Test RegexSourceIntegrationTest.
> --
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest.

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779448#comment-16779448
 ] 

Matthias J. Sax commented on KAFKA-8011:


Did this fail on `trunk`?

> Flaky Test RegexSourceIntegrationTest.
> --
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Critical
>  Labels: newbie
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-02-27 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-7965:
---
Affects Version/s: 2.3.0

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7965) Flaky Test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup

2019-02-27 Thread Matthias J. Sax (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779445#comment-16779445
 ] 

Matthias J. Sax commented on KAFKA-7965:


Failed again on `trunk`: 
[https://builds.apache.org/blue/organizations/jenkins/kafka-trunk-jdk8/detail/kafka-trunk-jdk8/3420/tests]
{quote}java.lang.AssertionError: Received 0, expected at least 102
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:562)
at 
kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$7(ConsumerBounceTest.scala:383)
at 
kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$7$adapted(ConsumerBounceTest.scala:382)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at 
kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:382){quote}

> Flaky Test 
> ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
> 
>
> Key: KAFKA-7965
> URL: https://issues.apache.org/jira/browse/KAFKA-7965
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/21/]
> {quote}java.lang.AssertionError: Received 0, expected at least 68 at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.api.ConsumerBounceTest.receiveAndCommit(ConsumerBounceTest.scala:557) 
> at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1(ConsumerBounceTest.scala:320)
>  at 
> kafka.api.ConsumerBounceTest.$anonfun$testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup$1$adapted(ConsumerBounceTest.scala:319)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) 
> at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) 
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at 
> kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:319){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (KAFKA-8011) Flaky Test RegexSourceIntegrationTest.

2019-02-27 Thread Bill Bejeck (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck reassigned KAFKA-8011:
--

Assignee: (was: Bill Bejeck)

> Flaky Test RegexSourceIntegrationTest.
> --
>
> Key: KAFKA-8011
> URL: https://issues.apache.org/jira/browse/KAFKA-8011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Bill Bejeck
>Priority: Major
>
> The RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenCreated
> and RegexSourceIntegrationTest#testRegexMatchesTopicsAWhenDeleted  tests use 
> an ArrayList to assert the topics assigned to the Streams application. 
> The ConsumerRebalanceListener used in the test operates on this list as does 
> the TestUtils.waitForCondition() to verify the expected topic assignments.
> Using the same list in both places can cause a ConcurrentModificationException 
> if the rebalance listener modifies the assignment at the same time 
> TestUtils.waitForCondition() is using the list to verify the expected topics. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7918) Streams store cleanup: inline byte-store generic parameters

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779417#comment-16779417
 ] 

ASF GitHub Bot commented on KAFKA-7918:
---

bbejeck commented on pull request #6327: KAFKA-7918: Inline generic parameters 
Pt. II: RocksDB Bytes Store and Memory LRU Caches
URL: https://github.com/apache/kafka/pull/6327
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Streams store cleanup: inline byte-store generic parameters
> ---
>
> Key: KAFKA-7918
> URL: https://issues.apache.org/jira/browse/KAFKA-7918
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: John Roesler
>Assignee: Sophie Blee-Goldman
>Priority: Major
>
> Currently, the fundamental layer of stores in Streams is the "bytes store".
> The easiest way to identify this is in 
> `org.apache.kafka.streams.state.Stores`, all the `StoreBuilder`s require a 
> `XXBytesStoreSupplier`. 
> We provide several implementations of these bytes stores, typically an 
> in-memory one and a persistent one (aka RocksDB).
> Inside these bytes stores, the key is always `Bytes` and the value is always 
> `byte[]` (serialization happens at a higher level). However, the store 
> implementations are generically typed, just `K` and `V`.
> This is good for flexibility, but it makes the code a little harder to 
> understand. I think that we used to do serialization at a lower level, so the 
> generics are a hold-over from that.
> It would simplify the code if we just inlined the actual k/v types and maybe 
> even renamed the classes from (e.g.) `InMemoryKeyValueStore` to 
> `InMemoryKeyValueBytesStore`, and so forth.
>  
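To make the proposal concrete, a before/after sketch with illustrative type names (these are not the actual Streams interfaces):

{code:java}
// Before: the bytes-store implementation is generically typed even though
// it is only ever instantiated with K = Bytes and V = byte[].
class InMemoryKeyValueStore<K, V> {
    void put(K key, V value) { /* ... */ }
    V get(K key) { /* ... */ return null; }
}

// After: the concrete k/v types are inlined and the name says what it is.
// (Bytes stands in for org.apache.kafka.common.utils.Bytes.)
class InMemoryKeyValueBytesStore {
    void put(Bytes key, byte[] value) { /* ... */ }
    byte[] get(Bytes key) { /* ... */ return null; }
}

class Bytes { /* placeholder wrapper around byte[] for this sketch */ }
{code}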



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8010) kafka-configs.sh does not allow setting config with an equal in the value

2019-02-27 Thread Mickael Maison (JIRA)
Mickael Maison created KAFKA-8010:
-

 Summary: kafka-configs.sh does not allow setting config with an 
equal in the value
 Key: KAFKA-8010
 URL: https://issues.apache.org/jira/browse/KAFKA-8010
 Project: Kafka
  Issue Type: Bug
  Components: tools
Reporter: Mickael Maison


The sasl.jaas.config value typically includes equals signs. Unfortunately, the 
kafka-configs tool does not parse such values correctly and hits an error:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config "sasl.jaas.config=KafkaServer \{\n  
org.apache.kafka.common.security.plain.PlainLoginModule required\n  
username=\"myuser\"\n  password=\"mypassword\";\n};\nClient \{\n  
org.apache.zookeeper.server.auth.DigestLoginModule required\n  
username=\"myuser2\"\n  password=\"mypassword2\;\n};"
requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val"
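The failure comes from splitting each --add-config entry on every equals sign; below is a hypothetical sketch of a more tolerant parse that splits only on the first `=` (illustrative only, not the tool's actual code):

{code:java}
// Sketch: parse "key=value" where the value itself may contain '='.
class ConfigEntryParseSketch {
    static String[] parse(String entry) {
        // split with limit 2 keeps everything after the first '=' intact
        String[] parts = entry.split("=", 2);
        if (parts.length != 2 || parts[0].isEmpty()) {
            throw new IllegalArgumentException(
                "all configs to be added must be in the format \"key=val\": " + entry);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] kv = parse("sasl.jaas.config=KafkaServer { ... username=\"myuser\" ... }");
        System.out.println(kv[0] + " -> " + kv[1]);
    }
}
{code}

Note that values containing commas would still need separate handling, since entries themselves are comma-separated.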



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-27 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779305#comment-16779305
 ] 

ASF GitHub Bot commented on KAFKA-7925:
---

rajinisivaram commented on pull request #6337: KAFKA-7925 - Create credentials 
only once for sun.security.jgss.native
URL: https://github.com/apache/kafka/pull/6337
 
 
   When `sun.security.jgss.native=true`, we currently create a new server 
credential for every new client connection and add to the private credential 
set of the server's Subject. This is expensive and can result in an unbounded 
number of private credentials in the Subject used by the broker. The PR creates 
a credential immediately after login so that a single credential can be reused.
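A minimal sketch of the reuse idea using the JDK GSS-API (assuming the broker's logged-in Subject context is already established, e.g. inside Subject.doAs, and omitting error handling; this is not the PR's code):

{code:java}
import org.ietf.jgss.GSSCredential;
import org.ietf.jgss.GSSException;
import org.ietf.jgss.GSSManager;
import org.ietf.jgss.Oid;

// Sketch: acquire the server (acceptor) credential once, right after
// login, and hand the same instance to every subsequent client
// connection instead of creating a fresh credential per connection.
class ServerCredentialCacheSketch {
    private static final String KRB5_MECH_OID = "1.2.840.113554.1.2.2";
    private volatile GSSCredential serverCredential;

    // Call once after successful login.
    void initialize() throws GSSException {
        GSSManager manager = GSSManager.getInstance();
        serverCredential = manager.createCredential(
                null,                                // default acceptor name
                GSSCredential.INDEFINITE_LIFETIME,
                new Oid(KRB5_MECH_OID),              // Kerberos v5
                GSSCredential.ACCEPT_ONLY);
    }

    // Call per connection: reuses the cached credential.
    GSSCredential credentialForNewConnection() {
        return serverCredential;
    }
}
{code}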
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% cpu usage on all brokers in our kafka cluster even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our kafka setup, which is not 
> yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the cpu time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> 

[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-27 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779307#comment-16779307
 ] 

Rajini Sivaram commented on KAFKA-7925:
---

[~xabhi] Will you be able to checkout the PR 
(https://github.com/apache/kafka/pull/6337) and test it out? Thanks.

> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% cpu usage on all brokers in our kafka cluster even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our kafka setup, which is not 
> yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the cpu time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP 
> mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
> java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
> java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
> java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
> java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
> java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
> java 144319 kafkagod 2048u sock 0,7 0t0 30012546 protocol: TCP
> java 144319 kafkagod 2049u IPv4 30005418 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39686 (CLOSE_WAIT)
> java 144319 kafkagod 2050u IPv4 30009977 0t0 

[jira] [Commented] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-27 Thread Rajini Sivaram (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779131#comment-16779131
 ] 

Rajini Sivaram commented on KAFKA-7925:
---

[~xabhi] The three JIRAs (KAFKA-7925, KAFKA-7812 and KAFKA-8008) look like the 
same issue. Can we close two and focus on one?

I don't have a test environment to test with sun.security.jgss.native=true. But 
I will be happy to submit a PR if you can build locally and test it out. We 
can't merge the PR until it has been tested.


> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% cpu usage on all brokers in our kafka cluster even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our kafka setup, which is not 
> yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the cpu time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP 
> mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
> java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
> java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
> java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
> java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
> java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
> java 144319 

[jira] [Commented] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2019-02-27 Thread Asaf Mesika (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779094#comment-16779094
 ] 

Asaf Mesika commented on KAFKA-3410:


Ran into the same issue when the disk suddenly became read-only (Amazon SSD). Any 
chance this is on the roadmap?

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Reporter: James Cheng
>Priority: Major
>  Labels: reliability
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (KAFKA-7967) Kafka Streams: some values in statestore rollback to old value

2019-02-27 Thread Ziming Dong (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16777571#comment-16777571
 ] 

Ziming Dong edited comment on KAFKA-7967 at 2/27/19 10:07 AM:
--

[~guozhang], since the value rolls back to an old value from several days 
ago (some values also roll back to the value just before the previous record), and 
our data is not small per partition, it should not be a cache problem; I also 
saw that the statestore's cache is disabled by default. Also, 
[https://github.com/apache/kafka/pull/6191] says that 
[https://github.com/apache/kafka/pull/4331] fixed the KV store, but we hit this 
issue on both the KV store and the windowed KV store.

Let's upgrade to [https://github.com/apache/kafka/releases/tag/2.2.0-rc0] to 
see what happens, since both the KAFKA-7652 and KAFKA-7672 commits are 
included. (We have some hack code for KAFKA-7672.)


was (Author: suiyuan2009):
[~guozhang], since the value rolls back to an old value from several days 
ago (some values also roll back to the value just before the previous record), and 
our data is not small per partition, it should not be a cache problem? Also, 
[https://github.com/apache/kafka/pull/6191] says that 
[https://github.com/apache/kafka/pull/4331] fixed the KV store, but we hit this 
issue on both the KV store and the windowed KV store.

Let's upgrade to [https://github.com/apache/kafka/releases/tag/2.2.0-rc0] to 
see what happens, since both the KAFKA-7652 and KAFKA-7672 commits are 
included. (We have some hack code for KAFKA-7672.)

> Kafka Streams: some values in statestore rollback to old value
> --
>
> Key: KAFKA-7967
> URL: https://issues.apache.org/jira/browse/KAFKA-7967
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 2.1.0
>Reporter: Ziming Dong
>Priority: Critical
>
> We are using kafka streams 2.1.0, we use both persistentKeyValueStore 
> statestore and persistentWindowStore statestore. We found sometimes both 
> types of statestore could `fetch` old values instead of newly updated values. 
> We didn't find any logs except INFO level logs, no instance restart in the 
> period, also there is no rebalance log which indicates it's not a rebalance 
> bug. The bug happened no more than one time each week, but many records were 
> affected each time, and we didn't find a way to reproduce it manually.
> For example, the issue may happen like this, note the changelog contains all 
> the `update`:
>  # got value 1 from key 1
>  # update value 2 to key 1
>  # got value 2 from key 1
>  # update value 3 to key 1
>  # got value 1 from key 1(something wrong!!)
>  # update value 2 to key 1
> there is only one type of log, as follows:
>  
> {code:java}
> 2019-02-19x14:20:00x xx INFO 
> [org.apache.kafka.clients.FetchSessionHandler] 
> [xxx-streams-xx-xxx--xxx-xx-StreamThread-1] [Consumer 
> clientId=x--xxx-xxx--x-StreamThread-1-consumer, 
> groupId=x] Node 2 was unable to process the fetch request with 
> (sessionId=1998942517, epoch=4357): INVALID_FETCH_SESSION_EPOCH.
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-7969) Flaky Test DescribeConsumerGroupTest#testDescribeOffsetsOfExistingGroupWithNoMembers

2019-02-27 Thread Stanislav Kozlovski (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-7969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779085#comment-16779085
 ] 

Stanislav Kozlovski commented on KAFKA-7969:


I still haven't been able to reproduce this. I ran it 200-300 times, 
then ran it in combination with some other tests from the suite for a total of 
1h 8m. I will dive a bit more into the code to try and figure out what is 
causing this.

> Flaky Test 
> DescribeConsumerGroupTest#testDescribeOffsetsOfExistingGroupWithNoMembers
> 
>
> Key: KAFKA-7969
> URL: https://issues.apache.org/jira/browse/KAFKA-7969
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, clients, unit tests
>Affects Versions: 2.2.0
>Reporter: Matthias J. Sax
>Assignee: Stanislav Kozlovski
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.3.0, 2.2.1
>
>
> To get stable nightly builds for the `2.2` release, I am creating tickets for all 
> observed test failures.
> [https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/24/]
> {quote}java.lang.AssertionError: Expected no active member in describe group 
> results, state: Some(Empty), assignments: Some(List()) at 
> org.junit.Assert.fail(Assert.java:88) at 
> org.junit.Assert.assertTrue(Assert.java:41) at 
> kafka.admin.DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers(DescribeConsumerGroupTest.scala:278){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-6755) MaskField SMT should optionally take a literal value to use instead of using null

2019-02-27 Thread Valeria Vasylieva (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16779049#comment-16779049
 ] 

Valeria Vasylieva commented on KAFKA-6755:
--

Thank you for your help, Randall! 
I have created a wiki account: 
[https://cwiki.apache.org/confluence/display/~nimfadora]
Once the KIP is created, I would be happy if you could review it.

> MaskField SMT should optionally take a literal value to use instead of using 
> null
> -
>
> Key: KAFKA-6755
> URL: https://issues.apache.org/jira/browse/KAFKA-6755
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Valeria Vasylieva
>Priority: Major
>  Labels: needs-kip, newbie
>   Original Estimate: 8h
>  Remaining Estimate: 8h
>
> The existing {{org.apache.kafka.connect.transforms.MaskField}} SMT always 
> uses the null value for the type of field. It'd be nice to *optionally* be 
> able to specify a literal value for the type, where the SMT would convert the 
> literal string value in the configuration to the desired type (using the new 
> {{Values}} methods).
> Use cases: mask out the IP address, or SSN, or other personally identifiable 
> information (PII).
> Since this changes the API, it will require a KIP.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8009) Upgrade Jenkins job Gradle version from 4.10.2 to 5.x

2019-02-27 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/KAFKA-8009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dejan Stojadinović updated KAFKA-8009:
--
Description: 
*Rationale:*
 * Kafka project already uses gradle 5.x (5.1.1 at the moment)
 * recently released *spotbugs-gradle-plugin* version 1.6.10 drops support for 
Gradle 4: 
[https://github.com/spotbugs/spotbugs-gradle-plugin/blob/1.6.10/CHANGELOG.md#1610---2019-02-18]
  

*Note:* related github pull request that contains spotbugs plugin version bump 
(among other things): 
[https://github.com/apache/kafka/pull/6332#issuecomment-467631246]

{code}
22:42:44 Setting GRADLE_4_10_2_HOME=/home/jenkins/tools/gradle/4.10.2
22:42:44 [kafka-pr-jdk11-scala2.12] $ /bin/bash -xe 
/tmp/jenkins5613406374589745135.sh
22:42:44 + rm -rf 
/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/.gradle
22:42:44 + /home/jenkins/tools/gradle/4.10.2/bin/gradle
22:42:45 To honour the JVM settings for this build a new JVM will be forked. 
Please consider using the daemon: 
https://docs.gradle.org/4.10.2/userguide/gradle_daemon.html.
22:42:47 Daemon will be stopped at the end of the build stopping after 
processing
22:42:58 
22:42:58 FAILURE: Build failed with an exception.
22:42:58 
22:42:58 * Where:
22:42:58 Build file 
'/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/build.gradle' 
line: 155
22:42:58 
22:42:58 * What went wrong:
22:42:58 A problem occurred evaluating root project 'kafka-pr-jdk11-scala2.12'.
22:42:58 > Failed to apply plugin [id 'com.github.spotbugs']
22:42:58    > Gradle version Gradle 4.10.2 is unsupported. Please use Gradle 
5.0 or later.
{code}

 

 

 

  was:
*Rationale:* 
 * Kafka project already uses gradle 5.x (5.1.1 at the moment)
 * recently released *spotbugs-gradle-plugin* version 1.6.10 drops support for 
Gradle 4: 
[https://github.com/spotbugs/spotbugs-gradle-plugin/blob/1.6.10/CHANGELOG.md#1610---2019-02-18]

 

*Note:* related github pull request that contains spotbugs plugin version bump 
(among other things): 
https://github.com/apache/kafka/pull/6332#issuecomment-467631246


> Upgrade Jenkins job Gradle version from 4.10.2 to 5.x
> --
>
> Key: KAFKA-8009
> URL: https://issues.apache.org/jira/browse/KAFKA-8009
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Dejan Stojadinović
>Assignee: Ismael Juma
>Priority: Minor
>
> *Rationale:*
>  * Kafka project already uses gradle 5.x (5.1.1 at the moment)
>  * recently released *spotbugs-gradle-plugin* version 1.6.10 drops support 
> for Gradle 4: 
> [https://github.com/spotbugs/spotbugs-gradle-plugin/blob/1.6.10/CHANGELOG.md#1610---2019-02-18]
>   
> *Note:* related github pull request that contains spotbugs plugin version 
> bump (among other things): 
> [https://github.com/apache/kafka/pull/6332#issuecomment-467631246]
> {code}
> 22:42:44 Setting GRADLE_4_10_2_HOME=/home/jenkins/tools/gradle/4.10.2
> 22:42:44 [kafka-pr-jdk11-scala2.12] $ /bin/bash -xe 
> /tmp/jenkins5613406374589745135.sh
> 22:42:44 + rm -rf 
> /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/.gradle
> 22:42:44 + /home/jenkins/tools/gradle/4.10.2/bin/gradle
> 22:42:45 To honour the JVM settings for this build a new JVM will be forked. 
> Please consider using the daemon: 
> https://docs.gradle.org/4.10.2/userguide/gradle_daemon.html.
> 22:42:47 Daemon will be stopped at the end of the build stopping after 
> processing
> 22:42:58 
> 22:42:58 FAILURE: Build failed with an exception.
> 22:42:58 
> 22:42:58 * Where:
> 22:42:58 Build file 
> '/home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/build.gradle' 
> line: 155
> 22:42:58 
> 22:42:58 * What went wrong:
> 22:42:58 A problem occurred evaluating root project 
> 'kafka-pr-jdk11-scala2.12'.
> 22:42:58 > Failed to apply plugin [id 'com.github.spotbugs']
> 22:42:58    > Gradle version Gradle 4.10.2 is unsupported. Please use Gradle 
> 5.0 or later.
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Abhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhi updated KAFKA-8008:

Description: 
Hi,

I upgraded to Kafka v2.1.1 recently and am seeing the exceptions below on all 
the servers. The kafka-network-thread-1-ListenerName threads are all consuming 
full CPU cycles. Lots of TCP connections are in CLOSE_WAIT state.
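
For reference, a sketch of how the stuck sockets can be enumerated (assumes a 
Linux host with lsof/pgrep and that the broker runs the kafka.Kafka main class):

{code}
# Find the broker pid, then list only its TCP sockets left in CLOSE_WAIT.
BROKER_PID=$(pgrep -f kafka.Kafka)
lsof -p "$BROKER_PID" -a -i TCP | grep CLOSE_WAIT
{code}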

My broker setup is using Kerberos authentication with 
-Dsun.security.jgss.native=true.

I am not sure how to handle this. Would increasing the kafka-network thread 
count help, if that is possible?
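
For context, the size of that pool is controlled by the num.network.threads 
broker setting; a server.properties sketch follows (the value is illustrative 
only, and more threads may not help if the existing ones are spinning rather 
than saturated):

{code}
# server.properties -- number of network processor threads backing the
# kafka-network-thread-* pool (default 3; 8 is only an illustrative value).
num.network.threads=8
{code}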

Does this seem like a bug? I am happy to help in any way I can, as this issue 
is blocking our production usage, and I would like to get it resolved as early 
as possible.


*server.log snippet from one of the servers:*
[2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 
with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
(id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
SEND_APIVERSIONS_REQUEST 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
client=kafka/mwkafka-prod-01.nyc.foo@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
 (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
RECEIVE_APIVERSIONS_RESPONSE 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Completed connection to node 2. Ready. 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 
with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
to node 5: java.net.SocketTimeoutException: Failed to connect within 3 ms. 
(org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error in response for fetch request (type=FetchRequest, 
replicaId=1, maxWait=1, minBytes=1, maxBytes=10485760, 
fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-newtp-5-64-0=(fetchOffset=8936, 
logStartOffset=8936, maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, maxBytes=1048576, 
currentLeaderEpoch=Optional[19]), reddyvel-tp3-58-0=(fetchOffset=264495, 
logStartOffset=264495, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, 
logStartOffset=2, maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-988-0=(fetchOffset=3173185, 
logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-409-0=(fetchOffset=3173194, 
logStartOffset=3173194, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.priveq.reins-0=(fetchOffset=12, 
logStartOffset=6, maxBytes=1048576, currentLeaderEpoch=Optional[5]), 
reddyvel-353-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-tp10-63-0=(fetchOffset=220652, 
logStartOffset=220652, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
reddyvel-newtp-5-86-0=(fetchOffset=8935, logStartOffset=8935, maxBytes=1048576, 

[jira] [Created] (KAFKA-8009) Upgrade Jenkins job Gradle version from 4.10.2 to 5.x

2019-02-27 Thread JIRA
Dejan Stojadinović created KAFKA-8009:
-

 Summary: Upgrade Jenkins job Gradle version from 4.10.2 to 5.x
 Key: KAFKA-8009
 URL: https://issues.apache.org/jira/browse/KAFKA-8009
 Project: Kafka
  Issue Type: Improvement
  Components: build
Reporter: Dejan Stojadinović
Assignee: Ismael Juma


*Rationale:* 
 * The Kafka project already uses Gradle 5.x (5.1.1 at the moment)
 * the recently released *spotbugs-gradle-plugin* version 1.6.10 drops support 
for Gradle 4: 
[https://github.com/spotbugs/spotbugs-gradle-plugin/blob/1.6.10/CHANGELOG.md#1610---2019-02-18]

 

*Note:* a related GitHub pull request that bumps the spotbugs plugin version 
(among other things): 
https://github.com/apache/kafka/pull/6332#issuecomment-467631246



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Abhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhi updated KAFKA-8008:

Description: 
Hi,

I upgraded to Kafka v2.1.1 recently and am seeing the exceptions below on all 
the servers. The kafka-network-thread-1-ListenerName threads are all consuming 
full CPU cycles. Lots of TCP connections are in CLOSE_WAIT state.

My broker setup is using Kerberos authentication with 
-Dsun.security.jgss.native=true.

I am not sure how to handle this. Would increasing the kafka-network thread 
count help, if that is possible?

Does this seem like a bug? I am happy to help in any way I can, as this issue 
is blocking our production usage, and I would like to get it resolved as early 
as possible.


*server.log snippet from one of the servers:*
[2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 
with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
(id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
SEND_APIVERSIONS_REQUEST 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
client=kafka/mwkafka-prod-01.nyc.foo@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
 (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
RECEIVE_APIVERSIONS_RESPONSE 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Completed connection to node 2. Ready. 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 
with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
to node 5: java.net.SocketTimeoutException: Failed to connect within 3 ms. 
(org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error in response for fetch request (type=FetchRequest, 
replicaId=1, maxWait=1, minBytes=1, maxBytes=10485760, 
fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-newtp-5-64-0=(fetchOffset=8936, 
logStartOffset=8936, maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, maxBytes=1048576, 
currentLeaderEpoch=Optional[19]), reddyvel-tp3-58-0=(fetchOffset=264495, 
logStartOffset=264495, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, 
logStartOffset=2, maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-988-0=(fetchOffset=3173185, 
logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-409-0=(fetchOffset=3173194, 
logStartOffset=3173194, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.priveq.reins-0=(fetchOffset=12, 
logStartOffset=6, maxBytes=1048576, currentLeaderEpoch=Optional[5]), 
reddyvel-353-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-tp10-63-0=(fetchOffset=220652, 
logStartOffset=220652, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
reddyvel-newtp-5-86-0=(fetchOffset=8935, logStartOffset=8935, maxBytes=1048576, 

[jira] [Created] (KAFKA-8008) Clients unable to connect and replicas are not able to connect to each other

2019-02-27 Thread Abhi (JIRA)
Abhi created KAFKA-8008:
---

 Summary: Clients unable to connect and replicas are not able to 
connect to each other
 Key: KAFKA-8008
 URL: https://issues.apache.org/jira/browse/KAFKA-8008
 Project: Kafka
  Issue Type: Bug
  Components: controller, core
Affects Versions: 2.1.1, 2.1.0
 Environment: Java 11
Reporter: Abhi


Hi,

I upgraded to Kafka v2.1.1 in the hope that issue 
https://issues.apache.org/jira/browse/KAFKA-7925 would be fixed. However, I am 
still seeing a similar issue in my Kafka cluster.

I am seeing the same exceptions on all the servers. The 
kafka-network-thread-1-ListenerName threads are all consuming full CPU cycles. 
Lots of TCP connections are in CLOSE_WAIT state.
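
A sketch of how the hot threads can be confirmed against a thread dump 
(assumes a Linux host with procps and the JDK's jstack on the PATH; the pid 
lookup via pgrep is an assumption about how the broker was started):

{code}
# Per-thread CPU for the broker, then a thread dump; convert the hottest
# thread ids to hex to match the nid= entries in the dump.
BROKER_PID=$(pgrep -f kafka.Kafka)
top -H -b -n 1 -p "$BROKER_PID" | head -n 20
jstack "$BROKER_PID" > "threaddump.$(date +%Y%m%d).txt"
{code}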

My broker setup is using Kerberos authentication with 
-Dsun.security.jgss.native=true.

I am not sure how to handle this. Would increasing the kafka-network thread 
count help, if that is possible?

Does this seem like a bug? I am happy to help in any way I can, as this issue 
is blocking our production usage, and I would like to get it resolved as early 
as possible.


*server.log snippet from one of the servers:*
[2019-02-27 00:00:02,948] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Built full fetch (sessionId=1488865423, epoch=INITIAL) for node 2 
with 3 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Initiating connection to node mwkafka-prod-02.nyc.foo.com:9092 
(id: 2 rack: null) using address mwkafka-prod-02.nyc.foo.com/10.219.247.26 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
SEND_APIVERSIONS_REQUEST 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG Creating SaslClient: 
client=kafka/mwkafka-prod-01.nyc.foo@unix.foo.com;service=kafka;serviceHostname=mwkafka-prod-02.nyc.foo.com;mechs=[GSSAPI]
 (org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Created socket with SO_RCVBUF = 65536, SO_SNDBUF = 166400, 
SO_TIMEOUT = 0 to node 2 (org.apache.kafka.common.network.Selector)
[2019-02-27 00:00:02,949] DEBUG Set SASL client state to 
RECEIVE_APIVERSIONS_RESPONSE 
(org.apache.kafka.common.security.authenticator.SaslClientAuthenticator)
[2019-02-27 00:00:02,949] DEBUG [ReplicaFetcher replicaId=1, leaderId=2, 
fetcherId=3] Completed connection to node 2. Ready. 
(org.apache.kafka.clients.NetworkClient)
[2019-02-27 00:00:03,007] DEBUG [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=0] Built full fetch (sessionId=2039987243, epoch=INITIAL) for node 5 
with 0 partition(s). (org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] INFO [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error sending fetch request (sessionId=397037945, epoch=INITIAL) 
to node 5: java.net.SocketTimeoutException: Failed to connect within 3 ms. 
(org.apache.kafka.clients.FetchSessionHandler)
[2019-02-27 00:00:03,317] WARN [ReplicaFetcher replicaId=1, leaderId=5, 
fetcherId=1] Error in response for fetch request (type=FetchRequest, 
replicaId=1, maxWait=1, minBytes=1, maxBytes=10485760, 
fetchData={reddyvel-159-0=(fetchOffset=3173198, logStartOffset=3173198, 
maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-331-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-newtp-5-64-0=(fetchOffset=8936, 
logStartOffset=8936, maxBytes=1048576, currentLeaderEpoch=Optional[18]), 
reddyvel-tp9-78-0=(fetchOffset=247943, logStartOffset=247943, maxBytes=1048576, 
currentLeaderEpoch=Optional[19]), reddyvel-tp3-58-0=(fetchOffset=264495, 
logStartOffset=264495, maxBytes=1048576, currentLeaderEpoch=Optional[19]), 
fps.trsy.fe_prvt-0=(fetchOffset=24, logStartOffset=8, maxBytes=1048576, 
currentLeaderEpoch=Optional[3]), reddyvel-7-0=(fetchOffset=3173199, 
logStartOffset=3173199, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-298-0=(fetchOffset=3173197, logStartOffset=3173197, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), fps.guas.peeq.fe_marb_us-0=(fetchOffset=2, 
logStartOffset=2, maxBytes=1048576, currentLeaderEpoch=Optional[6]), 
reddyvel-108-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-988-0=(fetchOffset=3173185, 
logStartOffset=3173185, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-111-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), reddyvel-409-0=(fetchOffset=3173194, 
logStartOffset=3173194, maxBytes=1048576, currentLeaderEpoch=Optional[23]), 
reddyvel-104-0=(fetchOffset=3173198, logStartOffset=3173198, maxBytes=1048576, 
currentLeaderEpoch=Optional[23]), 

[jira] [Updated] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-27 Thread Abhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhi updated KAFKA-7925:

Affects Version/s: 2.1.1

> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0, 2.1.1
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% CPU usage on all brokers in our Kafka cluster, even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our Kafka setup, which is 
> not yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in the CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the CPU time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP 
> mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
> java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
> java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
> java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
> java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
> java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
> java 144319 kafkagod 2048u sock 0,7 0t0 30012546 protocol: TCP
> java 144319 kafkagod 2049u IPv4 30005418 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39686 (CLOSE_WAIT)
> java 144319 kafkagod 2050u IPv4 30009977 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34552 (ESTABLISHED)
> java 144319 kafkagod 2060u sock 0,7 0t0 30003439 protocol: TCP
> java 

[jira] [Updated] (KAFKA-7925) Constant 100% cpu usage by all kafka brokers

2019-02-27 Thread Abhi (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhi updated KAFKA-7925:

Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1  (was: Java 11, Kafka 
v2.1.0)

> Constant 100% cpu usage by all kafka brokers
> 
>
> Key: KAFKA-7925
> URL: https://issues.apache.org/jira/browse/KAFKA-7925
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 2.1.0
> Environment: Java 11, Kafka v2.1.0, Kafka v2.1.1
>Reporter: Abhi
>Priority: Critical
> Attachments: threadump20190212.txt
>
>
> Hi,
> I am seeing constant 100% CPU usage on all brokers in our Kafka cluster, even 
> without any clients connected to any broker.
> This is a bug that we have seen multiple times in our Kafka setup, which is 
> not yet open to clients. It is becoming a blocker for our deployment now.
> I am seeing a lot of connections to other brokers in the CLOSE_WAIT state (see 
> below). In thread usage, I am seeing these threads 
> 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1,kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2'
>  taking up more than 90% of the CPU time in a 60s interval.
> I have attached a thread dump of one of the brokers in the cluster.
> *Java version:*
> openjdk 11.0.2 2019-01-15
> OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
> OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
> *Kafka version:* v2.1.0
>  
> *connections:*
> java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
> java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
> java 144319 kafkagod 104u IPv4 3064219 0t0 TCP 
> mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
> java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
> java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP 
> mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
> java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP 
> mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP 
> mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP 
> mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP 
> mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP 
> mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP 
> mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP 
> mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP 
> mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP 
> mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP 
> mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
> java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP 
> mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP 
> mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP 
> mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP 
> mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
> java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP 
> mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP 
> mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
> java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
> java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
> java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
> java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
> java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
> java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
> java 144319 kafkagod 2048u sock 0,7 0t0 30012546 protocol: TCP
> java 144319 kafkagod 2049u IPv4 30005418 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39686 (CLOSE_WAIT)
> java 144319 kafkagod 2050u IPv4 30009977 0t0 TCP 
> mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34552 (ESTABLISHED)
> java 144319 kafkagod 2060u