[jira] [Created] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll

2019-03-04 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8041:
--

 Summary: Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
 Key: KAFKA-8041
 URL: https://issues.apache.org/jira/browse/KAFKA-8041
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.0.1
Reporter: Matthias J. Sax
 Fix For: 2.0.2


[https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests]
{quote}java.lang.AssertionError: Expected some messages
at kafka.utils.TestUtils$.fail(TestUtils.scala:357)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787)
at 
kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189)
at 
kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote}
STDOUT
{quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-6 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-0 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-10 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-4 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-8 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, 
fetcherId=0] Error for partition topic-2 at offset 0 
(kafka.server.ReplicaFetcherThread:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition.
[2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 in 
dir 
/home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216
 (kafka.server.LogDirFailureChannel:76)
java.io.FileNotFoundException: 
/home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index
 (Not a directory)
at java.io.RandomAccessFile.open0(Native Method)
at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115)
at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184)
at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501)
at kafka.log.Log.$anonfun$roll$8(Log.scala:1520)
at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520)
at scala.Option.foreach(Option.scala:257)
at kafka.log.Log.$anonfun$roll$2(Log.scala:1520)
at kafka.log.Log.maybeHandleIOException(Log.scala:1881)
at kafka.log.Log.roll(Log.scala:1484)
at 
kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154)
at 
kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 

Build failed in Jenkins: kafka-2.0-jdk8 #236

2019-03-04 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8002: Log dir reassignment stalls if future replica has different

--
[...truncated 896.17 KB...]

kafka.zk.ReassignPartitionsZNodeTest > testDecodeValidJson PASSED

kafka.zk.KafkaZkClientTest > testZNodeChangeHandlerForDataChange STARTED

kafka.zk.KafkaZkClientTest > testZNodeChangeHandlerForDataChange PASSED

kafka.zk.KafkaZkClientTest > testCreateAndGetTopicPartitionStatesRaw STARTED

kafka.zk.KafkaZkClientTest > testCreateAndGetTopicPartitionStatesRaw PASSED

kafka.zk.KafkaZkClientTest > testLogDirGetters STARTED

kafka.zk.KafkaZkClientTest > testLogDirGetters PASSED

kafka.zk.KafkaZkClientTest > testSetGetAndDeletePartitionReassignment STARTED

kafka.zk.KafkaZkClientTest > testSetGetAndDeletePartitionReassignment PASSED

kafka.zk.KafkaZkClientTest > testIsrChangeNotificationsDeletion STARTED

kafka.zk.KafkaZkClientTest > testIsrChangeNotificationsDeletion PASSED

kafka.zk.KafkaZkClientTest > testGetDataAndVersion STARTED

kafka.zk.KafkaZkClientTest > testGetDataAndVersion PASSED

kafka.zk.KafkaZkClientTest > testGetChildren STARTED

kafka.zk.KafkaZkClientTest > testGetChildren PASSED

kafka.zk.KafkaZkClientTest > testSetAndGetConsumerOffset STARTED

kafka.zk.KafkaZkClientTest > testSetAndGetConsumerOffset PASSED

kafka.zk.KafkaZkClientTest > testClusterIdMethods STARTED

kafka.zk.KafkaZkClientTest > testClusterIdMethods PASSED

kafka.zk.KafkaZkClientTest > testEntityConfigManagementMethods STARTED

kafka.zk.KafkaZkClientTest > testEntityConfigManagementMethods PASSED

kafka.zk.KafkaZkClientTest > testUpdateLeaderAndIsr STARTED

kafka.zk.KafkaZkClientTest > testUpdateLeaderAndIsr PASSED

kafka.zk.KafkaZkClientTest > testUpdateBrokerInfo STARTED

kafka.zk.KafkaZkClientTest > testUpdateBrokerInfo PASSED

kafka.zk.KafkaZkClientTest > testCreateRecursive STARTED

kafka.zk.KafkaZkClientTest > testCreateRecursive PASSED

kafka.zk.KafkaZkClientTest > testGetConsumerOffsetNoData STARTED

kafka.zk.KafkaZkClientTest > testGetConsumerOffsetNoData PASSED

kafka.zk.KafkaZkClientTest > testDeleteTopicPathMethods STARTED

kafka.zk.KafkaZkClientTest > testDeleteTopicPathMethods PASSED

kafka.zk.KafkaZkClientTest > testSetTopicPartitionStatesRaw STARTED

kafka.zk.KafkaZkClientTest > testSetTopicPartitionStatesRaw PASSED

kafka.zk.KafkaZkClientTest > testAclManagementMethods STARTED

kafka.zk.KafkaZkClientTest > testAclManagementMethods PASSED

kafka.zk.KafkaZkClientTest > testPreferredReplicaElectionMethods STARTED

kafka.zk.KafkaZkClientTest > testPreferredReplicaElectionMethods PASSED

kafka.zk.KafkaZkClientTest > testPropagateLogDir STARTED

kafka.zk.KafkaZkClientTest > testPropagateLogDir PASSED

kafka.zk.KafkaZkClientTest > testGetDataAndStat STARTED

kafka.zk.KafkaZkClientTest > testGetDataAndStat PASSED

kafka.zk.KafkaZkClientTest > testReassignPartitionsInProgress STARTED

kafka.zk.KafkaZkClientTest > testReassignPartitionsInProgress PASSED

kafka.zk.KafkaZkClientTest > testCreateTopLevelPaths STARTED

kafka.zk.KafkaZkClientTest > testCreateTopLevelPaths PASSED

kafka.zk.KafkaZkClientTest > testIsrChangeNotificationGetters STARTED

kafka.zk.KafkaZkClientTest > testIsrChangeNotificationGetters PASSED

kafka.zk.KafkaZkClientTest > testLogDirEventNotificationsDeletion STARTED

kafka.zk.KafkaZkClientTest > testLogDirEventNotificationsDeletion PASSED

kafka.zk.KafkaZkClientTest > testGetLogConfigs STARTED

kafka.zk.KafkaZkClientTest > testGetLogConfigs PASSED

kafka.zk.KafkaZkClientTest > testBrokerSequenceIdMethods STARTED

kafka.zk.KafkaZkClientTest > testBrokerSequenceIdMethods PASSED

kafka.zk.KafkaZkClientTest > testCreateSequentialPersistentPath STARTED

kafka.zk.KafkaZkClientTest > testCreateSequentialPersistentPath PASSED

kafka.zk.KafkaZkClientTest > testConditionalUpdatePath STARTED

kafka.zk.KafkaZkClientTest > testConditionalUpdatePath PASSED

kafka.zk.KafkaZkClientTest > testDeleteTopicZNode STARTED

kafka.zk.KafkaZkClientTest > testDeleteTopicZNode PASSED

kafka.zk.KafkaZkClientTest > testDeletePath STARTED

kafka.zk.KafkaZkClientTest > testDeletePath PASSED

kafka.zk.KafkaZkClientTest > testGetBrokerMethods STARTED

kafka.zk.KafkaZkClientTest > testGetBrokerMethods PASSED

kafka.zk.KafkaZkClientTest > testCreateTokenChangeNotification STARTED

kafka.zk.KafkaZkClientTest > testCreateTokenChangeNotification PASSED

kafka.zk.KafkaZkClientTest > testGetTopicsAndPartitions STARTED

kafka.zk.KafkaZkClientTest > testGetTopicsAndPartitions PASSED

kafka.zk.KafkaZkClientTest > testRegisterBrokerInfo STARTED

kafka.zk.KafkaZkClientTest > testRegisterBrokerInfo PASSED

kafka.zk.KafkaZkClientTest > testConsumerOffsetPath STARTED

kafka.zk.KafkaZkClientTest > testConsumerOffsetPath PASSED

kafka.zk.KafkaZkClientTest > testControllerManagementMethods STARTED

kafka.zk.KafkaZkClientTest > testControllerManagementMethods PASSED

kafka.zk.KafkaZkClientTest > 

Build failed in Jenkins: kafka-trunk-jdk8 #3434

2019-03-04 Thread Apache Jenkins Server
See 


Changes:

[rhauch] improve some logging statements (#6078)

[rhauch] KAFKA-7880:Naming worker thread by task id (#6275)

--
[...truncated 2.32 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestampWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullExpectedRecordForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValueTimestamp 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldNotAllowNullProducerRecordWithExpectedRecordForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueAndTimestampIsEqualForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueIsEqualWithNullForCompareValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualWithNullForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullForCompareValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfKeyIsDifferentWithNullReversForCompareKeyValue PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfKeyAndValueIsEqualForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueWithProducerRecord PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 

Build failed in Jenkins: kafka-2.2-jdk8 #43

2019-03-04 Thread Apache Jenkins Server
See 


Changes:

[manikumar] KAFKA-7312: Change broker port used in testMinimumRequestTimeouts 
and

--
[...truncated 2.73 MB...]

kafka.coordinator.group.GroupCoordinatorTest > testValidJoinGroup STARTED

kafka.coordinator.group.GroupCoordinatorTest > testValidJoinGroup PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
shouldDelayRebalanceUptoRebalanceTimeout STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
shouldDelayRebalanceUptoRebalanceTimeout PASSED

kafka.coordinator.group.GroupCoordinatorTest > testFetchOffsets STARTED

kafka.coordinator.group.GroupCoordinatorTest > testFetchOffsets PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
testSessionTimeoutDuringRebalance STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
testSessionTimeoutDuringRebalance PASSED

kafka.coordinator.group.GroupCoordinatorTest > testNewMemberJoinExpiration 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testNewMemberJoinExpiration 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testFetchTxnOffsetsWithAbort 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testFetchTxnOffsetsWithAbort 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupLeaderAfterFollower 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupLeaderAfterFollower 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupFromUnknownMember 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupFromUnknownMember 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testValidLeaveGroup STARTED

kafka.coordinator.group.GroupCoordinatorTest > testValidLeaveGroup PASSED

kafka.coordinator.group.GroupCoordinatorTest > testDescribeGroupInactiveGroup 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testDescribeGroupInactiveGroup 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
testFetchTxnOffsetsIgnoreSpuriousCommit STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
testFetchTxnOffsetsIgnoreSpuriousCommit PASSED

kafka.coordinator.group.GroupCoordinatorTest > testPendingMembersLeavesGroup 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testPendingMembersLeavesGroup 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupNotCoordinator 
STARTED

kafka.coordinator.group.GroupCoordinatorTest > testSyncGroupNotCoordinator 
PASSED

kafka.coordinator.group.GroupCoordinatorTest > testBasicFetchTxnOffsets STARTED

kafka.coordinator.group.GroupCoordinatorTest > testBasicFetchTxnOffsets PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
shouldResetRebalanceDelayWhenNewMemberJoinsGroupInInitialRebalance STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
shouldResetRebalanceDelayWhenNewMemberJoinsGroupInInitialRebalance PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
testHeartbeatUnknownConsumerExistingGroup STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
testHeartbeatUnknownConsumerExistingGroup PASSED

kafka.coordinator.group.GroupCoordinatorTest > testValidHeartbeat STARTED

kafka.coordinator.group.GroupCoordinatorTest > testValidHeartbeat PASSED

kafka.coordinator.group.GroupCoordinatorTest > 
testRequestHandlingWhileLoadingInProgress STARTED

kafka.coordinator.group.GroupCoordinatorTest > 
testRequestHandlingWhileLoadingInProgress PASSED

kafka.network.SocketServerTest > testGracefulClose STARTED

kafka.network.SocketServerTest > testGracefulClose PASSED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone STARTED

kafka.network.SocketServerTest > 
testSendActionResponseWithThrottledChannelWhereThrottlingAlreadyDone PASSED

kafka.network.SocketServerTest > controlThrowable STARTED

kafka.network.SocketServerTest > controlThrowable PASSED

kafka.network.SocketServerTest > testRequestMetricsAfterStop STARTED

kafka.network.SocketServerTest > testRequestMetricsAfterStop PASSED

kafka.network.SocketServerTest > testConnectionIdReuse STARTED

kafka.network.SocketServerTest > testConnectionIdReuse PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testProcessorMetricsTags STARTED

kafka.network.SocketServerTest > testProcessorMetricsTags PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > testConnectionId STARTED

kafka.network.SocketServerTest > testConnectionId PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > testNoOpAction STARTED

kafka.network.SocketServerTest > 

[jira] [Created] (KAFKA-8040) Streams needs to retry initTransactions

2019-03-04 Thread John Roesler (JIRA)
John Roesler created KAFKA-8040:
---

 Summary: Streams needs to retry initTransactions
 Key: KAFKA-8040
 URL: https://issues.apache.org/jira/browse/KAFKA-8040
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 2.0.0, 2.0.1, 2.1.0, 2.2.0, 2.1.1
Reporter: John Roesler
Assignee: John Roesler
 Fix For: 2.3.0


Following on KAFKA-7763, Streams needs to handle the new behavior.

See also [https://github.com/apache/kafka/pull/6066]

Streams code (StreamTask.java) needs to be modified to handle the new exception.

Also, from another upstream change, `initTxn` can now also throw TimeoutException: 
the default `MAX_BLOCK_MS_CONFIG` in the producer is 60 seconds, so I think just 
wrapping it as a StreamsException should be reasonable, similar to what we do for 
`producer#send`'s TimeoutException 
([https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/RecordCollectorImpl.java#L220-L225]).

 

Note we need to handle this in three functions: init/commit/abortTxn.

 

See also https://github.com/apache/kafka/pull/6066#issuecomment-464403448
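
For illustration, a minimal hypothetical sketch of the wrapping described above, 
for the `initTxn` case only (the helper class below is made up; the real change 
would live in StreamTask.java):

{code}
// Hypothetical helper, not the actual StreamTask change: wraps the
// TimeoutException that initTransactions() may now throw into a
// StreamsException, mirroring what RecordCollectorImpl does for producer#send.
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.errors.TimeoutException;
import org.apache.kafka.streams.errors.StreamsException;

final class TransactionInitializer {

    static void initTransactions(final Producer<byte[], byte[]> producer, final String taskId) {
        try {
            // blocks up to max.block.ms (60s by default) and may throw TimeoutException
            producer.initTransactions();
        } catch (final TimeoutException timeout) {
            throw new StreamsException(
                "Timed out initializing transactions for task " + taskId +
                "; consider increasing max.block.ms", timeout);
        }
    }
}
{code}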



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Build failed in Jenkins: kafka-1.1-jdk7 #251

2019-03-04 Thread Apache Jenkins Server
See 


Changes:

[jason] KAFKA-8002: Log dir reassignment stalls if future replica has different

--
[...truncated 1.94 MB...]
org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldMergeSessions STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldMergeSessions PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldHandleMultipleSessionsAndMerging STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldHandleMultipleSessionsAndMerging PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldImmediatelyForwardNewSessionWhenNonCachedStore STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldImmediatelyForwardNewSessionWhenNonCachedStore PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldGetAggregatedValuesFromValueGetter STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldGetAggregatedValuesFromValueGetter PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldImmediatelyForwardRemovedSessionsWhenMerging STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldImmediatelyForwardRemovedSessionsWhenMerging PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldUpdateSessionIfTheSameTime STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldUpdateSessionIfTheSameTime PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldHaveMultipleSessionsForSameIdWhenTimestampApartBySessionGap STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldHaveMultipleSessionsForSameIdWhenTimestampApartBySessionGap PASSED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldCreateSingleSessionWhenWithinGap STARTED

org.apache.kafka.streams.kstream.internals.KStreamSessionWindowAggregateProcessorTest
 > shouldCreateSingleSessionWhenWithinGap PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddRegexTopicToLatestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddRegexTopicToLatestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldMapStateStoresToCorrectSourceTopics STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldMapStateStoresToCorrectSourceTopics PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTableToEarliestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTableToEarliestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldHaveNullTimestampExtractorWhenNoneSupplied STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldHaveNullTimestampExtractorWhenNoneSupplied PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldBuildGlobalTopologyWithAllGlobalTables STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldBuildGlobalTopologyWithAllGlobalTables PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddRegexTopicToEarliestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddRegexTopicToEarliestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTopicToEarliestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTopicToEarliestAutoOffsetResetList PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStream STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldHaveCorrectSourceTopicsForTableFromMergedStream PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
testNewStoreName STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
testNewStoreName PASSED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTopicToLatestAutoOffsetResetList STARTED

org.apache.kafka.streams.kstream.internals.InternalStreamsBuilderTest > 
shouldAddTopicToLatestAutoOffsetResetList PASSED


Guava version upgrade

2019-03-04 Thread JIAHAO ZHOU
Hello,
When downloading Kafka 2.1.1, the kafka_2.12-2.1.1.tgz still contains
guava-20.0.jar. This Guava version currently has a vulnerability,
described here: https://github.com/google/guava/wiki/CVE-2018-10237
Versions 24.1.1 and 25.0+ are fixed versions.
Are there any plans to upgrade this dependency?

Regards
Jiahao Zhou


Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-04 Thread John Roesler
Hi all,

I don't have too much time to respond at the moment, but I don't have much
to say anyway. I just wanted to respond to your requests for more opinions.

The point Matthias brought up is a good one. I think this was similar to
Jan's concern before about "swallowing" some intermediate states. Adam's
example of triggering logic based on the state transitions of the KTables
is a plausible (if advanced and exotic) case where it would matter. For all
"pure" relational-algebra-type use cases, I *think* the proposed,
eventually-correct, solution would be fine.

I think both the "swallowing intermediate results" and the "duplicate final
results" problems are probably possible to solve, but my instinct says that
it would be costly, and what we have proposed right now probably satisfies
the majority use case. If it turns out we're wrong about this, then it
should be possible to fix the semantics in place, without messing with the
API.

Thanks, everyone,
-John

On Mon, Mar 4, 2019 at 12:23 PM Matthias J. Sax 
wrote:

> Thanks Adam,
>
> *> Q)
> Range scans work with caching enabled, too. Thus, there is no
> functional/correctness requirement to disable caching. I cannot remember
> why Jan's proposal added this? It might be an implementation detail
> though (maybe just remove it from the KIP? -- might be misleading).
>
>
> *> (b)
> For stream.flatMap()... There is only one input record for the LHS. This
> single input record may produce multiple updates to the KTable --
> however, those update records are not written into a topic (before the
> update) and thus they don't have their own offsets (if you call
> context.offset() for them, you will get the offset of the original input
> record, i.e., the same offset every time). Of course, on write into
> the first repartition topic they will get their own offsets. However,
> this information seems not to be useful, because these offsets are not
> related to the input topic offset and might be from different
> partitions. Thus, I don't see how reordering could be resolved using
> them? However, I don't think we need offsets anyway to resolve ordering
> issues, thus, it does not really matter. (I added this just as a side
> remark in my response.)
>
>
> *c-P1)*
> *c-P2: Duplicates)*
> I see your point here and agree that it might be difficult to send _all_
> result updates if the RHS processors have different load and one might
> lag. However, "skipping" some intermediate updates seems to be something
> different than "duplicating" an update? However, I also agree that
> sending the payload to the RHS might be cost prohibitive. For your
> filter example, it seems that calling `KTable.toStream().filter()` would
> be incorrect, but that it should be `KTable.filter()`. For the first
> scenario I agree that the output record would never make it into the
> result, however, for the second case, update semantics are preserved and
> thus the result would still be correct (the output record won't be in
> the result KTable, either, but this would be correct! -- note that
> calling `KTable.toStream()` changes the semantics(!)).
>
> Additionally, as you mentioned ("Monkey Wrench"), there are still other
> scenarios for which intermediate updates might get swallowed. This seems
> to be a general "issue" in the current design. Maybe that was the main
> criticism from Jan (and it is a quite valid point). However, with
> caching enabled, swallowing updates is intended behavior anyway, so I
> personally don't think this is a big concern atm. (This might go back to
> Jan's proposal to disable caching to avoid swallowing any updates?)
>
> -> would be good to get input from others about the "swallowing" issue
>
> Thus, I agree with your conclusion that sending the payload to RHS does
> not resolve this issue. However, I still believe that the current design
> ensures that we don't overwrite the correct final result with a stale
> intermediate result. The FK check before the result data is emitted
> seems to be sufficient for that (I don't think we need timestamp or
> offsets to resolve this). The only issue with the FK check is that it may
> produce "duplicate" updates. Or did I miss anything?
>
> -> would be good to get input from others about this "duplicate update"
> issue
>
>
>
> -Matthias
>
>
>
>
> On 3/4/19 8:29 AM, Adam Bellemare wrote:
> > Hi Matthias
> >
> > Thank you for the feedback! I appreciate your well thought-out
> questions. I
> > have tried to answer and comment on everything that I know below.
> >
> >
> >
> > *> Q) For the materialized combined-key store, why do we need to disable>
> > caching? And why do we need to flush the store?*
> > This is an artifact from Jan's implementation that I have carried along.
> My
> > understanding (though possibly erroneous!) is that RocksDB prefix scan
> > doesn't work with the cache, and ignores any data stored within it. I
> have
> > tried to validate this but I have not succeeded, so I believe that this
> > will need more 

[jira] [Resolved] (KAFKA-8002) Replica reassignment to new log dir may not complete if future and current replicas segment files have different base offsets

2019-03-04 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-8002.

   Resolution: Fixed
Fix Version/s: 2.1.2
   2.0.2
   2.2.0
   1.1.2

> Replica reassignment to new log dir may not complete if future and current 
> replicas segment files have different base offsets
> -
>
> Key: KAFKA-8002
> URL: https://issues.apache.org/jira/browse/KAFKA-8002
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1.0, 1.1.1, 2.0.0, 2.0.1, 2.1.0, 2.1.1
>Reporter: Anna Povzner
>Assignee: Bob Barrett
>Priority: Critical
> Fix For: 1.1.2, 2.2.0, 2.0.2, 2.1.2
>
>
> Once the future replica fetches the log end offset, the intended logic is to 
> finish the move (and rename the future dir to the current replica dir, etc.). 
> However, the check in Partition.maybeReplaceCurrentWithFutureReplica compares 
> the whole LogOffsetMetadata vs. the log end offset. The resulting behavior is 
> that the reassignment will not finish for topic partitions that were 
> cleaned/compacted such that the base offset of the last segment is different 
> for the current and future replica.
> The proposed fix is to compare only the log end offsets of the current and 
> future replica.
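
For illustration only (the real check lives in Scala in
Partition.maybeReplaceCurrentWithFutureReplica; the types below are simplified
stand-ins, not the actual broker classes), the difference between the reported
comparison and the proposed fix is roughly:

{code}
// Simplified stand-in types; not the actual broker code.
final class ReplicaMoveCheck {

    static final class LogOffsetMetadata {
        final long messageOffset;      // absolute log end offset
        final long segmentBaseOffset;  // base offset of the active segment

        LogOffsetMetadata(final long messageOffset, final long segmentBaseOffset) {
            this.messageOffset = messageOffset;
            this.segmentBaseOffset = segmentBaseOffset;
        }
    }

    // Reported behavior: comparing the whole metadata also compares segmentBaseOffset,
    // which can legitimately differ after cleaning/compaction, so the move never completes.
    static boolean caughtUpFullMetadata(final LogOffsetMetadata current, final LogOffsetMetadata future) {
        return current.messageOffset == future.messageOffset
            && current.segmentBaseOffset == future.segmentBaseOffset;
    }

    // Proposed fix: compare only the log end offsets.
    static boolean caughtUpLogEndOffsetOnly(final LogOffsetMetadata current, final LogOffsetMetadata future) {
        return current.messageOffset == future.messageOffset;
    }
}
{code}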



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8039) Flaky Test SaslAuthenticatorTest#testCannotReauthenticateAgainFasterThanOneSecond

2019-03-04 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8039:
--

 Summary: Flaky Test 
SaslAuthenticatorTest#testCannotReauthenticateAgainFasterThanOneSecond
 Key: KAFKA-8039
 URL: https://issues.apache.org/jira/browse/KAFKA-8039
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/46/testReport/junit/org.apache.kafka.common.security.authenticator/SaslAuthenticatorTest/testCannotReauthenticateAgainFasterThanOneSecond/]
{quote}java.lang.AssertionError: Should have received the SaslHandshakeRequest 
bytes back since we re-authenticated too quickly, but instead we got our 
generated message echoed back, implying re-auth succeeded when it should not have
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testCannotReauthenticateAgainFasterThanOneSecond(SaslAuthenticatorTest.java:1503){quote}
STDOUT
{quote}[2019-03-04 19:33:46,222] ERROR Extensions provided in login context 
without a token 
(org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule:318) 
java.io.IOException: Extensions provided in login context without a token at 
org.apache.kafka.common.security.oauthbearer.internals.unsecured.OAuthBearerUnsecuredLoginCallbackHandler.handle(OAuthBearerUnsecuredLoginCallbackHandler.java:164)
 at 
org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule.identifyToken(OAuthBearerLoginModule.java:316)
 at 
org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule.login(OAuthBearerLoginModule.java:301)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at 
javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at 
javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at 
javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at 
javax.security.auth.login.LoginContext.login(LoginContext.java:587) at 
org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
 at 
org.apache.kafka.common.security.authenticator.LoginManager.<init>(LoginManager.java:61)
 at 
org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
 at 
org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:149)
 at 
org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
 at 
org.apache.kafka.common.network.ChannelBuilders.serverChannelBuilder(ChannelBuilders.java:85)
 at 
org.apache.kafka.common.network.NioEchoServer.<init>(NioEchoServer.java:120) at 
org.apache.kafka.common.network.NioEchoServer.<init>(NioEchoServer.java:96) at 
org.apache.kafka.common.network.NetworkTestUtils.createEchoServer(NetworkTestUtils.java:49)
 at 
org.apache.kafka.common.network.NetworkTestUtils.createEchoServer(NetworkTestUtils.java:43)
 at 
org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.createEchoServer(SaslAuthenticatorTest.java:1842)
 at 
org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.createEchoServer(SaslAuthenticatorTest.java:1838)
 at 
org.apache.kafka.common.security.authenticator.SaslAuthenticatorTest.testValidSaslOauthBearerMechanismWithoutServerTokens(SaslAuthenticatorTest.java:1578)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
 at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
 at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
 at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
 at 

[jira] [Created] (KAFKA-8038) Flaky Test SslTransportLayerTest#testCloseSsl

2019-03-04 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8038:
--

 Summary: Flaky Test SslTransportLayerTest#testCloseSsl
 Key: KAFKA-8038
 URL: https://issues.apache.org/jira/browse/KAFKA-8038
 Project: Kafka
  Issue Type: Bug
  Components: core, unit tests
Affects Versions: 2.2.0, 2.3.0
Reporter: Matthias J. Sax
 Fix For: 2.3.0, 2.2.1


[https://jenkins.confluent.io/job/apache-kafka-test/job/2.2/46/testReport/junit/org.apache.kafka.common.network/SslTransportLayerTest/testCloseSsl/]
{quote}java.lang.IllegalStateException: handshake is not completed
at org.apache.kafka.common.network.SslTransportLayer.removeInterestOps(SslTransportLayer.java:770)
at org.apache.kafka.common.network.KafkaChannel.mute(KafkaChannel.java:246)
at org.apache.kafka.common.network.Selector.mute(Selector.java:685)
at org.apache.kafka.common.network.Selector.muteAll(Selector.java:705)
at org.apache.kafka.common.network.SslTransportLayerTest.testClose(SslTransportLayerTest.java:866)
at org.apache.kafka.common.network.SslTransportLayerTest.testCloseSsl(SslTransportLayerTest.java:846){quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8037) KTable restore may load bad data

2019-03-04 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created KAFKA-8037:
--

 Summary: KTable restore may load bad data
 Key: KAFKA-8037
 URL: https://issues.apache.org/jira/browse/KAFKA-8037
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Matthias J. Sax


If an input topic contains bad data, users can specify a 
`deserialization.exception.handler` to drop corrupted records on read. However, 
this mechanism may be bypassed on restore. Assume a `builder.table()` call 
reads and drops a corrupted record. If the table state is lost and restored 
from the changelog topic, the corrupted record may be copied into the store, 
because on restore plain bytes are copied.

If the KTable is used in a join, an internal `store.get()` call to lookup the 
record would fail with a deserialization exception if the value part cannot be 
deserialized.

GlobalKTables are affected, too (cf. KAFKA-7663 that may allow a fix for 
GlobalKTable case). It's unclear to me atm, how this issue could be addressed 
for KTables though.

Note, that user state stores are not affected, because they always have a 
dedicated changelog topic (and don't reuse an input topic) and thus the 
corrupted record would not be written into the changelog.
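
For reference, a minimal sketch of how the read-path handler is usually
configured (the application id and bootstrap servers below are placeholders);
as described above, restoration copies plain bytes and does not go through
this handler:

{code}
// Sketch of the normal read path only; restore bypasses this handler.
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndContinueExceptionHandler;

final class HandlerConfigExample {
    static Properties streamsConfig() {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ktable-restore-example"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // placeholder brokers
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndContinueExceptionHandler.class);
        return props;
    }
}
{code}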



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7651) Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts

2019-03-04 Thread Manikumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-7651.
--
   Resolution: Fixed
Fix Version/s: 2.2.1
   2.1.2
   2.0.2

This should have been fixed in [https://github.com/apache/kafka/pull/6360]

> Flaky test SaslSslAdminClientIntegrationTest.testMinimumRequestTimeouts
> ---
>
> Key: KAFKA-7651
> URL: https://issues.apache.org/jira/browse/KAFKA-7651
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core, unit tests
>Affects Versions: 2.3.0
>Reporter: Dong Lin
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.0.2, 2.3.0, 2.1.2, 2.2.1
>
>
> Here is stacktrace from 
> https://builds.apache.org/job/kafka-2.1-jdk8/51/testReport/junit/kafka.api/SaslSslAdminClientIntegrationTest/testMinimumRequestTimeouts/
> {code}
> Error Message
> java.lang.AssertionError: Expected an exception of type 
> org.apache.kafka.common.errors.TimeoutException; got type 
> org.apache.kafka.common.errors.SslAuthenticationException
> Stacktrace
> java.lang.AssertionError: Expected an exception of type 
> org.apache.kafka.common.errors.TimeoutException; got type 
> org.apache.kafka.common.errors.SslAuthenticationException
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at 
> kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1404)
>   at 
> kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-7312) Transient failure in kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts

2019-03-04 Thread Manikumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar resolved KAFKA-7312.
--
   Resolution: Fixed
Fix Version/s: (was: 2.3.0)
   2.1.2
   2.0.2
   2.2.0

Issue resolved by pull request 6360
[https://github.com/apache/kafka/pull/6360]

> Transient failure in 
> kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts
> 
>
> Key: KAFKA-7312
> URL: https://issues.apache.org/jira/browse/KAFKA-7312
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, unit tests
>Affects Versions: 2.3.0
>Reporter: Guozhang Wang
>Priority: Critical
>  Labels: flaky-test
> Fix For: 2.2.0, 2.0.2, 2.1.2
>
>
> {code}
> Error Message
> org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
> Stacktrace
> org.junit.runners.model.TestTimedOutException: test timed out after 12 
> milliseconds
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:502)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:92)
>   at 
> org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:262)
>   at 
> kafka.utils.TestUtils$.assertFutureExceptionTypeEquals(TestUtils.scala:1345)
>   at 
> kafka.api.AdminClientIntegrationTest.testMinimumRequestTimeouts(AdminClientIntegrationTest.scala:1080)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-04 Thread Matthias J. Sax
Thanks Adam,

*> Q)
Range scans work with caching enabled, too. Thus, there is no
functional/correctness requirement to disable caching. I cannot remember
why Jan's proposal added this? It might be an implementation detail
though (maybe just remove it from the KIP? -- might be misleading).


*> (b)
For stream.flatMap()... There is only one input record for the LHS. This
single input record may produce multiple updates to the KTable --
however, those update records are not written into a topic (before the
update) and thus they don't have their own offsets (if you call
context.offset() for them, you will get the offset of the original input
record, i.e., the same offset every time). Of course, on write into
the first repartition topic they will get their own offsets. However,
this information seems not to be useful, because these offsets are not
related to the input topic offset and might be from different
partitions. Thus, I don't see how reordering could be resolved using
them? However, I don't think we need offsets anyway to resolve ordering
issues, thus, it does not really matter. (I added this just as a side
remark in my response.)


*c-P1)*
*c-P2: Duplicates)*
I see your point here and agree that it might be difficult to send _all_
result updates if the RHS processors have different load and one might
lag. However, "skipping" some intermediate updates seems to be something
different than "duplicating" an update? However, I also agree that
sending the payload to the RHS might be cost prohibitive. For your
filter example, it seems that calling `KTable.toStream().filter()` would
be incorrect, but that it should be `KTable.filter()`. For the first
scenario I agree that the output record would never make it into the
result, however, for the second case, update semantics are preserved and
thus the result would still be correct (the output record won't be in
the result KTable, either, but this would be correct! -- note that
calling `KTable.toStream()` changes the semantics(!)).
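
A minimal sketch of the two variants discussed above (topic and type names are
illustrative only): `KTable#filter` keeps update semantics and forwards a
tombstone when a previously matching record stops matching, while
`KTable#toStream().filter()` simply drops non-matching events:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

final class FilterSemanticsExample {
    static void build(final StreamsBuilder builder) {
        final KTable<String, Long> table = builder.table("join-result"); // illustrative topic

        // table semantics: an update from a passing to a non-passing value emits a tombstone
        final KTable<String, Long> filteredTable =
            table.filter((key, value) -> value != null && value > 0);

        // stream semantics: non-passing updates are silently dropped, no tombstone is emitted
        final KStream<String, Long> filteredStream =
            table.toStream().filter((key, value) -> value != null && value > 0);
    }
}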

Additionally, as you mentioned ("Monkey Wrench"), there are still other
scenarios for which intermediate updates might get swallowed. This seems
to be a general "issue" in the current design. Maybe that was the main
criticism from Jan (and it is a quite valid point). However, with
caching enabled, swallowing updates is intended behavior anyway, so I
personally don't think this is a big concern atm. (This might go back to
Jan's proposal to disable caching to avoid swallowing any updates?)

-> would be good to get input from others about the "swallowing" issue

Thus, I agree with your conclusion that sending the payload to RHS does
not resolve this issue. However, I still believe that the current design
ensures that we don't overwrite the correct final result with a stale
intermediate result. The FK check before the result data is emitted
seems to be sufficient for that (I don't think we need timestamp or
offsets to resolve this). The only issue with the FK check is that it may
produce "duplicate" updates. Or did I miss anything?

-> would be good to get input from others about this "duplicate update"
issue



-Matthias




On 3/4/19 8:29 AM, Adam Bellemare wrote:
> Hi Matthias
> 
> Thank you for the feedback! I appreciate your well thought-out questions. I
> have tried to answer and comment on everything that I know below.
> 
> 
> 
> *> Q) For the materialized combined-key store, why do we need to disable>
> caching? And why do we need to flush the store?*
> This is an artifact from Jan's implementation that I have carried along. My
> understanding (though possibly erroneous!) is that RocksDB prefix scan
> doesn't work with the cache, and ignores any data stored within it. I have
> tried to validate this but I have not succeeded, so I believe that this
> will need more investigation and testing. I will dig deeper on this and get
> back to you.
> 
> 
> 
> *> a) Thus, I am wondering why we would need to send the `null` message
> back> (from RHS to LHS) in the first place?*
> 
> We don't need to, if we follow your subsequent tombstone suggestion.
> 
> 
> 
> 
> 
> *> (b) About using "offsets" to resolve ordering issue: I don't think this>
> would work. The join input table would be created as>
> stream.flatMapValues().groupByKey().aggregate()*
> Hmmm... I am a bit fuzzy on this part. Wouldn't the LHS processor be able
> to get the highest offset and propagate that onwards to the RHS processor?
> In my original design I had a wrapper that kept track of the input offset,
> though I suspect it did not work for the above aggregation scenario.
> 
> *c-P1)*
> Regarding the consistency examples, everything you wrote is correct as far
> as I can tell in how the proposed system would behave. Rapid updates to the
> LHS will result in some of the results being discarded (in the case of
> deletes or change of FK) or doubly-produced (discussed below, after the
> following example).
> 
> It does not seem to me to be possible to avoid quashing records 

Re: [DISCUSS] KIP-434: Add Replica Fetcher and Log Cleaner Count Metrics

2019-03-04 Thread Stanislav Kozlovski
Hey Viktor,

> however displaying the thread count (be it alive or dead) would still add
extra information regarding the failure, that a thread died during cleanup.

I agree, I think it's worth adding.


> Doing this on the replica fetchers though would be a bit harder as the
number of replica fetchers is the (brokers-to-fetch-from *
fetchers-per-broker) and we don't really maintain the capacity information
or any kind of cluster information and I'm not sure we should.

Perhaps we could split the metric per broker that is being fetched from?
i.e. each replica fetcher would have a `dead-fetcher-threads` metric that
has the broker-id it's fetching from as a tag?
It would solve an observability question which I think is very important -
are we replicating from this broker at all?
On the other hand, this could potentially produce a lot of metric data with
a big cluster, so that is definitely something to consider as well.
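
For illustration, a rough sketch of how such a per-source-broker gauge could be
registered through the Kafka common metrics classes (the broker's existing
fetcher metrics may be wired up differently, and the metric, group, and tag
names below are placeholders, not something the KIP has settled on):

import java.util.Collections;
import java.util.concurrent.atomic.AtomicInteger;
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Measurable;
import org.apache.kafka.common.metrics.Metrics;

final class DeadFetcherThreadGauge {
    private final AtomicInteger deadThreads = new AtomicInteger(0);

    void register(final Metrics metrics, final String sourceBrokerId) {
        final MetricName name = metrics.metricName(
            "dead-fetcher-threads",                                        // placeholder metric name
            "replica-fetcher-metrics",                                     // placeholder group
            "Number of replica fetcher threads that have died",
            Collections.singletonMap("source-broker-id", sourceBrokerId)); // tag by fetched-from broker
        metrics.addMetric(name, (Measurable) (config, now) -> deadThreads.get());
    }

    void onFetcherThreadDeath() {
        deadThreads.incrementAndGet();
    }
}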

All in all, I think this is a great KIP and very much needed in my opinion.
I can't wait to see this roll out

Best,
Stanislav

On Mon, Feb 25, 2019 at 10:29 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com> wrote:

> Hi Stanislav,
>
> Thanks for the feedback and sharing that discussion thread.
>
> I read your KIP and the discussion on it too and it seems like that'd cover
> the same motivation I had with the log-cleaner-thread-count metric. This is
> supposed to tell the count of the alive threads, which might differ from the
> config (I could've used a better name :) ). Now I'm thinking that using
> uncleanable-bytes, uncleanable-partition-count together with
> time-since-last-run would mostly cover the motivation I have in this KIP,
> however displaying the thread count (be it alive or dead) would still add
> extra information regarding the failure, that a thread died during cleanup.
>
> You had a very good idea about instead of the alive threads, display the
> dead ones! That way we wouldn't need log-cleaner-current-live-thread-rate
> just a "dead-log-cleaner-thread-count" as it it would make easy to trigger
> warnings based on that (if it's even > 0 then we can say there's a
> potential problem).
> Doing this on the replica fetchers though would be a bit harder as the
> number of replica fetchers is the (brokers-to-fetch-from *
> fetchers-per-broker) and we don't really maintain the capacity information
> or any kind of cluster information and I'm not sure we should. It would add
> too much responsibility to the class and wouldn't be a rock-solid solution
> but I guess I have to look into it more.
>
> I don't think that restarting the cleaner threads would be helpful as the
> problems I've seen mostly are non-recoverable and requires manual user
> intervention and partly I agree what Colin said on the KIP-346 discussion
> thread about the problems experienced with HDFS.
>
> Best,
> Viktor
>
>
> On Fri, Feb 22, 2019 at 5:03 PM Stanislav Kozlovski <
> stanis...@confluent.io>
> wrote:
>
> > Hey Viktor,
> >
> > First off, thanks for the KIP! I think that it is almost always a good
> idea
> > to have more metrics. Observability never hurts.
> >
> > In regards to the LogCleaner:
> > * Do we need to know log-cleaner-thread-count? That should always be
> equal
> > to "log.cleaner.threads" if I'm not mistaken.
> > * log-cleaner-current-live-thread-rate -  We already have the
> > "time-since-last-run-ms" metric which can let you know if something is
> > wrong with the log cleaning
> > As you said, we would like to have these two new metrics in order to
> > understand when a partial failure has happened - e.g only 1/3 log cleaner
> > threads are alive. I'm wondering if it may make more sense to either:
> > a) restart the threads when they die
> > b) add a metric which shows the dead thread count. You should probably
> > always have a low-level alert in the case that any threads have died
> >
> > We had discussed a similar topic about thread revival and metrics in
> > KIP-346. Have you had a chance to look over that discussion? Here is the
> > mailing discussion for that -
> >
> >
> http://mail-archives.apache.org/mod_mbox/kafka-dev/201807.mbox/%3ccanzzngyr_22go9swl67hedcm90xhvpyfzy_tezhz1mrizqk...@mail.gmail.com%3E
> >
> > Best,
> > Stanislav
> >
> >
> >
> > On Fri, Feb 22, 2019 at 11:18 AM Viktor Somogyi-Vass <
> > viktorsomo...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > I'd like to start a discussion about exposing count gauge metrics for
> the
> > > replica fetcher and log cleaner thread counts. It isn't a long KIP and
> > the
> > > motivation is very simple: monitoring the thread counts in these cases
> > > would help with the investigation of various issues and might help in
> > > preventing more serious issues when a broker is in a bad state. Such a
> > > scenario that we've seen with users is that their disk fills up as the log
> > > cleaner died for some reason and couldn't recover (like log
> corruption).
> > In
> > > this case an early warning would help in the root cause analysis
> 

Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams

2019-03-04 Thread Boyang Chen
Thanks Guozhang for the great questions. Answers are inlined:

1. I'm still not sure if it's worthwhile to add a new type of "learner
task" in addition to "standby task": if the only difference is that for the
latter, we would consider workload balance while for the former we would
not, I think we can just adjust the logic of StickyTaskAssignor a bit to
break that difference. Adding a new type of task would be adding a lot of
code complexity, so if we can still piggy-back the logic on a standby-task
I would prefer to do so.
In the proposal we stated that we are not adding a new type of task
implementation. The learner task shall share the same implementation as the
normal standby task; we only tag the standby task as a learner and prioritize
the learner task's replay effort.
2. One thing that's still not clear from the KIP wiki itself is which layer
the logic would be implemented at. Although for most KIPs we would not
require internal implementation details but only public-facing API updates,
for a KIP like this I think we still need to flesh out details of the
implementation design. More specifically: today Streams embeds a full-fledged
Consumer client, which hard-codes a ConsumerCoordinator inside; Streams then
injects a StreamsPartitionAssignor into its pluggable PartitionAssignor
interface, and inside the StreamsPartitionAssignor we also have a TaskAssignor
interface whose default implementation is StickyTaskAssignor. The Streams
partition assignor logic today sits in the latter two classes. Hence the
hierarchy today is:

KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor ->
StickyTaskAssignor.

We need to think about where the proposed implementation would take place,
and personally I think it is not the best option to inject all of that logic
into the StreamsPartitionAssignor / StickyTaskAssignor, since the logic of
"triggering another rebalance" etc. would require some coordinator logic
which is hard to mimic at the PartitionAssignor level. On the other hand, since
we are embedding a KafkaConsumer client as a whole, we cannot just replace
ConsumerCoordinator with a specialized StreamsCoordinator like Connect does
in KIP-415. So I'd like to maybe split the current proposal into both the
consumer layer and the streams-assignor layer, like we did in KIP-98/KIP-129.
And then the key thing to consider is how to cut the boundary so that
the modifications we push to ConsumerCoordinator would be beneficial
universally for any consumer, while keeping the Streams-specific logic at the
assignor level.
Yes, that's also my ideal plan. The details of the implementation are depicted
in this doc, and I have explained the reasoning behind why we want to push a
global change of replacing ConsumerCoordinator with StreamCoordinator. The
motivation is that KIP space is usually used for public and algorithm-level
changes, not for internal implementation details.

3. Depending on which design direction we choose, our migration plan would
also be quite different. For example, if we stay with ConsumerCoordinator
whose protocol type is "consumer" still, and we can manage to make all
changes agnostic to brokers as well as to old versioned consumers, then our
migration plan could be much easier.
Yes, the upgrade plan was designed to take the new StreamCoordinator approach,
which means we shall define a new protocol type. For existing applications we
could only maintain the same `consumer` protocol type because the current
broker only allows a change of protocol type when the consumer group is empty.
It is of course user-unfriendly to force a wipe-out of the entire application,
and I don't think maintaining the old protocol type would greatly impact
ongoing services using the new stream coordinator. WDYT?

4. I think one major issue related to this KIP is that today, in the
StickyPartitionAssignor, we always try to honor stickiness over workload
balance, and hence a "learner task" is needed to break this priority, but I'm
wondering if we can have a better solution within the sticky task assignor
that accommodates this?
Great question! That's what I explained in the proposal, which is that we
should break down our delivery into different stages. At the very beginning,
our goal is to trigger learner task assignment only on `new` hosts, which we
shall figure out by leveraging the leader's knowledge of the previous round of
rebalance. After stage one, our goal is to have a smooth scaling-up experience,
but the task balance problem is kind of orthogonal. The load balance problem is
a much broader topic than auto scaling, which I figure is worth discussing
within this KIP's context since it's a natural next step, but it wouldn't be
the main topic. Learner task or auto scaling support should be treated as `a
helpful mechanism to reach load balance`, but not `an algorithm defining load
balance`. It would be great if you could share some insights of 

Re: KIP-213 - Scalable/Usable Foreign-Key KTable joins - Rebooted.

2019-03-04 Thread Adam Bellemare
Hi Matthias

Thank you for the feedback! I appreciate your well thought-out questions. I
have tried to answer and comment on everything that I know below.



*> Q) For the materialized combined-key store, why do we need to disable
> caching? And why do we need to flush the store?*
This is an artifact from Jan's implementation that I have carried along. My
understanding (though possibly erroneous!) is that RocksDB prefix scan
doesn't work with the cache, and ignores any data stored within it. I have
tried to validate this but I have not succeeded, so I believe that this
will need more investigation and testing. I will dig deeper on this and get
back to you.



*> a) Thus, I am wondering why we would need to send the `null` message back
> (from RHS to LHS) in the first place?*

We don't need to, if we follow your subsequent tombstone suggestion.





*> (b) About using "offsets" to resolve ordering issue: I don't think this
> would work. The join input table would be created as
> stream.flatMapValues().groupByKey().aggregate()*
Hmmm... I am a bit fuzzy on this part. Wouldn't the LHS processor be able
to get the highest offset and propagate that onwards to the RHS processor?
In my original design I had a wrapper that kept track of the input offset,
though I suspect it did not work for the above aggregation scenario.

*c-P1)*
Regarding the consistency examples, everything you wrote is correct as far
as I can tell in how the proposed system would behave. Rapid updates to the
LHS will result in some of the results being discarded (in the case of
deletes or change of FK) or doubly-produced (discussed below, after the
following example).

It does not seem to me to be possible to avoid quashing records that are
late arriving from the RHS. This could commonly be exhibited by two RHS
processors that are receiving very different loads. In the example below,
consider RHS-1 to be heavily loaded while RHS-2 is idle.

Example:
1- RHS-1 is updated to Y|bar
2- RHS-2 is updated to Z|foo
3- LHS is updated to A|Y
   -> sends Y|A+ subscription message to RHS-1
3- LHS is updated to A|Z
   -> sends Y|A- unsubscribe message to RHS-1
   -> sends Z|A+ subscription to RHS-2
4- RHS-2 processes Z|A message immediately
   -> sends A|Z,foo back
5- LHS processes A|Z,foo and produces result record A|join(Z,foo)
4- RHS-1 processes Y|A message
   -> sends A|Y,bar back
4- RHS-1 processes Y|A- unsubscribe message
   -> sends A|null message back
X- LHS processes A|Y,bar, compares it to A|Z, and discards it due to
staleness.
X- LHS processes A|null, compares it to A|Z, and discards it due to
staleness.

In this case, intermediate messages were discarded due to staleness. From
the outside, this can be seen as "incorrect" because these intermediate
results were not shown. However, there is no possible way for RHS-2 to know
to delay production of its event until RHS-1 has completed its
propagations. If we wish to produce all intermediate events, in order, we
must maintain state on the LHS about which events have been sent out, await
their return, and only publish them in order. Aside from the obvious
complexity and memory requirements, the intermediate events would
immediately be stomped by the final output.


*c-P2: Duplicates)*
With regards to duplicates (as per the double-sending of `A|Y,2,bar`), one
option is to ship the entire payload of the LHS over to the RHS, and either
join there or ship the entire payload back along with the RHS record. We
would still need to compare the FK on the LHS to ensure that it is still
valid. To take your example and expand it:

1- RHS is updated to Y|bar
2- LHS is updated to A|Y,1
   -> sends Y|(A, (Y,1))+ subscription message to RHS
3- LHS is updated to A|Y,2
   -> sends Y|(A, (Y,1))- unsubscribe message to RHS
   -> sends Y|(A, (Y,2))+ subscription to RHS
4- RHS processes first Y|A+ message
   -> sends A|(A, (Y,1)),bar back
5- LHS processes A|(A, (Y,1)),bar and produces result record A|Y,1,bar
6- RHS processes Y|(A, (Y,1))- unsubscribe message (update store only)
7- RHS processes second Y|(A, (Y,2))+ subscribe message
   -> sends A|(A, (Y,2)),bar back
8- LHS processes A|(A, (Y,2)),bar and produces result record A|Y,2,bar

Thus, the first result record is now `A|Y,1,bar`, while the second is
`A|Y,2,bar`.

This will add substantially to the data payload size. The question here
then becomes, "In which scenario is this a necessity"?

A possible scenario may include:
ktable.toStream.filter(filterFunc).foreach( workFunc )
//filterFunc true if value == (Y,1), else false
If the intermediate event (`A|Y,1`) is never produced + filtered, then
workFunc will not be executed. If I am mistaken on this point, please let
me know.
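
For concreteness, a minimal sketch of that scenario in the Streams DSL (the
String key/value types, the filter condition, and doWork() are hypothetical
stand-ins for filterFunc and workFunc):

import org.apache.kafka.streams.kstream.KTable;

public class IntermediateEventExample {

    // If the intermediate join result (Y,1) is never emitted upstream, the
    // foreach side effect below never runs for it.
    static void wire(KTable<String, String> joinResult) {
        joinResult.toStream()
                  .filter((key, value) -> value.equals("(Y,1)"))   // filterFunc
                  .foreach((key, value) -> doWork(key, value));    // workFunc
    }

    private static void doWork(String key, String value) {
        System.out.println("side effect for " + key + " -> " + value);
    }
}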



*Monkey Wrench) If you change the foreign key (say, A|Z,1) while the updates
in steps 2 & 3 above are processing in step 4, the results will all be
rejected anyway upon returning to the LHS. So even if we send the payload,
the results will be rejected as stale.*

*Conclusion:*
My two cents is 

[jira] [Created] (KAFKA-8036) Log dir reassignment on followers fails with FileNotFoundException for the leader epoch cache on leader election

2019-03-04 Thread Stanislav Kozlovski (JIRA)
Stanislav Kozlovski created KAFKA-8036:
--

 Summary: Log dir reassignment on followers fails with 
FileNotFoundException for the leader epoch cache on leader election
 Key: KAFKA-8036
 URL: https://issues.apache.org/jira/browse/KAFKA-8036
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.0.1, 1.1.0, 1.0.2
Reporter: Stanislav Kozlovski
Assignee: Stanislav Kozlovski


When changing a partition's log directories for a follower broker, we move all 
the data related to that partition to the other log dir (as per 
[KIP-113|https://cwiki.apache.org/confluence/display/KAFKA/KIP-113:+Support+replicas+movement+between+log+directories]).
 On a successful move, we rename the original directory by adding a suffix 
consisting of a UUID and `-delete` (e.g. `test_log_dir` would be renamed to 
`test_log_dir-0.32e77c96939140f9a56a49b75ad8ec8d-delete`).

We copy every log file and [initialize a new leader epoch file 
cache|https://github.com/apache/kafka/blob/0d56f1413557adabc736cae2dffcdc56a620403e/core/src/main/scala/kafka/log/Log.scala#L768].
 The problem is that we do not update the associated `Replica` class' leader 
epoch cache - it still points to the old `LeaderEpochFileCache` instance.
This results in a FileNotFoundException when the broker is [elected as 
leader for the 
partition|https://github.com/apache/kafka/blob/255f4a6effdc71c273691859cd26c4138acad778/core/src/main/scala/kafka/cluster/Partition.scala#L312].
 This has the unintended side effect of marking the log directory as offline, 
resulting in all partitions from that log directory becoming unavailable for 
the specific broker.
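
The failure boils down to a stale file reference surviving a directory rename.
A minimal, self-contained illustration of that pattern (plain Java with
hypothetical names, not Kafka's actual classes):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.nio.file.Files;

public class StaleEpochCacheReference {

    // Stand-in for a leader epoch cache backed by a checkpoint file.
    static class EpochFileCache {
        private final File checkpoint;
        EpochFileCache(File checkpoint) { this.checkpoint = checkpoint; }

        int readFirstEpoch() throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(checkpoint))) {
                return Integer.parseInt(reader.readLine().trim());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File logDir = Files.createTempDirectory("topic-0").toFile();
        File checkpoint = new File(logDir, "leader-epoch-checkpoint");
        try (FileWriter writer = new FileWriter(checkpoint)) {
            writer.write("0\n");
        }

        // The "replica" keeps a reference to a cache backed by the old path.
        EpochFileCache replicaCache = new EpochFileCache(checkpoint);

        // Log dir move: data is copied elsewhere, old dir is renamed to *-delete.
        File renamed = new File(logDir.getParentFile(), logDir.getName() + "-delete");
        if (!logDir.renameTo(renamed)) {
            throw new IllegalStateException("rename failed");
        }

        // "Leader election" still uses the stale reference -> FileNotFoundException.
        replicaCache.readFirstEpoch();
    }
}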



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Kafka Memory Leakage???

2019-03-04 Thread Sönke Liebau
Hi Syed,

and you are sure that this memory is actually allocated? I still have my
reservations about that metric, to be honest. Is there any way to connect to
the process with, for example, jconsole and have a look at memory
consumption in there?
Or alternatively, since the code you have sent is not relying on SnapLogic
anymore, can you just run it as a standalone application and check memory
consumption?
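
For such a standalone check, a small heap logger along these lines (plain JDK
management APIs, independent of the code under discussion) could be run next
to the consumer loop:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapLogger {

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memory.getHeapMemoryUsage();
            // Used vs. committed heap is what a real leak would drive up over time.
            System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                    heap.getUsed() / (1024 * 1024),
                    heap.getCommitted() / (1024 * 1024),
                    heap.getMax() / (1024 * 1024));
            Thread.sleep(10_000); // sample every 10 seconds
        }
    }
}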

That code looks very similar to what I ran (without knowing your input
parameters for isSuggest et al., of course), and for me memory consumption
stayed between 120 MB and 200 MB.

Best regards,
Sönke


On Mon, Mar 4, 2019 at 1:44 PM Syed Mudassir Ahmed <
syed.mudas...@gaianconsultants.com> wrote:

> Sonke,
>   thanks again.
>   Yes, I replaced the non-kafka code from our end with a simple Sysout
> statement as follows:
>
> do {
> ConsumerRecords records = 
> consumer.poll(Duration.of(timeout, ChronoUnit.MILLIS));
> for (final ConsumerRecord record : records) {
> if (!infiniteLoop && !oneTimeMode) {
> --msgCount;
> if (msgCount < 0) {
> break;
> }
> }
> Debugger.doPrint("value read:<" + record.value() + ">");
> /*outputViews.write(new BinaryOutput() {
> @Override
> public Document getHeader() {
> return generateHeader(record, oldHeader);
> }
>
> @Override
> public void write(WritableByteChannel writeChannel) throws 
> IOException {
> try (OutputStream os = 
> Channels.newOutputStream(writeChannel)) {
> os.write(record.value());
> }
> }
> });*/
> //The offset to commit should be the next offset of the current one,
> // according to the API
> offsets.put(new TopicPartition(record.topic(), record.partition()),
> new OffsetAndMetadata(record.offset() + 1));
> //In suggest mode, we should not change the current offset
> if (isSyncCommit && isSuggest) {
> commitOffset(offsets);
> offsets.clear();
> }
> }
> } while ((msgCount > 0 || infiniteLoop) && isRunning.get());
>
>
> *Note: *Debugger is a wrapper class that just writes the given string to a 
> local file using PrintStream's println() method.
>
> And I don't see any diff in the metrics.  I still see the huge amount of
> memory allocated.
>
> See the image attached.
>
>
> Thanks,
>
>
>
> On Mon, Mar 4, 2019 at 5:17 PM Sönke Liebau
>  wrote:
>
>> Hi Syed,
>>
>> let's keep it on the list for now so that everybody can participate :)
>>
>> The different .poll() method was just an unrelated observation, the
>> main points of my mail were the question about whether this is the
>> correct metric you are looking at and replacing the payload of your
>> code with a println statement to remove non-Kafka code from your
>> program and make sure that the leak is not in there. Have you tried
>> that?
>>
>> Best regards,
>> Sönke
>>
>> On Mon, Mar 4, 2019 at 7:21 AM Syed Mudassir Ahmed
>>  wrote:
>> >
>> > Sonke,
>> >   Thanks so much for the reply.  I used the new version of
>> poll(Duration) method.  Still, I see memory issue.
>> >   Is there a way we can get on a one-one call and discuss this pls?
>> Let me know your availability.  I can share zoom meeting link.
>> >
>> > Thanks,
>> >
>> >
>> >
>> > On Sat, Mar 2, 2019 at 2:15 AM Sönke Liebau 
>> > 
>> wrote:
>> >>
>> >> Hi Syed,
>> >>
>> >> from your screenshot I assume that you are using SnapLogic to run your
>> >> code (full disclosure: I do not have the faintest idea of this
>> >> product!). I've just had a look at the docs and am a bit confused by
>> >> their explanation of the metric that you point out in your image
>> >> "Memory Allocated". The docs say: "The Memory Allocated reflects the
>> >> number of bytes that were allocated by the Snap.  Note that this
>> >> number does not reflect the amount of memory that was freed and it is
>> >> not the peak memory usage of the Snap.  So, it is not necessarily a
>> >> metric that can be used to estimate the required size of a Snaplex
>> >> node.  Rather, the number provides an insight into how much memory had
>> >> to be allocated to process all of the documents.  For example, if the
>> >> total allocated was 5MB and the Snap processed 32 documents, then the
>> >> Snap allocated roughly 164KB per document.  When combined with the
>> >> other statistics, this number can help to identify the potential
>> >> causes of performance issues."
>> >> The part about not reflecting memory that was freed makes me somewhat
>> >> doubtful whether this actually reflects how much memory the process
>> >> currently holds.  Can you give some more insight there?
>> >>
>> >> Apart from that, I just ran your code somewhat modified to make it
>> >> work without dependencies for 2 hours and saw no unusual memory
>> >> consumption, just a regular garbage collection 

Re: Apache Kafka Memory Leakage???

2019-03-04 Thread Syed Mudassir Ahmed
Sonke,
  thanks again.
  Yes, I replaced the non-kafka code from our end with a simple Sysout
statement as follows:

do {
    ConsumerRecords records =
            consumer.poll(Duration.of(timeout, ChronoUnit.MILLIS));
    for (final ConsumerRecord record : records) {
        if (!infiniteLoop && !oneTimeMode) {
            --msgCount;
            if (msgCount < 0) {
                break;
            }
        }
        Debugger.doPrint("value read:<" + record.value() + ">");
        /*outputViews.write(new BinaryOutput() {
            @Override
            public Document getHeader() {
                return generateHeader(record, oldHeader);
            }

            @Override
            public void write(WritableByteChannel writeChannel) throws IOException {
                try (OutputStream os = Channels.newOutputStream(writeChannel)) {
                    os.write(record.value());
                }
            }
        });*/
        //The offset to commit should be the next offset of the current one,
        // according to the API
        offsets.put(new TopicPartition(record.topic(), record.partition()),
                new OffsetAndMetadata(record.offset() + 1));
        //In suggest mode, we should not change the current offset
        if (isSyncCommit && isSuggest) {
            commitOffset(offsets);
            offsets.clear();
        }
    }
} while ((msgCount > 0 || infiniteLoop) && isRunning.get());


*Note: *Debugger is a wrapper class that just writes the given string
to a local file using PrintStream's println() method.
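
For reference, a minimal sketch of what such a wrapper might look like (the
real class wasn't shared, so the file name and details here are assumptions):

import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.PrintStream;

public final class Debugger {

    private static PrintStream out;

    // Lazily open an auto-flushing PrintStream that appends to a local file.
    private static synchronized PrintStream out() throws FileNotFoundException {
        if (out == null) {
            out = new PrintStream(new FileOutputStream("consumer-debug.log", true), true);
        }
        return out;
    }

    public static void doPrint(String line) {
        try {
            out().println(line);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}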

And I don't see any difference in the metrics.  I still see a huge amount of
memory allocated.

See the image attached.


Thanks,



On Mon, Mar 4, 2019 at 5:17 PM Sönke Liebau
 wrote:

> Hi Syed,
>
> let's keep it on the list for now so that everybody can participate :)
>
> The different .poll() method was just an unrelated observation, the
> main points of my mail were the question about whether this is the
> correct metric you are looking at and replacing the payload of your
> code with a println statement to remove non-Kafka code from your
> program and make sure that the leak is not in there. Have you tried
> that?
>
> Best regards,
> Sönke
>
> On Mon, Mar 4, 2019 at 7:21 AM Syed Mudassir Ahmed
>  wrote:
> >
> > Sonke,
> >   Thanks so much for the reply.  I used the new version of
> poll(Duration) method.  Still, I see memory issue.
> >   Is there a way we can get on a one-one call and discuss this pls?  Let
> me know your availability.  I can share zoom meeting link.
> >
> > Thanks,
> >
> >
> >
> > On Sat, Mar 2, 2019 at 2:15 AM Sönke Liebau 
> > 
> wrote:
> >>
> >> Hi Syed,
> >>
> >> from your screenshot I assume that you are using SnapLogic to run your
> >> code (full disclosure: I do not have the faintest idea of this
> >> product!). I've just had a look at the docs and am a bit confused by
> >> their explanation of the metric that you point out in your image
> >> "Memory Allocated". The docs say: "The Memory Allocated reflects the
> >> number of bytes that were allocated by the Snap.  Note that this
> >> number does not reflect the amount of memory that was freed and it is
> >> not the peak memory usage of the Snap.  So, it is not necessarily a
> >> metric that can be used to estimate the required size of a Snaplex
> >> node.  Rather, the number provides an insight into how much memory had
> >> to be allocated to process all of the documents.  For example, if the
> >> total allocated was 5MB and the Snap processed 32 documents, then the
> >> Snap allocated roughly 164KB per document.  When combined with the
> >> other statistics, this number can help to identify the potential
> >> causes of performance issues."
> >> The part about not reflecting memory that was freed makes me somewhat
> >> doubtful whether this actually reflects how much memory the process
> >> currently holds.  Can you give some more insight there?
> >>
> >> Apart from that, I just ran your code somewhat modified to make it
> >> work without dependencies for 2 hours and saw no unusual memory
> >> consumption, just a regular garbage collection sawtooth pattern. That
> >> being said, I had to replace your actual processing with a simple
> >> println, so if there is a memory leak in there I would of course not
> >> have noticed.
> >> I've uploaded the code I ran [1] for reference. For further analysis,
> >> maybe you could run something similar with just a println or noop and
> >> see if the symptoms persist, to localize the leak (if it exists).
> >>
> >> Also, two random observations on your code:
> >>
> >> KafkaConsumer.poll(Long timeout) is deprecated, you should consider
> >> using the overloaded version with a Duration parameter instead.
> >>
> >> The comment at [2] seems to contradict the following code, as the
> >> offsets are only changed when in suggest mode. But as I have no idea
> >> what suggest mode even is or what all this means, this observation may be
> >> miles off point :)
> >>
> >> I hope that 

[jira] [Created] (KAFKA-8035) Add tests for generic bounds for KStream API

2019-03-04 Thread Bruno Cadonna (JIRA)
Bruno Cadonna created KAFKA-8035:


 Summary: Add tests for generic bounds for KStream API 
 Key: KAFKA-8035
 URL: https://issues.apache.org/jira/browse/KAFKA-8035
 Project: Kafka
  Issue Type: Test
  Components: streams
Reporter: Bruno Cadonna


Add tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Apache Kafka Memory Leakage???

2019-03-04 Thread Sönke Liebau
Hi Syed,

let's keep it on the list for now so that everybody can participate :)

The different .poll() method was just an unrelated observation, the
main points of my mail were the question about whether this is the
correct metric you are looking at and replacing the payload of your
code with a println statement to remove non-Kafka code from your
program and make sure that the leak is not in there. Have you tried
that?

Best regards,
Sönke

On Mon, Mar 4, 2019 at 7:21 AM Syed Mudassir Ahmed
 wrote:
>
> Sonke,
>   Thanks so much for the reply.  I used the new version of poll(Duration) 
> method.  Still, I see memory issue.
>   Is there a way we can get on a one-one call and discuss this pls?  Let me 
> know your availability.  I can share zoom meeting link.
>
> Thanks,
>
>
>
> On Sat, Mar 2, 2019 at 2:15 AM Sönke Liebau 
>  wrote:
>>
>> Hi Syed,
>>
>> from your screenshot I assume that you are using SnapLogic to run your
>> code (full disclosure: I do not have the faintest idea of this
>> product!). I've just had a look at the docs and am a bit confused by
>> their explanation of the metric that you point out in your image
>> "Memory Allocated". The docs say: "The Memory Allocated reflects the
>> number of bytes that were allocated by the Snap.  Note that this
>> number does not reflect the amount of memory that was freed and it is
>> not the peak memory usage of the Snap.  So, it is not necessarily a
>> metric that can be used to estimate the required size of a Snaplex
>> node.  Rather, the number provides an insight into how much memory had
>> to be allocated to process all of the documents.  For example, if the
>> total allocated was 5MB and the Snap processed 32 documents, then the
>> Snap allocated roughly 164KB per document.  When combined with the
>> other statistics, this number can help to identify the potential
>> causes of performance issues."
>> The part about not reflecting memory that was freed makes me somewhat
>> doubtful whether this actually reflects how much memory the process
>> currently holds.  Can you give some more insight there?
>>
>> Apart from that, I just ran your code somewhat modified to make it
>> work without dependencies for 2 hours and saw no unusual memory
>> consumption, just a regular garbage collection sawtooth pattern. That
>> being said, I had to replace your actual processing with a simple
>> println, so if there is a memory leak in there I would of course not
>> have noticed.
>> I've uploaded the code I ran [1] for reference. For further analysis,
>> maybe you could run something similar with just a println or noop and
>> see if the symptoms persist, to localize the leak (if it exists).
>>
>> Also, two random observations on your code:
>>
>> KafkaConsumer.poll(Long timeout) is deprecated, you should consider
>> using the overloaded version with a Duration parameter instead.
>>
>> The comment at [2] seems to contradict the following code, as the
>> offsets are only changed when in suggest mode. But as I have no idea
>> what suggest mode even is or what all this means, this observation may be
>> miles off point :)
>>
>> I hope that helps a little.
>>
>> Best regards,
>> Sönke
>>
>> [1] https://gist.github.com/soenkeliebau/e77e8665a1e7e49ade9ec27a6696e983
>> [2] 
>> https://gist.github.com/soenkeliebau/e77e8665a1e7e49ade9ec27a6696e983#file-memoryleak-java-L86
>>
>>
>> On Fri, Mar 1, 2019 at 7:35 AM Syed Mudassir Ahmed
>>  wrote:
>> >
>> >
>> > Thanks,
>> >
>> >
>> >
>> > -- Forwarded message -
>> > From: Syed Mudassir Ahmed 
>> > Date: Tue, Feb 26, 2019 at 12:40 PM
>> > Subject: Apache Kafka Memory Leakage???
>> > To: 
>> > Cc: Syed Mudassir Ahmed 
>> >
>> >
>> > Hi Team,
>> >   I have a java application based out of latest Apache Kafka version 2.1.1.
>> >   I have a consumer application that runs infinitely to consume messages 
>> > whenever produced.
>> >   Sometimes there are no messages produced for hours.  Still, I see that 
>> > the memory allocated to consumer program is drastically increasing.
>> >   My code is as follows:
>> >
>> > AtomicBoolean isRunning = new AtomicBoolean(true);
>> >
>> > Properties kafkaProperties = new Properties();
>> >
>> > kafkaProperties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, 
>> > brokers);
>> >
>> > kafkaProperties.put(ConsumerConfig.GROUP_ID_CONFIG, groupID);
>> >
>> > kafkaProperties.put(ConsumerConfig.CLIENT_ID_CONFIG, 
>> > UUID.randomUUID().toString());
>> > kafkaProperties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
>> > kafkaProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, 
>> > AUTO_OFFSET_RESET_EARLIEST);
>> > consumer = new KafkaConsumer(kafkaProperties, 
>> > keyDeserializer, valueDeserializer);
>> > if (topics != null) {
>> > subscribeTopics(topics);
>> > }
>> >
>> >
>> > boolean infiniteLoop = false;
>> > boolean oneTimeMode = false;
>> > int timeout = consumeTimeout;
>> > if (isSuggest) {
>> > //Configuration for suggest mode
>> > oneTimeMode