Jenkins build is back to normal : kafka-trunk-jdk11 #1549

2020-06-07 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk14 #197

2020-06-07 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-10012; Reduce overhead of strings in SelectorMetrics (#8684)


--
[...truncated 6.31 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfW

[jira] [Created] (KAFKA-10118) Improve stickiness verification for AbstractStickyAssignorTest

2020-06-07 Thread Luke Chen (Jira)
Luke Chen created KAFKA-10118:
-

 Summary: Improve stickiness verification for 
AbstractStickyAssignorTest
 Key: KAFKA-10118
 URL: https://issues.apache.org/jira/browse/KAFKA-10118
 Project: Kafka
  Issue Type: Test
Reporter: Luke Chen


In KAFKA-9987, we implemented an optimized sticky partition assignor algorithm 
(i.e. the {{constrainedAssign}} method), so the original *isSticky* validation is 
not suitable for all situations. In this PR: 
[https://github.com/apache/kafka/pull/8788], we removed the unnecessary 
isSticky validation for {{constrainedAssign}} method testing. But we should 
replace the validation we removed with a better one. For more 
discussion, please see: 
[https://github.com/apache/kafka/pull/8788#pullrequestreview-425551903]

 

We haven't come up with a better idea so far, so suggestions are welcome. Thanks.
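One possible direction, purely as an illustration and not the validation the PR settled on: accept an assignment as "sticky enough" if every partition that changed owners can be justified by balance, i.e. the previous owner did not end up with fewer partitions than the consumer that took the partition over. A minimal sketch under that assumption, with made-up names:

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Hypothetical helper, not the check used in AbstractStickyAssignorTest.
class StickinessCheckSketch {

    static boolean movesAreBalanceJustified(Map<TopicPartition, String> previousOwner,
                                            Map<String, List<TopicPartition>> newAssignment) {
        // Index the new owner of every assigned partition.
        Map<TopicPartition, String> newOwner = new HashMap<>();
        newAssignment.forEach((consumer, partitions) ->
                partitions.forEach(tp -> newOwner.put(tp, consumer)));

        for (Map.Entry<TopicPartition, String> entry : previousOwner.entrySet()) {
            String oldConsumer = entry.getValue();
            String currentConsumer = newOwner.get(entry.getKey());
            if (currentConsumer == null || currentConsumer.equals(oldConsumer))
                continue; // partition is unassigned or stayed with its previous owner
            if (!newAssignment.containsKey(oldConsumer))
                continue; // previous owner left the group, so the move is unavoidable
            // If the previous owner ended up with fewer partitions than the consumer
            // that received this partition, the move was not justified by balance.
            if (newAssignment.get(oldConsumer).size() < newAssignment.get(currentConsumer).size())
                return false;
        }
        return true;
    }
}
{code}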



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Permissions for Jira ticket self-assignment and creation of KIP

2020-06-07 Thread Matthias J. Sax
Done.

On 6/7/20 2:18 PM, Joel Wee wrote:
> Thanks Matthias
> 
> I have just created a wiki account 
> joel.wee with the same 
> email joel@outlook.com. Could someone 
> please give me write permission to create a KIP?
> 
> Thanks
> 
> Joel
> 
> On 7 Jun 2020, at 9:49 PM, Matthias J. Sax 
> mailto:mj...@apache.org>> wrote:
> 
> Added you to Jira.
> 
> However, the wiki is independent of Jira and you need to create an
> account there, too.
> 
> 
> -Matthias
> 
> On 6/7/20 9:23 AM, Joel Wee wrote:
> Hi Dev Team,
> 
> Please could I be added to the contributor list and have permission to create 
> a KIP? (For KAFKA-6435)
> 
> Jira user: JoelWee
> Email: 
> joel@outlook.com
> 
> Thank you
> 
> Joel
> 
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


[jira] [Resolved] (KAFKA-10012) Reducing memory overhead associated with strings in MetricName

2020-06-07 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-10012.
-
Fix Version/s: 2.6.0
   Resolution: Fixed

> Reducing memory overhead associated with strings in MetricName
> --
>
> Key: KAFKA-10012
> URL: https://issues.apache.org/jira/browse/KAFKA-10012
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Navina Ramesh
>Assignee: Navina Ramesh
>Priority: Major
> Fix For: 2.6.0
>
>
> {{SelectorMetrics}} has per-connection metrics, which means the number of 
> {{MetricName}} objects and the strings associated with them (such as group name 
> and description) grows with the number of connections in the client. This 
> overhead of duplicate string objects is amplified when there are multiple 
> instances of Kafka clients within the same JVM. 
> This patch addresses some of the memory overhead by making {{metricGrpName}} a 
> constant and introducing a new constant {{perConnectionMetricGrpName}}. 
> Additionally, the strings for metric name and description in {{createMeter}} 
> have been interned, since there are about 8 per-client and 4 per-connection 
> {{Meter}} instances.
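As an illustration of the idea only (not the actual SelectorMetrics code), sharing group-name constants and interning the per-connection name/description strings looks roughly like this; the group names and metric name below are made up:

{code:java}
import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;

class ConnectionMetricsExample {
    // Shared constants: one String instance regardless of how many connections exist.
    private static final String METRIC_GROUP_NAME = "example-metrics";                      // hypothetical name
    private static final String PER_CONNECTION_METRIC_GROUP_NAME = "example-node-metrics";  // hypothetical name

    private final Metrics metrics = new Metrics();

    MetricName perConnectionMetric(String connectionId) {
        // Interning collapses equal name/description strings built by different
        // client instances in the same JVM into a single copy.
        String name = ("bytes-sent-" + connectionId).intern();
        String description = ("Bytes sent to node " + connectionId).intern();
        return metrics.metricName(name, PER_CONNECTION_METRIC_GROUP_NAME, description);
    }
}
{code}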



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is back to normal : kafka-2.6-jdk8 #26

2020-06-07 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk11 #1548

2020-06-07 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-9216: Enforce internal config topic settings for Connect workers


--
[...truncated 6.32 MB...]
org.apache.kafka.streams.TopologyTestDriverTest > 
shouldThrowIfPersistentBuiltInStoreIsAccessedWithUntypedMethod[Eos enabled = 
false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldAllowPrePopulatingStatesStoresWithCachingEnabled[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectPersistentStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldRespectTaskIdling[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] STARTED

[jira] [Created] (KAFKA-10117) TimingWheel addOverflowWheel

2020-06-07 Thread Rhett.Wang (Jira)
Rhett.Wang created KAFKA-10117:
--

 Summary: TimingWheel  addOverflowWheel
 Key: KAFKA-10117
 URL: https://issues.apache.org/jira/browse/KAFKA-10117
 Project: Kafka
  Issue Type: Wish
  Components: tools
Affects Versions: 2.4.0
Reporter: Rhett.Wang


Regarding the code for Kafka's TimingWheel: I think the addOverflowWheel method is not 
good, so I wrote this code. Please check it.


class TimingWheel {
    public String test = "ok";

    TimingWheel() {
    }
}

// Enum-based singleton: the single TimingWheel is created once when the enum is
// initialized. (The original sketch gave TimingWheel a field pointing back at the
// enum singleton, which fails during class initialization, so the instance is
// obtained directly from the enum instead.)
public enum EnumInstance {
    INSTANCE;

    private final TimingWheel instance = new TimingWheel();

    public TimingWheel getInstance() {
        return instance;
    }
}

class Test {

    public static void main(String[] args) {
        // Obtain the shared instance via the enum singleton.
        TimingWheel first = EnumInstance.INSTANCE.getInstance();

        // advanceClock
        String test = first.test;
    }
}

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9468) config.storage.topic partition count issue is hard to debug

2020-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch resolved KAFKA-9468.
--
  Assignee: Randall Hauch
Resolution: Fixed

> config.storage.topic partition count issue is hard to debug
> ---
>
> Key: KAFKA-9468
> URL: https://issues.apache.org/jira/browse/KAFKA-9468
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>Reporter: Evelyn Bayes
>Assignee: Randall Hauch
>Priority: Minor
> Fix For: 2.3.2, 2.6.0, 2.4.2, 2.5.1
>
>
> When you run Connect in distributed mode with 2 or more workers and 
> config.storage.topic has more than 1 partition, you can end up with one of 
> the workers rebalancing endlessly:
> [2020-01-13 12:53:23,535] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Current config state offset 37 is behind group 
> assignment 63, reading to end of config log 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Finished reading to end of log and updated config 
> snapshot, new config log offset: 37 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, 
> groupId=connect-cluster] Current config state offset 37 does not match group 
> assignment 63. Forcing rebalance. 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>  
> In case anyone viewing this doesn't know: you are only ever meant to 
> create this topic with one partition.
>  
> *Suggested Solution*
> Make the Connect worker check the partition count when it starts; if the 
> partition count is > 1, Kafka Connect stops and logs the reason why.
> I think this is reasonable as it would stop users just starting out from 
> building it incorrectly, and it would be easy to fix early. For those upgrading, 
> this would easily be caught in a PRE-PROD environment. And even if they 
> upgraded directly in PROD, they would only be impacted if they upgraded all 
> Connect workers at the same time.
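A minimal sketch of the kind of startup check suggested above, assuming the worker already has an Admin client and the configured config.storage.topic name (this is not the code that was eventually merged):

{code:java}
import java.util.Collections;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.config.ConfigException;

class ConfigTopicCheckSketch {

    static void verifyConfigTopicPartitionCount(Admin admin, String configTopic)
            throws ExecutionException, InterruptedException {
        TopicDescription description = admin.describeTopics(Collections.singletonList(configTopic))
                .all().get().get(configTopic);
        int partitions = description.partitions().size();
        if (partitions != 1) {
            // Fail fast at startup with a clear reason instead of rebalancing endlessly later.
            throw new ConfigException("config.storage.topic '" + configTopic
                    + "' must have exactly 1 partition, but has " + partitions);
        }
    }
}
{code}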



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10116) GraalVM native-image prototype

2020-06-07 Thread Ismael Juma (Jira)
Ismael Juma created KAFKA-10116:
---

 Summary: GraalVM native-image prototype
 Key: KAFKA-10116
 URL: https://issues.apache.org/jira/browse/KAFKA-10116
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma
Assignee: Ismael Juma


It would be interesting to check how Kafka would perform if compiled into a 
native image via GraalVM.

Tools would clearly benefit from the faster start-up time and lower memory 
usage, but it would also be interesting to check how the broker performs in 
this mode (assuming we can make it work).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10115) Incorporate errors.tolerance with the Errant Record Reporter

2020-06-07 Thread Aakash Shah (Jira)
Aakash Shah created KAFKA-10115:
---

 Summary: Incorporate errors.tolerance with the Errant Record 
Reporter
 Key: KAFKA-10115
 URL: https://issues.apache.org/jira/browse/KAFKA-10115
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 2.6.0
Reporter: Aakash Shah
Assignee: Aakash Shah
 Fix For: 2.6.0


The errors.tolerance config is currently not being used when using the Errant 
Record Reporter. If errors.tolerance is none, then the task should fail rather 
than sending the record to the DLQ in Kafka.
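A rough sketch of the intended behavior; the class, enum, and method names below are hypothetical and only illustrate the desired interaction between errors.tolerance and the reporter:

{code:java}
import org.apache.kafka.connect.errors.ConnectException;
import org.apache.kafka.connect.sink.ErrantRecordReporter;
import org.apache.kafka.connect.sink.SinkRecord;

class ErrantRecordHandlingSketch {
    enum Tolerance { NONE, ALL }  // hypothetical stand-in for the errors.tolerance setting

    private final Tolerance errorsTolerance;
    private final ErrantRecordReporter reporter;

    ErrantRecordHandlingSketch(Tolerance errorsTolerance, ErrantRecordReporter reporter) {
        this.errorsTolerance = errorsTolerance;
        this.reporter = reporter;
    }

    void reportOrFail(SinkRecord record, Throwable error) {
        if (errorsTolerance == Tolerance.NONE) {
            // errors.tolerance=none: fail the task instead of routing the record to the DLQ.
            throw new ConnectException("Errant record encountered with errors.tolerance=none", error);
        }
        reporter.report(record, error); // tolerated: hand the record off to the DLQ reporter
    }
}
{code}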



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-9216) Enforce connect internal topic configuration at startup

2020-06-07 Thread Randall Hauch (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Randall Hauch reopened KAFKA-9216:
--

The previous PR only checked the number of partitions, so I'm going to reopen 
this to add another PR that checks the internal topic cleanup policy, which 
should be `compact` (only), and should not be `delete,compact` or `delete`. 
Using any other topic cleanup policy for the internal topics can lead to lost 
configurations, source offsets, or statuses.
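For illustration, a check of the kind described here could look roughly like the following; this is a sketch using the public Admin API, not the code added by the follow-up PR:

{code:java}
import java.util.Collections;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigException;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

class CleanupPolicyCheckSketch {

    static void verifyCompactCleanupPolicy(Admin admin, String internalTopic)
            throws ExecutionException, InterruptedException {
        ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, internalTopic);
        Config config = admin.describeConfigs(Collections.singletonList(resource))
                .all().get().get(resource);
        String cleanupPolicy = config.get(TopicConfig.CLEANUP_POLICY_CONFIG).value();
        if (!TopicConfig.CLEANUP_POLICY_COMPACT.equals(cleanupPolicy)) {
            // 'delete' or 'delete,compact' can silently drop configs, offsets, or statuses.
            throw new ConfigException("Internal topic '" + internalTopic
                    + "' must use cleanup.policy=compact, but found: " + cleanupPolicy);
        }
    }
}
{code}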

> Enforce connect internal topic configuration at startup
> ---
>
> Key: KAFKA-9216
> URL: https://issues.apache.org/jira/browse/KAFKA-9216
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Evelyn Bayes
>Priority: Major
> Fix For: 2.3.2, 2.6.0, 2.4.2, 2.5.1
>
>
> Users sometimes configure Connect's internal topic for configurations with 
> more than one partition. One partition is expected, however, and using more 
> than one leads to weird behavior that is sometimes not easy to spot.
> Here's one example of a log message:
> {noformat}
> "textPayload": "[2019-11-20 11:12:14,049] INFO [Worker clientId=connect-1, 
> groupId=td-connect-server] Current config state offset 284 does not match 
> group assignment 274. Forcing rebalance. 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:942)\n"
> {noformat}
> Would it be possible to add a check in the KafkaConfigBackingStore and 
> prevent the worker from starting if connect config partition count !=1 ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] KIP-599: Throttle Create Topic, Create Partition and Delete Topic Operations

2020-06-07 Thread Colin McCabe
Hi David,

Thanks for the KIP.

I thought about this for a while and I actually think this approach is not 
quite right.  The problem that I see is that using an explicitly set quota 
requires careful tuning by the cluster operator.  Even worse, this tuning 
might be invalidated by changes in overall conditions or even more efficient 
controller software.

For example, if we empirically find that the controller can do 1000 topics in a 
minute (or whatever), this tuning might actually be wrong if the next version 
of the software can do 2000 topics in a minute because of efficiency upgrades.  
Or, the broker that the controller is located on might be experiencing heavy 
load from its non-controller operations, and so it can only do 500 topics in a 
minute during this period.

So the system administrator gets a very obscure tunable (it's not clear to a 
non-Kafka-developer what "controller mutations" are or why they should care).  
And even worse, they will have to significantly "sandbag" the value that they 
set it to, so that even under the heaviest load and oldest deployed version of 
the software, the controller can still function.  Even worse, this new quota 
adds a lot of complexity to the controller.

What we really want is backpressure when the controller is overloaded.  I 
believe this is the alternative you discuss in "Rejected Alternatives" under 
"Throttle the Execution instead of the Admission"  Your reason for rejecting it 
is that the client error handling does not work well in this case.  But 
actually, this is an artifact of our current implementation, rather than a 
fundamental issue with backpressure.

Consider the example of a CreateTopicsRequest.  The controller could return a 
special error code if the load was too high, and take the create topics event 
off the controller queue.  Let's call that error code BUSY. 
 Additionally, the controller could immediately refuse new events if the queue 
had reached its maximum length, and simply return BUSY for that case as well.
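As a rough sketch of that backpressure idea (BUSY is the error code proposed in this email, not an existing Kafka error code, and the event types below are made up):

{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ControllerEventQueueSketch {
    interface ControllerEvent { }
    enum EnqueueResult { ACCEPTED, BUSY }

    private final BlockingQueue<ControllerEvent> queue;

    ControllerEventQueueSketch(int maxQueuedEvents) {
        this.queue = new ArrayBlockingQueue<>(maxQueuedEvents);
    }

    EnqueueResult tryEnqueue(ControllerEvent event) {
        // Reject immediately instead of queueing unbounded work; the client can
        // back off and retry rather than getting a TimeoutException much later
        // for an event that was still executed.
        return queue.offer(event) ? EnqueueResult.ACCEPTED : EnqueueResult.BUSY;
    }
}
{code}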

Basically, the way we handle RPC timeouts in the controller right now is not 
very good.  As you know, we time out the RPC, so the client gets 
TimeoutException, but then keep the event on the queue, so that it eventually 
gets executed!  There's no reason why we have to do that.  We could take the 
event off the queue if there is a timeout.  This would reduce load and mostly 
avoid the paradoxical situations you describe (getting TopicExistsException for 
a CreateTopicsRequest retry, etc.)

I say "mostly" because there are still cases where retries could go astray (for 
example if we execute the topic creation but a network problem prevents the 
response from being sent to the client).  But this would still be a very big 
improvement over what we have now.

Sorry for commenting so late on this but I got distracted by some other things. 
 I hope we can figure this one out-- I feel like there is a chance to 
significantly simplify this.

best,
Colin


On Fri, May 29, 2020, at 07:57, David Jacot wrote:
> Hi folks,
> 
> I'd like to start the vote for KIP-599 which proposes a new quota to
> throttle create topic, create partition, and delete topics operations to
> protect the Kafka controller:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-599%3A+Throttle+Create+Topic%2C+Create+Partition+and+Delete+Topic+Operations
> 
> Please, let me know what you think.
> 
> Cheers,
> David
>


Re: Permissions for Jira ticket self-assignment and creation of KIP

2020-06-07 Thread Joel Wee
Thanks Matthias

I have just created a wiki account 
joel.wee with the same 
email joel@outlook.com. Could someone 
please give me write permission to create a KIP?

Thanks

Joel

On 7 Jun 2020, at 9:49 PM, Matthias J. Sax 
mailto:mj...@apache.org>> wrote:

Added you to Jira.

However, the wiki is independent of Jira and you need to create an
account there, too.


-Matthias

On 6/7/20 9:23 AM, Joel Wee wrote:
Hi Dev Team,

Please could I be added to the contributor list and have permission to create a 
KIP? (For KAFKA-6435)

Jira user: JoelWee
Email: 
joel@outlook.com

Thank you

Joel





Re: Permissions for Jira ticket self-assignment and creation of KIP

2020-06-07 Thread Matthias J. Sax
Added you to Jira.

However, the wiki is independent of Jira and you need to create an
account there, too.


-Matthias

On 6/7/20 9:23 AM, Joel Wee wrote:
> Hi Dev Team,
> 
> Please could I be added to the contributor list and have permission to create 
> a KIP? (For KAFKA-6435)
> 
> Jira user: JoelWee
> Email: joel@outlook.com
> 
> Thank you
> 
> Joel
> 



signature.asc
Description: OpenPGP digital signature


[jira] [Resolved] (KAFKA-9216) Enforce connect internal topic configuration at startup

2020-06-07 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-9216.
---
Resolution: Fixed

> Enforce connect internal topic configuration at startup
> ---
>
> Key: KAFKA-9216
> URL: https://issues.apache.org/jira/browse/KAFKA-9216
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.11.0.0
>Reporter: Randall Hauch
>Assignee: Evelyn Bayes
>Priority: Major
> Fix For: 2.3.2, 2.6.0, 2.4.2, 2.5.1
>
>
> Users sometimes configure Connect's internal topic for configurations with 
> more than one partition. One partition is expected, however, and using more 
> than one leads to weird behavior that is sometimes not easy to spot.
> Here's one example of a log message:
> {noformat}
> "textPayload": "[2019-11-20 11:12:14,049] INFO [Worker clientId=connect-1, 
> groupId=td-connect-server] Current config state offset 284 does not match 
> group assignment 274. Forcing rebalance. 
> (org.apache.kafka.connect.runtime.distributed.DistributedHerder:942)\n"
> {noformat}
> Would it be possible to add a check in the KafkaConfigBackingStore and 
> prevent the worker from starting if connect config partition count !=1 ?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8177) Allow for separate connect instances to have sink connectors with the same name

2020-06-07 Thread Paul Whalen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Whalen resolved KAFKA-8177.

Resolution: Resolved

> Allow for separate connect instances to have sink connectors with the same 
> name
> ---
>
> Key: KAFKA-8177
> URL: https://issues.apache.org/jira/browse/KAFKA-8177
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Paul Whalen
>Priority: Minor
>  Labels: connect
>
> If you have multiple Connect instances (either a single standalone or 
> distributed group of workers) running against the same Kafka cluster, the 
> connect instances cannot each have a sink connector with the same name and 
> still operate independently. This is because the consumer group ID used 
> internally for reading from the source topic(s) is entirely derived from the 
> connector's name: 
> [https://github.com/apache/kafka/blob/d0e436c471ba4122ddcc0f7a1624546f97c4a517/connect/runtime/src/main/java/org/apache/kafka/connect/util/SinkUtils.java#L24]
> The documentation of Connect implies to me that it supports "multi-tenancy," 
> that is, as long as...
>  * In standalone mode, the {{offset.storage.file.filename}} is not shared 
> between instances
>  * In distributed mode, {{group.id}} and {{config.storage.topic}}, 
> {{offset.storage.topic}}, and {{status.storage.topic}} are not the same 
> between instances
> ... then the connect instances can operate completely independently without 
> fear of conflict.  But the sink connector consumer group naming policy makes 
> this untrue. Obviously this can be achieved by uniquely naming connectors 
> across instances, but in some environments that could be a bit of a nuisance, 
> or a challenging policy to enforce. For instance, imagine a large group of 
> developers or data analysts all running their own standalone Connect to load 
> into a SQL database for their own analysis, or mirroring to 
> their own local cluster for testing.
> The obvious solution is to allow supplying config that gives a Connect instance 
> some notion of identity, and to use that when creating the sink task consumer 
> group. Distributed mode already has this obviously ({{group.id}}), but it 
> would need to be added for standalone mode. Maybe {{instance.id}}? Given that 
> solution it seems like this would need a small KIP.
> I could also imagine solving this problem through better documentation 
> ("ensure your connector names are unique!"), but having that subtlety doesn't 
> seem worth it to me. (Optionally) assigning identity to every Connect 
> instance seems strictly more clear, without any downside.
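For illustration, the proposed identity could simply be prefixed onto the derived group id; in this sketch "instance.id" is the hypothetical new config suggested above, and the "connect-" prefix reflects roughly what the SinkUtils code linked above does today:

{code:java}
class SinkGroupIdSketch {

    // Hypothetical sketch of the proposal, not existing Connect behavior: an optional
    // per-instance identity is prefixed onto the connector-name-derived consumer group.
    static String consumerGroupId(String connectorName, String instanceId) {
        String base = "connect-" + connectorName;
        return (instanceId == null || instanceId.isEmpty()) ? base : instanceId + "-" + base;
    }
}
{code}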



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Permissions for Jira ticket self-assignment and creation of KIP

2020-06-07 Thread Joel Wee
Hi Dev Team,

Please could I be added to the contributor list and have permission to create a 
KIP? (For KAFKA-6435)

Jira user: JoelWee
Email: joel@outlook.com

Thank you

Joel


[jira] [Created] (KAFKA-10114) Kafka producer stuck after broker crash

2020-06-07 Thread Itamar Benjamin (Jira)
Itamar Benjamin created KAFKA-10114:
---

 Summary: Kafka producer stuck after broker crash
 Key: KAFKA-10114
 URL: https://issues.apache.org/jira/browse/KAFKA-10114
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 2.4.1, 2.3.1
Reporter: Itamar Benjamin


Today two of our Kafka brokers crashed (cluster of 3 brokers), and producers 
were not able to send new messages. After the brokers started again, all producers 
resumed sending data except for a single one.

At the beginning the producer rejected all new messages with a TimeoutException:

 
{code:java}
 org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for 
incoming-mutable-RuntimeIIL-1:12 ms has passed since batch creation
{code}
 

Then after some time the exception changed to:

 
{code:java}
org.apache.kafka.common.errors.TimeoutException: Failed to allocate memory 
within the configured max blocking time 6 ms.
{code}
 

 

jstack shows the kafka-producer-network-thread is waiting to get a producer id:

 
{code:java}
"kafka-producer-network-thread | producer-1" #767 daemon prio=5 os_prio=0 
cpu=63594017.16ms elapsed=1511219.38s tid=0x7fffd8353000 nid=0x4fa4 
sleeping [0x7ff55c177000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(java.base@11.0.1/Native Method)
at org.apache.kafka.common.utils.Utils.sleep(Utils.java:296)
at org.apache.kafka.common.utils.SystemTime.sleep(SystemTime.java:41)
at 
org.apache.kafka.clients.producer.internals.Sender.maybeWaitForProducerId(Sender.java:565)
at 
org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:306)
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:244)
at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)   Locked 
ownable synchronizers:
- None
{code}
 

Digging into maybeWaitForProducerId(), it waits until some broker is ready 
(the awaitNodeReady function), which in turn calls leastLoadedNode() on 
NetworkClient. This one iterates over all brokers and checks whether a request can 
be sent to each of them using canSendRequest().

This is the code for canSendRequest():

 
{code:java}
return connectionStates.isReady(node, now) && selector.isChannelReady(node) &&
    inFlightRequests.canSendMore(node);
{code}
 

 

Using some debugging tools, I saw that this expression always evaluates to false 
since the last part (canSendMore) is false. 

 

This is the code for canSendMore:
{code:java}
public boolean canSendMore(String node) {
    Deque<NetworkClient.InFlightRequest> queue = requests.get(node);
    return queue == null || queue.isEmpty() ||
        (queue.peekFirst().send.completed() && queue.size() < this.maxInFlightRequestsPerConnection);
}
{code}
 

 

I verified that 
{code:java}
queue.peekFirst().send.completed()
{code}
is true, and that leads to the livelock: since the request queues are full for 
all nodes, a new request to check a broker's availability and reconnect to it cannot 
be submitted.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)