[jira] [Commented] (KAFKA-9316) ConsoleProducer help info not expose default properties

2019-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999829#comment-16999829
 ] 

ASF GitHub Bot commented on KAFKA-9316:
---

huxihx commented on pull request #7854: KAFKA-9316: ConsoleProducer help info 
does not expose default support…
URL: https://github.com/apache/kafka/pull/7854
 
 
   …ed properties.
   
   https://issues.apache.org/jira/browse/KAFKA-9316
   
   Unlike ConsoleConsumer, ConsoleProducer help info does not expose default 
properties. Users cannot know what default properties are supported by checking 
the help info.
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> ConsoleProducer help info not expose default properties
> ---
>
> Key: KAFKA-9316
> URL: https://issues.apache.org/jira/browse/KAFKA-9316
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Affects Versions: 2.4.0
>Reporter: huxihx
>Assignee: huxihx
>Priority: Major
>
> Unlike ConsoleConsumer, ConsoleProducer help info does not expose default 
> properties. Users cannot know what default properties are supported by 
> checking the help info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9316) ConsoleProducer help info not expose default properties

2019-12-18 Thread huxihx (Jira)
huxihx created KAFKA-9316:
-

 Summary: ConsoleProducer help info not expose default properties
 Key: KAFKA-9316
 URL: https://issues.apache.org/jira/browse/KAFKA-9316
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.4.0
Reporter: huxihx
Assignee: huxihx


Unlike ConsoleConsumer, ConsoleProducer help info does not expose default 
properties. Users cannot know what default properties are supported by checking 
the help info.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9011) Add KStream#flatTransform and KStream#flatTransformValues to Scala API

2019-12-18 Thread Alex Kokachev (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999649#comment-16999649
 ] 

Alex Kokachev commented on KAFKA-9011:
--

[~bbejeck], [~vvcephei], could you guys please cherry-pick commits from PRs 
[https://github.com/apache/kafka/pull/7685]  and 
[https://github.com/apache/kafka/pull/7520] into 2.3 and 2.4 branches? Thanks.

> Add KStream#flatTransform and KStream#flatTransformValues to Scala API
> --
>
> Key: KAFKA-9011
> URL: https://issues.apache.org/jira/browse/KAFKA-9011
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Alex Kokachev
>Assignee: Alex Kokachev
>Priority: Major
>  Labels: scala, streams
> Fix For: 2.5.0
>
>
> Part of KIP-313: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-313%3A+Add+KStream.flatTransform+and+KStream.flatTransformValues]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5756) Synchronization issue on flush

2019-12-18 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999547#comment-16999547
 ] 

Chris Egerton commented on KAFKA-5756:
--

[~olkuznsmith] [~rhauch] [~hachikuji] do you agree with the analysis above?

> Synchronization issue on flush
> --
>
> Key: KAFKA-5756
> URL: https://issues.apache.org/jira/browse/KAFKA-5756
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Oleg Kuznetsov
>Priority: Major
> Fix For: 0.11.0.1, 1.0.0
>
>
> Access to *OffsetStorageWriter#toFlush* is not synchronized in *doFlush()* 
> method, whereas this collection can be accessed from 2 different threads:
> - *WorkerSourceTask.execute()*, finally block
> - *SourceTaskOffsetCommitter*, from periodic flush task



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-5756) Synchronization issue on flush

2019-12-18 Thread Chris Egerton (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999546#comment-16999546
 ] 

Chris Egerton commented on KAFKA-5756:
--

I believe a very similar issue may still be possible and that the 
synchronization added in https://github.com/apache/kafka/pull/3702, while an 
improvement, still doesn't prevent all possible errors caused by concurrent 
calls to {{WorkerSourceTask::commitOffsets}} (which, as noted earlier in the 
ticket, can come from both the periodic offset commit from the 
{{SourceTaskOffsetCommitter}} class and the end-of-life offset commit from the 
{{WorkerSourceTask}} itself).

The {{WorkerSourceTask}} class takes care to ensure that 
{{OffsetStorageWriter::beginFlush}} isn't invoked concurrently, which was 
implemented as part of https://github.com/apache/kafka/pull/3702. However, 
there doesn't appear to be anything in place to prevent that method from being 
invoked before a flush has completed (either via a call to 
{{OffsetStorageWriter::cancelFlush}} or to 
{{OffsetStorageWriter::doFlush::get}}). If this occurs, [an exception is 
thrown|https://github.com/apache/kafka/blob/c6e25bb362899f4e6335ac5578b1cae31b7f2575/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetStorageWriter.java#L108-L112]
 stating that "the framework should not allow this".

Reopening this issue until the above scenario has been addressed.
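
To make the interleaving concrete, here is a minimal, self-contained sketch (a 
toy model, not the actual Connect classes) of how a second beginFlush() can 
arrive while an earlier flush is still outstanding:

{code:java}
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for OffsetStorageWriter: offsets accumulate in "data", beginFlush()
// snapshots them into "toFlush", and a second beginFlush() before the first flush
// is completed or cancelled must fail, mirroring the invariant described above.
class ToyOffsetWriter {
    private Map<String, String> data = new HashMap<>();
    private Map<String, String> toFlush = null;

    synchronized void offset(String partition, String offset) {
        data.put(partition, offset);
    }

    synchronized void beginFlush() {
        if (toFlush != null)
            throw new IllegalStateException("OffsetStorageWriter is already flushing");
        toFlush = data;
        data = new HashMap<>();
    }

    synchronized void completeFlush() {
        toFlush = null;
    }
}

public class FlushRaceDemo {
    public static void main(String[] args) {
        ToyOffsetWriter writer = new ToyOffsetWriter();
        writer.offset("p0", "42");

        // "Thread A" (periodic committer): begins a flush but has not finished it yet.
        writer.beginFlush();

        // "Thread B" (end-of-life commit): begins its own flush before A completes
        // or cancels the first one; this is where the exception is thrown.
        writer.offset("p0", "43");
        writer.beginFlush();
    }
}
{code}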

> Synchronization issue on flush
> --
>
> Key: KAFKA-5756
> URL: https://issues.apache.org/jira/browse/KAFKA-5756
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Oleg Kuznetsov
>Priority: Major
> Fix For: 0.11.0.1, 1.0.0
>
>
> Access to *OffsetStorageWriter#toFlush* is not synchronized in *doFlush()* 
> method, whereas this collection can be accessed from 2 different threads:
> - *WorkerSourceTask.execute()*, finally block
> - *SourceTaskOffsetCommitter*, from periodic flush task



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (KAFKA-5756) Synchronization issue on flush

2019-12-18 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton reopened KAFKA-5756:
--

> Synchronization issue on flush
> --
>
> Key: KAFKA-5756
> URL: https://issues.apache.org/jira/browse/KAFKA-5756
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Oleg Kuznetsov
>Priority: Major
> Fix For: 0.11.0.1, 1.0.0
>
>
> Access to *OffsetStorageWriter#toFlush* is not synchronized in *doFlush()* 
> method, whereas this collection can be accessed from 2 different threads:
> - *WorkerSourceTask.execute()*, finally block
> - *SourceTaskOffsetCommitter*, from periodic flush task



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9315) The Kafka Metrics class should clear the mbeans map when closing

2019-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999495#comment-16999495
 ] 

ASF GitHub Bot commented on KAFKA-9315:
---

cmccabe commented on pull request #7851: KAFKA-9315: The Kafka Metrics class 
should clear the mbeans map when closing
URL: https://github.com/apache/kafka/pull/7851
 
 
   The JmxReporter should clear the mbeans map when closing. Otherwise,
   metrics may be incorrectly re-registered if the JmxReporter class is
   used after it is closed.
   
   For example, calling JmxReporter#close followed by
   JmxReporter#unregister could result in some of the mbeans that were
   removed in the close operation being re-registered.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> The Kafka Metrics class should clear the mbeans map when closing
> 
>
> Key: KAFKA-9315
> URL: https://issues.apache.org/jira/browse/KAFKA-9315
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
>
> The JmxReporter should clear the mbeans map when closing.  Otherwise, metrics 
> may be incorrectly re-registered if the JmxReporter class is used after it is 
> closed.
> For example, calling JmxReporter#close followed by JmxReporter#unregister 
> could result in some of the mbeans that were removed in the close operation 
> being re-registered.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9315) The Kafka Metrics class should clear the mbeans map when closing

2019-12-18 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-9315:
---

 Summary: The Kafka Metrics class should clear the mbeans map when 
closing
 Key: KAFKA-9315
 URL: https://issues.apache.org/jira/browse/KAFKA-9315
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Reporter: Colin McCabe
Assignee: Colin McCabe


The JmxReporter should clear the mbeans map when closing.  Otherwise, metrics 
may be incorrectly re-registered if the JmxReporter class is used after it is 
closed.

For example, calling JmxReporter#close followed by JmxReporter#unregister could 
result in some of the mbeans that were removed in the close operation being 
re-registered.
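
A toy model of the lifecycle problem (not the real JmxReporter, just the shape 
of the bug): if close() unregisters the MBeans from the server but leaves them 
in the internal map, a later metric-removal call can re-register a bean that 
close() had already removed. Clearing the map in close() avoids this.

{code:java}
import java.util.HashMap;
import java.util.Map;

class ToyReporter {
    private final Map<String, Object> mbeans = new HashMap<>(); // name -> bean
    private final Map<String, Object> server = new HashMap<>(); // stand-in for the MBean server

    synchronized void addMetric(String name, Object bean) {
        mbeans.put(name, bean);
        server.put(name, bean);
    }

    // Removal path: drop the bean from the server, then re-register it if the
    // internal map still knows about it (e.g. it still has other attributes).
    synchronized void removeMetric(String name) {
        Object bean = mbeans.get(name);
        if (bean != null) {
            server.remove(name);
            server.put(name, bean); // the problematic re-registration after close()
        }
    }

    synchronized void close() {
        for (String name : mbeans.keySet())
            server.remove(name);
        mbeans.clear(); // the fix: without this, removeMetric() can still see stale
                        // entries and put a closed bean back on the server
    }
}

public class JmxCloseDemo {
    public static void main(String[] args) {
        ToyReporter reporter = new ToyReporter();
        reporter.addMetric("kafka.producer:type=producer-metrics", new Object());
        reporter.close();                                              // beans removed, map cleared
        reporter.removeMetric("kafka.producer:type=producer-metrics"); // now a harmless no-op
    }
}
{code}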



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9062) Handle stalled writes to RocksDB

2019-12-18 Thread Sophie Blee-Goldman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999467#comment-16999467
 ] 

Sophie Blee-Goldman commented on KAFKA-9062:


[~jpzk] I agree this is a bug in Streams (that's what this ticket is for :) ), 
but just to clarify: the PUT is stalled by the manual compaction issued at the 
end of restoration, which is taking too long to compact the excessive L0 files, 
not by an autocompaction triggered by the PUT itself. The reason disabling 
autocompaction prevents this from happening is that RocksDB simply doesn't 
stall writes due to excessive L0 files with autocompaction off (which makes 
sense, as otherwise you could get stuck forever).

As a very simple workaround, we could consider making the "bulk loading" mode 
optional/configurable (possibly through an augmented RocksDBConfigSetter). 
Users hitting this issue could simply keep autocompaction enabled during 
restoration to hopefully keep the L0 file count under control so new writes 
won't stall. This would in theory slow down the restoration, but I suspect we 
may not be gaining as much from the bulk loading mode as we think, since the 
keys are unsorted.

> Handle stalled writes to RocksDB
> 
>
> Key: KAFKA-9062
> URL: https://issues.apache.org/jira/browse/KAFKA-9062
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> RocksDB may stall writes at times when background compactions or flushes are 
> having trouble keeping up. This means we can effectively end up blocking 
> indefinitely during a StateStore#put call within Streams, and may get kicked 
> from the group if the throttling does not ease up within the max poll 
> interval.
> Example: when restoring large amounts of state from scratch, we use the 
> strategy recommended by RocksDB of turning off automatic compactions and 
> dumping everything into L0. We do batch somewhat, but do not sort these small 
> batches before loading into the db, so we end up with a large number of 
> unsorted L0 files.
> When restoration is complete and we toggle the db back to normal (not bulk 
> loading) settings, a background compaction is triggered to merge all these 
> into the next level. This background compaction can take a long time to merge 
> unsorted keys, especially when the amount of data is quite large.
> Any new writes while the number of L0 files exceeds the max will be stalled 
> until the compaction can finish, and processing after restoring from scratch 
> can block beyond the polling interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9314) Connect put() and poll() retries not conforming to KIP-298

2019-12-18 Thread Nigel Liang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nigel Liang updated KAFKA-9314:
---
Description: 
KIP-298 outlines the retry policy of Connect when errors are encountered. In 
particular, it proposes to retry on {{RetriableException}} on put() in SinkTask 
and poll() in SourceTask.

However, the code does not reflect this change. For instance, WorkerSourceTask 
handles a {{RetriableException}} thrown from {{poll()}} by entering a tight 
retry loop without backoff. This has led to connectors having to work around it 
by not retrying at all and instead always failing the task. Users then need to 
manually restart the task to recover from even simple network glitches.

AFAICT from reading the code, the same is true for {{WorkerSinkTask}} when 
calling {{put()}}.

  was:
KIP-298 outlines the retry policy of Connect when errors are encountered. In 
particular, it proposes to retry on {{RetriableException}} on put() in SinkTask 
and poll() in SourceTask.

However, the code does not reflect this change. For instance, WorkerSourceTask 
handles {{RetriableException}} thrown from {{poll()}} by entering into a tight 
retry loop without backoff. This has led to connectors having to workaround by 
simply not retrying and failing the task always 
(https://github.com/confluentinc/kafka-connect-jms/pull/88). Users would need 
to manually restart the task to recover from even simple network glitches.

AFAICT from reading code, the same is true for {{WorkerSinkTask}} when calling 
{{put()}}.


> Connect put() and poll() retries not conforming to KIP-298
> --
>
> Key: KAFKA-9314
> URL: https://issues.apache.org/jira/browse/KAFKA-9314
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Nigel Liang
>Assignee: Nigel Liang
>Priority: Major
>
> KIP-298 outlines the retry policy of Connect when errors are encountered. In 
> particular, it proposes to retry on {{RetriableException}} on put() in 
> SinkTask and poll() in SourceTask.
> However, the code does not reflect this change. For instance, 
> WorkerSourceTask handles a {{RetriableException}} thrown from {{poll()}} by 
> entering a tight retry loop without backoff. This has led to connectors 
> having to work around it by not retrying at all and instead always failing 
> the task. Users then need to manually restart the task to recover from even 
> simple network glitches.
> AFAICT from reading the code, the same is true for {{WorkerSinkTask}} when 
> calling {{put()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9314) Connect put() and poll() retries not conforming to KIP-298

2019-12-18 Thread Nigel Liang (Jira)
Nigel Liang created KAFKA-9314:
--

 Summary: Connect put() and poll() retries not conforming to KIP-298
 Key: KAFKA-9314
 URL: https://issues.apache.org/jira/browse/KAFKA-9314
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Nigel Liang
Assignee: Nigel Liang


KIP-298 outlines the retry policy of Connect when errors are encountered. In 
particular, it proposes to retry on {{RetriableException}} on put() in SinkTask 
and poll() in SourceTask.

However, the code does not reflect this change. For instance, WorkerSourceTask 
handles {{RetriableException}} thrown from {{poll()}} by entering into a tight 
retry loop without backoff. This has led to connectors having to workaround by 
simply not retrying and failing the task always 
(https://github.com/confluentinc/kafka-connect-jms/pull/88). Users would need 
to manually restart the task to recover from even simple network glitches.

AFAICT from reading code, the same is true for {{WorkerSinkTask}} when calling 
{{put()}}.
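
Until the framework-side retry/backoff is in place, one connector-side 
mitigation is to back off inside poll() before surfacing the 
{{RetriableException}}, so the worker's immediate re-poll does not spin. This 
is an illustrative sketch only (class name and backoff values are made up), not 
the framework's own retry logic:

{code:java}
import java.util.List;
import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.source.SourceRecord;
import org.apache.kafka.connect.source.SourceTask;

public abstract class BackoffSourceTask extends SourceTask {

    private static final long MAX_BACKOFF_MS = 10_000;
    private long backoffMs = 100; // illustrative starting backoff

    @Override
    public List<SourceRecord> poll() throws InterruptedException {
        try {
            List<SourceRecord> records = doPoll();
            backoffMs = 100; // reset after a successful poll
            return records;
        } catch (RetriableException e) {
            // The worker calls poll() again right away, so sleeping here
            // approximates the backoff that KIP-298 intended.
            Thread.sleep(backoffMs);
            backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);
            throw e;
        }
    }

    // Connector-specific fetch; may throw RetriableException on transient errors.
    protected abstract List<SourceRecord> doPoll() throws InterruptedException;
}
{code}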



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9304) Image on Kafka docs shows incorrect message ID segments

2019-12-18 Thread Guozhang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999413#comment-16999413
 ] 

Guozhang Wang commented on KAFKA-9304:
--

[~orangesnap] I looked at the diagram and I agree with you that the second 
segment file should start at 82xx and end at 83xx offsets. It does seem to be a 
copy-paste error. The image is under /docs/images/kafka_log.png; if you are 
interested in fixing this by submitting a PR that simply replaces it with a 
corrected image, that would be much appreciated!

> Image on Kafka docs shows incorrect message ID segments
> ---
>
> Key: KAFKA-9304
> URL: https://issues.apache.org/jira/browse/KAFKA-9304
> Project: Kafka
>  Issue Type: Bug
>Reporter: Victoria Bialas
>Assignee: Victoria Bialas
>Priority: Minor
>
>  
> Docs page: [https://kafka.apache.org/documentation/#log]
> Link to Tweet: [https://twitter.com/Preety48408391/status/1205764249995202560]
> Hi Kafka team, looks like there is issue with below depicting image on Kafka 
> documentation section 5.4. In 2nd segment 82xx.kafka, Message IDs are 
> incorrectly mentioned. Message should start from 82xx but starting from 34xx 
> like in 1st segment. Please correct.
> [~guozhang] , [~mjsax] , this came through on a Tweet, I'll try to fix on the 
> docs. May need some guidance from you, as the problem description isn't 
> completely clear to me.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeva Byzek updated KAFKA-9313:
--
Description: 
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}} .  Consequently, by default, if there are multiple IP 
addresses and the first one fails, the connection will fail.

 

It is desirable to change the default to be 
{{client.dns.lookup=use_all_dns_ips}} for two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 

  was:
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}} .  Consequently, by default, if there are multiple IP 
addresses and the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 


> Set default for client.dns.lookup to use_all_dns_ips
> 
>
> Key: KAFKA-9313
> URL: https://issues.apache.org/jira/browse/KAFKA-9313
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Yeva Byzek
>Priority: Minor
>
> The default setting of the configuration parameter {{client.dns.lookup}} is 
> *not* {{use_all_dns_ips}} .  Consequently, by default, if there are multiple 
> IP addresses and the first one fails, the connection will fail.
>  
> It is desirable to change the default to be 
> {{client.dns.lookup=use_all_dns_ips}} for two reasons:
>  # reduce connection failure rates by 
>  # users are often surprised that this is not already the default
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeva Byzek updated KAFKA-9313:
--
Description: 
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}} .  Consequently, by default, if there are multiple IP 
addresses and the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 

  was:
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates 
 # users are often surprised that this is not already the default

 

 


> Set default for client.dns.lookup to use_all_dns_ips
> 
>
> Key: KAFKA-9313
> URL: https://issues.apache.org/jira/browse/KAFKA-9313
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Yeva Byzek
>Priority: Minor
>
> The default setting of the configuration parameter {{client.dns.lookup}} is 
> *not* {{use_all_dns_ips}} .  Consequently, by default, if there are multiple 
> IP addresses and the first one fails, the connection will fail.
>  
> It is desirable to set the default to be 
> {{client.dns.lookup=use_all_dns_ips}} for two reasons:
>  # reduce connection failure rates by 
>  # users are often surprised that this is not already the default
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeva Byzek updated KAFKA-9313:
--
Description: 
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be client.dns.lookup=use_all_dns_ips for 
two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 

  was:
The default behavior of client.dns.lookup is not use_all_dns_ips, which means 
if there are multiple IP addresses and the first one fails, the connection will 
fail.

 

It is desirable to set the default to be client.dns.lookup=use_all_dns_ips for 
two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 


> Set default for client.dns.lookup to use_all_dns_ips
> 
>
> Key: KAFKA-9313
> URL: https://issues.apache.org/jira/browse/KAFKA-9313
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Yeva Byzek
>Priority: Minor
>
> The default setting of the configuration parameter {{client.dns.lookup}} is 
> *not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
> the first one fails, the connection will fail.
>  
> It is desirable to set the default to be client.dns.lookup=use_all_dns_ips 
> for two reasons:
>  # reduce connection failure rates by 
>  # users are often surprised that this is not already the default
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeva Byzek updated KAFKA-9313:
--
Description: 
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates 
 # users are often surprised that this is not already the default

 

 

  was:
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 


> Set default for client.dns.lookup to use_all_dns_ips
> 
>
> Key: KAFKA-9313
> URL: https://issues.apache.org/jira/browse/KAFKA-9313
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Yeva Byzek
>Priority: Minor
>
> The default setting of the configuration parameter {{client.dns.lookup}} is 
> *not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
> the first one fails, the connection will fail.
>  
> It is desirable to set the default to be 
> {{client.dns.lookup=use_all_dns_ips}} for two reasons:
>  # reduce connection failure rates 
>  # users are often surprised that this is not already the default
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yeva Byzek updated KAFKA-9313:
--
Description: 
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be {{client.dns.lookup=use_all_dns_ips}} 
for two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 

  was:
The default setting of the configuration parameter {{client.dns.lookup}} is 
*not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
the first one fails, the connection will fail.

 

It is desirable to set the default to be client.dns.lookup=use_all_dns_ips for 
two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default

 

 


> Set default for client.dns.lookup to use_all_dns_ips
> 
>
> Key: KAFKA-9313
> URL: https://issues.apache.org/jira/browse/KAFKA-9313
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Yeva Byzek
>Priority: Minor
>
> The default setting of the configuration parameter {{client.dns.lookup}} is 
> *not* {{use_all_dns_ips}}, which means if there are multiple IP addresses and 
> the first one fails, the connection will fail.
>  
> It is desirable to set the default to be 
> {{client.dns.lookup=use_all_dns_ips}} for two reasons:
>  # reduce connection failure rates by 
>  # users are often surprised that this is not already the default
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9313) Set default for client.dns.lookup to use_all_dns_ips

2019-12-18 Thread Yeva Byzek (Jira)
Yeva Byzek created KAFKA-9313:
-

 Summary: Set default for client.dns.lookup to use_all_dns_ips
 Key: KAFKA-9313
 URL: https://issues.apache.org/jira/browse/KAFKA-9313
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Yeva Byzek


The default behavior of client.dns.lookup is not use_all_dns_ips, which means 
if there are multiple IP addresses and the first one fails, the connection will 
fail.

 

It is desirable to set the default to be client.dns.lookup=use_all_dns_ips for 
two reasons:
 # reduce connection failure rates by 
 # users are often surprised that this is not already the default
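
Until the default changes, clients can opt in explicitly. A minimal sketch 
(broker address and serializers are illustrative; the same setting applies to 
consumers and admin clients):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class DnsLookupExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        // Try every resolved IP for a broker hostname instead of only the first one.
        props.put(CommonClientConfigs.CLIENT_DNS_LOOKUP_CONFIG, "use_all_dns_ips");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // produce as usual; DNS failover now covers all resolved addresses
        }
    }
}
{code}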

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9062) Handle stalled writes to RocksDB

2019-12-18 Thread Jendrik Poloczek (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999353#comment-16999353
 ] 

Jendrik Poloczek edited comment on KAFKA-9062 at 12/18/19 5:06 PM:
---

I think this is actually a thread deadlocked by the invocation of the first PUT 
(after restoration), which triggers an auto compaction (with default Kafka 
Streams settings). The auto compaction, however, is triggered inside PUT and 
the method blocks. We profiled our application with YourKit to identify this 
problem. We disabled auto compaction (using the RocksDBConfigSetter interface, 
options.disableAutoCompactions()) and we don’t have the problem anymore (the 
timeout block on PUT). Compaction still happens at some point, but it seems 
Kafka Streams is aware of it (it is not deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings
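
A minimal sketch of the workaround described above, assuming Kafka Streams 2.2+ 
and the standard RocksDBConfigSetter hook (the class name is illustrative):

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.Options;

public class NoAutoCompactionConfigSetter implements RocksDBConfigSetter {
    @Override
    public void setConfig(final String storeName, final Options options,
                          final Map<String, Object> configs) {
        // Avoid write stalls caused by an excessive number of L0 files;
        // compactions still happen, but are no longer triggered automatically.
        options.setDisableAutoCompactions(true);
    }
}
{code}

It would be registered via the {{rocksdb.config.setter}} 
(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG) property.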


was (Author: jpzk):
I think this is actually a deadlocked thread by the invocation of the first PUT 
(after restoration) which triggers an auto compaction (default Kafka Streams 
settings). The auto compaction however, is triggered inside PUT and the method 
is blocking. We profiled our application with yourkit to identify this problem. 
We disabled auto compaction (using the and we don’t have the problem anymore 
(the timeout block on PUT). Compaction still happens at some point but it seems 
Kafka Streams is aware of this (is not deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings

> Handle stalled writes to RocksDB
> 
>
> Key: KAFKA-9062
> URL: https://issues.apache.org/jira/browse/KAFKA-9062
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> RocksDB may stall writes at times when background compactions or flushes are 
> having trouble keeping up. This means we can effectively end up blocking 
> indefinitely during a StateStore#put call within Streams, and may get kicked 
> from the group if the throttling does not ease up within the max poll 
> interval.
> Example: when restoring large amounts of state from scratch, we use the 
> strategy recommended by RocksDB of turning off automatic compactions and 
> dumping everything into L0. We do batch somewhat, but do not sort these small 
> batches before loading into the db, so we end up with a large number of 
> unsorted L0 files.
> When restoration is complete and we toggle the db back to normal (not bulk 
> loading) settings, a background compaction is triggered to merge all these 
> into the next level. This background compaction can take a long time to merge 
> unsorted keys, especially when the amount of data is quite large.
> Any new writes while the number of L0 files exceeds the max will be stalled 
> until the compaction can finish, and processing after restoring from scratch 
> can block beyond the polling interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-9062) Handle stalled writes to RocksDB

2019-12-18 Thread Jendrik Poloczek (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999353#comment-16999353
 ] 

Jendrik Poloczek edited comment on KAFKA-9062 at 12/18/19 5:04 PM:
---

I think this is actually a deadlocked thread by the invocation of the first PUT 
(after restoration) which triggers an auto compaction (default Kafka Streams 
settings). The auto compaction however, is triggered inside PUT and the method 
is blocking. We profiled our application with yourkit to identify this problem. 
We disabled auto compaction (using the and we don’t have the problem anymore 
(the timeout block on PUT). Compaction still happens at some point but it seems 
Kafka Streams is aware of this (is not deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings


was (Author: jpzk):
I think this is actually a Kafka Streams bug, since it’s a deadlocked thread by 
the invocation of the first PUT (after restoration) which triggers an auto 
compaction (default Kafka Streams settings). The auto compaction however, is 
triggered inside PUT and the method is blocking. We profiled our application 
with yourkit to identify this problem. We disabled auto compaction (using the 
and we don’t have the problem anymore (the timeout block on PUT). Compaction 
still happens at some point but it seems Kafka Streams is aware of this (is not 
deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings

> Handle stalled writes to RocksDB
> 
>
> Key: KAFKA-9062
> URL: https://issues.apache.org/jira/browse/KAFKA-9062
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> RocksDB may stall writes at times when background compactions or flushes are 
> having trouble keeping up. This means we can effectively end up blocking 
> indefinitely during a StateStore#put call within Streams, and may get kicked 
> from the group if the throttling does not ease up within the max poll 
> interval.
> Example: when restoring large amounts of state from scratch, we use the 
> strategy recommended by RocksDB of turning off automatic compactions and 
> dumping everything into L0. We do batch somewhat, but do not sort these small 
> batches before loading into the db, so we end up with a large number of 
> unsorted L0 files.
> When restoration is complete and we toggle the db back to normal (not bulk 
> loading) settings, a background compaction is triggered to merge all these 
> into the next level. This background compaction can take a long time to merge 
> unsorted keys, especially when the amount of data is quite large.
> Any new writes while the number of L0 files exceeds the max will be stalled 
> until the compaction can finish, and processing after restoring from scratch 
> can block beyond the polling interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9062) Handle stalled writes to RocksDB

2019-12-18 Thread Jendrik Poloczek (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999353#comment-16999353
 ] 

Jendrik Poloczek commented on KAFKA-9062:
-

I think this is actually a Kafka Streams bug, since it’s a deadlocked thread by 
the invocation of the first PUT (after restoration) which triggers an auto 
compaction (default Kafka Streams settings). The auto compaction however, is 
triggered inside PUT and the method is blocking. We profiled our application 
with yourkit to identify this problem. We disabled auto compaction (using the 
and we don’t have the problem anymore (the timeout block on PUT). Compaction 
still happens at some point but it seems Kafka Streams is aware of this (is not 
deadlocked by it).
P.S. We're using Kafka Streams 2.2 and default RocksDB settings

> Handle stalled writes to RocksDB
> 
>
> Key: KAFKA-9062
> URL: https://issues.apache.org/jira/browse/KAFKA-9062
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Sophie Blee-Goldman
>Priority: Major
>
> RocksDB may stall writes at times when background compactions or flushes are 
> having trouble keeping up. This means we can effectively end up blocking 
> indefinitely during a StateStore#put call within Streams, and may get kicked 
> from the group if the throttling does not ease up within the max poll 
> interval.
> Example: when restoring large amounts of state from scratch, we use the 
> strategy recommended by RocksDB of turning off automatic compactions and 
> dumping everything into L0. We do batch somewhat, but do not sort these small 
> batches before loading into the db, so we end up with a large number of 
> unsorted L0 files.
> When restoration is complete and we toggle the db back to normal (not bulk 
> loading) settings, a background compaction is triggered to merge all these 
> into the next level. This background compaction can take a long time to merge 
> unsorted keys, especially when the amount of data is quite large.
> Any new writes while the number of L0 files exceeds the max will be stalled 
> until the compaction can finish, and processing after restoring from scratch 
> can block beyond the polling interval



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9311) Jumbled table content for broker config doc

2019-12-18 Thread Mickael Maison (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999178#comment-16999178
 ] 

Mickael Maison commented on KAFKA-9311:
---

I'm sure the new design can be improved further but surely nobody is missing 
the tables!

> Jumbled table content for broker config doc
> ---
>
> Key: KAFKA-9311
> URL: https://issues.apache.org/jira/browse/KAFKA-9311
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.4.0
>Reporter: Joel Hamill
>Priority: Major
> Attachments: image-2019-12-17-14-48-19-927.png, 
> image-2019-12-17-14-48-50-800.png
>
>
> The current version of the broker configs has broken table formatting:
> [https://kafka.apache.org/documentation/#brokerconfigs]
> !image-2019-12-17-14-48-50-800.png!
> Previous version: 
> [https://kafka.apache.org/23/documentation/#brokerconfigs]
> !image-2019-12-17-14-48-19-927.png!
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-9301) KafkaProducer#flush should block until all the sent records get completed

2019-12-18 Thread Tomoyuki Saito (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomoyuki Saito resolved KAFKA-9301.
---
Resolution: Duplicate

> KafkaProducer#flush should block until all the sent records get completed
> -
>
> Key: KAFKA-9301
> URL: https://issues.apache.org/jira/browse/KAFKA-9301
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Tomoyuki Saito
>Priority: Major
>
> h2. ProducerBatch split makes ProducerBatch.produceFuture completed 
> KAFKA-3995 introduced ProducerBatch split and resend when 
> RecordTooLargeException happens on broker side. When ProducerBatch split 
> happens, ProducerBatch.produceFuture becomes completed, even though records 
> in a batch will be resent to a broker.
> h2. KafkaProducer#flush implementation
> With the current implementation, KafkaProducer#flush blocks until accumulated 
> ProducerBatches get completed. As described above, that does not ensure all 
> the sent records get completed.
> This issue is also mentioned in: https://github.com/apache/kafka/pull/6469
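
Until this is addressed, one application-side workaround (an illustrative 
sketch, not a change to the producer API) is to keep the per-record futures 
returned by send() and wait on all of them rather than relying on flush() 
alone, which should also cover records that are re-sent after a batch split:

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class AwaitAllSends {
    public static void sendAll(KafkaProducer<String, String> producer,
                               List<ProducerRecord<String, String>> records) throws Exception {
        List<Future<RecordMetadata>> futures = new ArrayList<>();
        for (ProducerRecord<String, String> record : records) {
            futures.add(producer.send(record));
        }
        producer.flush(); // kick off delivery of the buffered batches
        for (Future<RecordMetadata> future : futures) {
            future.get(); // completes only once each record is acked (or failed)
        }
    }
}
{code}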



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9301) KafkaProducer#flush should block until all the sent records get completed

2019-12-18 Thread Tomoyuki Saito (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999171#comment-16999171
 ] 

Tomoyuki Saito commented on KAFKA-9301:
---

Duplicate of KAFKA-9312.
I'll close this issue.

> KafkaProducer#flush should block until all the sent records get completed
> -
>
> Key: KAFKA-9301
> URL: https://issues.apache.org/jira/browse/KAFKA-9301
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.11.0.0
>Reporter: Tomoyuki Saito
>Priority: Major
>
> h2. ProducerBatch split makes ProducerBatch.produceFuture completed 
> KAFKA-3995 introduced ProducerBatch split and resend when 
> RecordTooLargeException happens on broker side. When ProducerBatch split 
> happens, ProducerBatch.produceFuture becomes completed, even though records 
> in a batch will be resent to a broker.
> h2. KafkaProducer#flush implementation
> With the current implementation, KafkaProducer#flush blocks until accumulated 
> ProducerBatches get completed. As described above, that does not ensure all 
> the sent records get completed.
> This issue is also mentioned in: https://github.com/apache/kafka/pull/6469



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-18 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999090#comment-16999090
 ] 

Levani Kokhreidze edited comment on KAFKA-6718 at 12/18/19 12:29 PM:
-

Thanks for the ideas [~vvcephei], I will definitely look into them. My initial 
idea was based on how Kubernetes [affinity 
rules|https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]
 work. It's similar to the Elasticsearch mechanisms you have provided (tbh, I 
think most distributed systems share the same idea behind rack awareness and 
standby tasks). I think with this task we have the possibility to let users 
specify something similar to affinity/anti-affinity rules based on an interface, 
which could ship with a default implementation out of the box (rack awareness by 
default, maybe) but could be extended to other metrics. It would be amazing to 
have disk space awareness as well, but since disk space isn't a static value 
like rack.id, it may be challenging to implement; still, such a feature would 
definitely be useful.

Actually, the required disk space can be estimated for standby tasks, so in 
theory it could be encoded into the assignment (disk space is just an example; 
it could be any other reasonable metric). The interface that defines the 
affinity rule would then receive the standby tasks with the additional encoded 
metrics and either accept or reject them. If rejected, the task would be routed 
to another Kafka Streams instance until one accepts it. If no suitable Kafka 
Streams instance can be found to satisfy the `num.standby.replicas` config, we 
can log a warning, like we do today when the number of Kafka Streams instances 
is less than `num.standby.replicas`. This is a rough idea which may not work at 
all, since I haven't looked in detail at how standby tasks currently work in 
Kafka Streams :) Sorry if this all doesn't make sense. But I would definitely 
love to try to make this interface as extensible as possible so we won't be 
limited to only rack awareness.


was (Author: lkokhreidze):
Thanks for the ideas [~vvcephei] will definitely look into it. My initial idea 
was around how Kubernetes [affinity rules|#affinity-and-anti-affinity]] work. 
It's similar to the elasticsearch mechanisms you have provided (tbh, I think 
most of the distributed systems share same idea behind rack awareness and 
standby tasks). I think with this task we have a possibility to let users 
specify something similar to affinity/anti-affinity rules based on some 
interface implementation which can have some default implementation out of the 
box (rack awareness maybe by default) but maybe can be extended to some other 
metrics? Would be amazing to have disk space awareness as well, but since disk 
space can't be static value, like rack.id it maybe challenging to implement it, 
but definitely it would be useful to have such feature. What's your take on it?

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This feature enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-18 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999090#comment-16999090
 ] 

Levani Kokhreidze edited comment on KAFKA-6718 at 12/18/19 12:13 PM:
-

Thanks for the ideas [~vvcephei] will definitely look into it. My initial idea 
was around how Kubernetes [affinity rules|#affinity-and-anti-affinity]] work. 
It's similar to the elasticsearch mechanisms you have provided (tbh, I think 
most of the distributed systems share same idea behind rack awareness and 
standby tasks). I think with this task we have a possibility to let users 
specify something similar to affinity/anti-affinity rules based on some 
interface implementation which can have some default implementation out of the 
box (rack awareness maybe by default) but maybe can be extended to some other 
metrics? Would be amazing to have disk space awareness as well, but since disk 
space can't be static value, like rack.id it maybe challenging to implement it, 
but definitely it would be useful to have such feature. What's your take on it?


was (Author: lkokhreidze):
Thanks for the ideas [~vvcephei] will definitely look into it. My initial idea 
was around how Kubernetes [affinity rules|#affinity-and-anti-affinity]] work. 
It's similar to the elasticsearch mechanisms you have provided (tbh, I think 
most of the distributed systems share same idea behind rack awareness and 
standby tasks). I think with this task we have a possibility to let users 
specify something similar to affinity/anti-affinity rules based on some 
interface implementation which can have some default implementation out of the 
box (rack awareness maybe by default) but maybe can be extended to some other 
metrics? Would be amazing to have disk space awareness as well, but since disk 
space can't be static value, like rack.id it maybe challenging to implement it, 
but definitely it would be useful to have such feature.

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This feature enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-18 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999090#comment-16999090
 ] 

Levani Kokhreidze edited comment on KAFKA-6718 at 12/18/19 12:10 PM:
-

Thanks for the ideas [~vvcephei] will definitely look into it. My initial idea 
was around how Kubernetes [affinity rules|#affinity-and-anti-affinity]] work. 
It's similar to the elasticsearch mechanisms you have provided (tbh, I think 
most of the distributed systems share same idea behind rack awareness and 
standby tasks). I think with this task we have a possibility to let users 
specify something similar to affinity/anti-affinity rules based on some 
interface implementation which can have some default implementation out of the 
box (rack awareness maybe by default) but maybe can be extended to some other 
metrics? Would be amazing to have disk space awareness as well, but since disk 
space can't be static value, like rack.id it maybe challenging to implement it, 
but definitely it would be useful to have such feature.


was (Author: lkokhreidze):
Thanks for the ideas [~vvcephei] will definitely look into it. My initial idea 
was around how Kubernetes [affinity 
rules|[https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]]
 work. It's similar to the elasticsearch mechanisms you have provided (tbh, I 
think most of the distributed systems share same idea behind rack awareness and 
standby tasks). I think with this task we have a possibility to let users 
specify something similar to affinity/anti-affinity rules based on some 
interface implementation which can have some default implementation out of the 
box (rack awareness maybe by default) but maybe can be extended to some other 
metrics? Like most probably you don't want to have standby task on the Kafka 
streams process were disk space is running low or CPU usage is high? Wdyt about 
this? From my experience working with Kafka streams and standby tasks, this is 
something i would really happy to have :)

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This feature enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2019-12-18 Thread Levani Kokhreidze (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999090#comment-16999090
 ] 

Levani Kokhreidze commented on KAFKA-6718:
--

Thanks for the ideas [~vvcephei], I will definitely look into them. My initial 
idea was based on how Kubernetes [affinity 
rules|https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity]
 work. It's similar to the Elasticsearch mechanisms you have provided (tbh, I 
think most distributed systems share the same idea behind rack awareness and 
standby tasks). I think with this task we have the possibility to let users 
specify something similar to affinity/anti-affinity rules based on an interface, 
which could ship with a default implementation out of the box (rack awareness by 
default, maybe) but could be extended to other metrics. For example, most 
probably you don't want to have a standby task on a Kafka Streams process where 
disk space is running low or CPU usage is high. Wdyt about this? From my 
experience working with Kafka Streams and standby tasks, this is something I 
would be really happy to have :)

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Priority: Major
>  Labels: needs-kip
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This feature enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If no rackId is provided: the cluster will behave rack-unaware
>  # If the same rackId is given to all the nodes: the cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then the cluster will be rack aware, i.e. 
> each replica task is assigned to a different rack.
>  # If (totalTasks > number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|
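
To make the assignment rule in the description concrete, here is an 
illustrative greedy sketch (not the actual Streams assignor; all names and 
signatures are made up for the example): prefer an instance on a rack not yet 
used for the task, otherwise fall back to the least-loaded remaining instance.

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RackAwareStandbySketch {

    // Choose instances for the standby replicas of a single task.
    static List<String> assignStandbys(int numStandbyReplicas,
                                       String activeInstance,
                                       Map<String, String> instanceToRack, // instanceId -> rackId
                                       Map<String, Integer> load) {        // instanceId -> task count
        List<String> chosen = new ArrayList<>();
        Set<String> usedRacks = new HashSet<>();
        usedRacks.add(instanceToRack.get(activeInstance));

        List<String> candidates = new ArrayList<>(instanceToRack.keySet());
        candidates.remove(activeInstance);

        for (int i = 0; i < numStandbyReplicas && !candidates.isEmpty(); i++) {
            // Least-loaded first, so the rack-unaware fallback is still load-aware.
            candidates.sort(Comparator.comparingInt(c -> load.getOrDefault(c, 0)));
            String pick = candidates.stream()
                    .filter(c -> !usedRacks.contains(instanceToRack.get(c)))
                    .findFirst()
                    .orElse(candidates.get(0)); // no unused rack left: least-loaded wins
            chosen.add(pick);
            usedRacks.add(instanceToRack.get(pick));
            candidates.remove(pick);
            load.merge(pick, 1, Integer::sum);
        }
        return chosen;
    }
}
{code}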



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9146) Add option to force delete members in stream reset tool

2019-12-18 Thread feyman (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998987#comment-16998987
 ] 

feyman commented on KAFKA-9146:
---

Thanks, [~mjsax]!

This is very helpful!

> Add option to force delete members in stream reset tool
> ---
>
> Key: KAFKA-9146
> URL: https://issues.apache.org/jira/browse/KAFKA-9146
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer, streams
>Reporter: Boyang Chen
>Assignee: feyman
>Priority: Major
>  Labels: newbie
>
> Sometimes people want to reset the stream application sooner, but are blocked 
> by the left-over members inside the group coordinator, which only expire 
> after the session timeout. When a user configures a really long session 
> timeout, it could prevent the group from clearing. We should consider adding 
> support to clean up members by forcing them to leave the group. To do that, 
>  # If the stream application is already on static membership, we could call 
> directly from adminClient.removeMembersFromGroup
>  # If the application is on dynamic membership, we should modify 
> adminClient.removeMembersFromGroup interface to allow deletion based on 
> member.id.
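
For case 1 above (static membership), the 2.4 Admin API already supports 
removal by group.instance.id; note the actual method is 
removeMembersFromConsumerGroup. A minimal sketch (group id, instance ids, and 
bootstrap address are illustrative):

{code:java}
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.MemberToRemove;
import org.apache.kafka.clients.admin.RemoveMembersFromConsumerGroupOptions;

public class ForceRemoveStaticMembers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Remove the listed static members (identified by group.instance.id)
            // so the group can be reset without waiting for the session timeout.
            RemoveMembersFromConsumerGroupOptions options =
                new RemoveMembersFromConsumerGroupOptions(
                    Arrays.asList(new MemberToRemove("instance-1"),
                                  new MemberToRemove("instance-2")));
            admin.removeMembersFromConsumerGroup("my-streams-app", options).all().get();
        }
    }
}
{code}

Dynamic members (case 2) would still need the interface change described in 
the ticket, since there is currently no way to pass a member.id here.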



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9308) Misses SAN after certificate creation

2019-12-18 Thread Agostino Sarubbo (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998932#comment-16998932
 ] 

Agostino Sarubbo commented on KAFKA-9308:
-

Hi,

The main purpose of this bug, as you said, was to update the documentation.
Anyway, I fixed it myself (now the certificate shows the SAN), but I'm unable to 
connect without 'ssl.endpoint.identification.algorithm=' because of a failed 
handshake. Do I need to open another bug for that?

 

> Misses SAN after certificate creation
> -
>
> Key: KAFKA-9308
> URL: https://issues.apache.org/jira/browse/KAFKA-9308
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.3.1
>Reporter: Agostino Sarubbo
>Priority: Minor
>
> Hello,
> I followed the documentation to use Kafka with SSL; however, at the end the 
> entire 'procedure' loses the specified SAN.
> To test, run (after the first keytool command and after the latest):
>  
> {code:java}
> keytool -list -v -keystore server.keystore.jks
> {code}
> Reference:
>  [http://kafka.apache.org/documentation.html#security_ssl]
>  
> {code:java}
> #!/bin/bash
> #Step 1
> keytool -keystore server.keystore.jks -alias localhost -validity 365 -keyalg 
> RSA -genkey -ext SAN=DNS:test.test.com
> #Step 2
> openssl req -new -x509 -keyout ca-key -out ca-cert -days 365
> keytool -keystore server.truststore.jks -alias CARoot -import -file ca-cert
> keytool -keystore client.truststore.jks -alias CARoot -import -file ca-cert
> #Step 3
> keytool -keystore server.keystore.jks -alias localhost -certreq -file 
> cert-file 
> openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file -out cert-signed 
> -days 365 -CAcreateserial -passin pass:test1234 
> keytool -keystore server.keystore.jks -alias CARoot -import -file ca-cert 
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}
>  
> In detail, the SAN is lost after:
> {code:java}
> keytool -keystore server.keystore.jks -alias localhost -import -file 
> cert-signed
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9243) Update the javadocs from KeyValueStore to TimestampKeyValueStore

2019-12-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998928#comment-16998928
 ] 

ASF GitHub Bot commented on KAFKA-9243:
---

miroswan commented on pull request #7848: KAFKA-9243 Update the javadocs for 
KTable.java
URL: https://github.com/apache/kafka/pull/7848
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update the javadocs from KeyValueStore to TimestampKeyValueStore
> 
>
> Key: KAFKA-9243
> URL: https://issues.apache.org/jira/browse/KAFKA-9243
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 2.3.0
>Reporter: Walker Carlson
>Assignee: Demitri Swan
>Priority: Minor
>  Labels: beginner, newbie
>
> As of version 2.3, the DSL uses `TimestampedStores` to represent KTables. 
> However, the JavaDocs of all table-related operators still refer to plain 
> `KeyValueStores` etc instead of `TimestampedKeyValueStore` etc. Hence, all 
> those JavaDocs should be updated (the JavaDocs are technically not incorrect, 
> because one can access a TimestampedKeyValueStore as a KeyValueStore, too – 
> hence this ticket is not a "bug" but an improvement).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-9304) Image on Kafka docs shows incorrect message ID segments

2019-12-18 Thread Victoria Bialas (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998908#comment-16998908
 ] 

Victoria Bialas commented on KAFKA-9304:


Okay, thanks [~mjsax] I'll try to catch up with [~guozhang] tomorrow.

> Image on Kafka docs shows incorrect message ID segments
> ---
>
> Key: KAFKA-9304
> URL: https://issues.apache.org/jira/browse/KAFKA-9304
> Project: Kafka
>  Issue Type: Bug
>Reporter: Victoria Bialas
>Assignee: Victoria Bialas
>Priority: Minor
>
>  
> Docs page: [https://kafka.apache.org/documentation/#log]
> Link to Tweet: [https://twitter.com/Preety48408391/status/1205764249995202560]
> Hi Kafka team, looks like there is issue with below depicting image on Kafka 
> documentation section 5.4. In 2nd segment 82xx.kafka, Message IDs are 
> incorrectly mentioned. Message should start from 82xx but starting from 34xx 
> like in 1st segment. Please correct.
> [~guozhang] , [~mjsax] , this came through on a Tweet, I'll try to fix on the 
> docs. May need some guidance from you, as the problem description isn't 
> completely clear to me.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)