Re: Please add my wiki username contributor permission list

2016-12-30 Thread Aarti Gupta
Thanks Ewen!

I was able to edit the page and add two new KIPs

Aarti

On Fri, Dec 30, 2016 at 12:47 PM, Ewen Cheslack-Postava 
wrote:

> I've granted you permissions, you should now be able to edit the main KIP
> page and add a new KIP.
>
> -Ewen
>
> On Fri, Dec 30, 2016 at 2:58 PM, Aarti Gupta 
> wrote:
>
> > Hey devs,
> >
> > My jira username is aartigupta
> > and my wiki username happens to be  aartiguptaa
> > (not sure how that happened)
> >
> > I already have JIRA access, but no permissions to edit the KIP wiki.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/
> > Kafka+Improvement+Proposals
> >
> > Can someone grant my username (aartiguptaa) permissions to edit the
> > confluence wiki ?
> >
> > Thanks!
> > aarti
> >
>


[DISCUSS] KIP-105: Addition of Record Level for Sensors

2016-12-30 Thread Aarti Gupta
Hi all,

I would like to start the discussion on KIP-105: Addition of Record Level
for Sensors
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67636483

Looking forward to your feedback.

Thanks,
Aarti and Eno


[DISCUSS] KIP-104: Granular Sensors for Streams

2016-12-30 Thread Aarti Gupta
Hi all,

I would like to start the discussion on KIP-104: Granular Sensors for
Streams


https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67636480

Looking forward to your feedback.

Thanks,
Aarti and Eno


[jira] [Commented] (KAFKA-4519) Delete old unused branches in git repo

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788849#comment-15788849
 ] 

Ewen Cheslack-Postava commented on KAFKA-4519:
--

There are probably more of these we should retire. Besides feature branches 
like {{consumer_redesign}} and {{transactional_messaging}}, there are a ton of 
version branches that we should retire, e.g. {{0.7.0}}, {{0.7.1}}, {{0.7.2}}, 
etc. In fact, except for 2 or 3 branches, almost all branches should probably 
be retired as they are no longer active. We've recently started to follow a bit 
more strict policy re: tagging RCs and releases; we might want to also come up 
with a policy for retiring & tagging the final version of branches.

> Delete old unused branches in git repo
> --
>
> Key: KAFKA-4519
> URL: https://issues.apache.org/jira/browse/KAFKA-4519
> Project: Kafka
>  Issue Type: Task
>Reporter: Jeff Widman
>Priority: Trivial
>
> Delete these old git branches, as they're quite outdated and not relevant for 
> various version branches:
> * consumer_redesign
> * transactional_messaging
> * 0.8.0-beta1-candidate1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-4517:
-
Priority: Blocker  (was: Minor)

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788842#comment-15788842
 ] 

Ewen Cheslack-Postava commented on KAFKA-4517:
--

Upgraded to blocker for 0.11.0.0 so we don't lose track of removing this in the 
future.

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Blocker
> Fix For: 0.11.0.0
>
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4517) Remove kafka-consumer-offset-checker.sh script since already deprecated in Kafka 9

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-4517:
-
Fix Version/s: 0.11.0.0

> Remove kafka-consumer-offset-checker.sh script since already deprecated in 
> Kafka 9
> --
>
> Key: KAFKA-4517
> URL: https://issues.apache.org/jira/browse/KAFKA-4517
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 0.10.1.0, 0.10.0.0, 0.10.0.1
>Reporter: Jeff Widman
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> Kafka 9 deprecated kafka-consumer-offset-checker.sh 
> (kafka.tools.ConsumerOffsetChecker) in favor of kafka-consumer-groups.sh 
> (kafka.admin.ConsumerGroupCommand). 
> Since this was deprecated in 9, and the full functionality of the old script 
> appears to be available in the new script, can we remove the old shell script 
> in 10? 
> From an Ops perspective, it's confusing when I'm trying to check consumer 
> offsets that I open the bin directory, and see a script that seems to do 
> exactly what I want, only to later discover that I'm not supposed to use it. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4530) cant stop kafka server

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-4530.
--
Resolution: Duplicate
  Assignee: Ewen Cheslack-Postava

> cant stop kafka server
> --
>
> Key: KAFKA-4530
> URL: https://issues.apache.org/jira/browse/KAFKA-4530
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Affects Versions: 0.10.1.0, 0.10.0.1
>Reporter: xin
>Assignee: Ewen Cheslack-Postava
>
> The ps command can't find kafka.Kafka.
> Is the server start command's classpath too long for "kafka.Kafka" to be found?
> 183-manager:/home/xxx/kafka_2.10-0.10.0.1/bin # ./kafka-server-stop.sh 
> No kafka server to stop
> 183-manager:/home/xxx/kafka_2.10-0.10.0.1/bin # ps -ef|grep kafka
> root 28517 77538  7 16:07 pts/600:00:03 /usr/java/jdk/bin/java -Xmx1G 
> -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 
> -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC 
> -Djava.awt.headless=true 
> -Xloggc:/home/xxx/kafka_2.10-0.10.0.1/bin/../logs/kafkaServer-gc.log 
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps 
> -Dcom.sun.management.jmxremote 
> -Dcom.sun.management.jmxremote.authenticate=false 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dkafka.logs.dir=/home/xxx/kafka_2.10-0.10.0.1/bin/../logs 
> -Dlog4j.configuration=file:./../config/log4j.properties -cp 
> 

[jira] [Commented] (KAFKA-4531) Rationalise client configuration validation

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788707#comment-15788707
 ] 

Ewen Cheslack-Postava commented on KAFKA-4531:
--

Could we bake this into {{ConfigDef}} that already has a {{validate()}} method? 
And maybe tie it into {{AbstractConfig}} so we always do this cross-field 
validation?
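
As a rough sketch of the idea (hypothetical names, not the existing {{ConfigDef}}/{{AbstractConfig}} API), the constraint quoted below could be expressed as a reusable cross-field rule that runs once all the individual values have been parsed:

{code}
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.config.ConfigException;

// Hypothetical hook, purely illustrative: one rule that sees several parsed values at once.
interface CrossFieldValidator {
    void ensureValid(Map<String, Object> parsedValues);
}

// The KafkaConsumer constraint quoted below, expressed as such a rule.
class RequestTimeoutValidator implements CrossFieldValidator {
    @Override
    public void ensureValid(Map<String, Object> parsed) {
        int requestTimeoutMs = (Integer) parsed.get(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG);
        int sessionTimeoutMs = (Integer) parsed.get(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG);
        int fetchMaxWaitMs = (Integer) parsed.get(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG);
        if (requestTimeoutMs <= sessionTimeoutMs || requestTimeoutMs <= fetchMaxWaitMs)
            throw new ConfigException(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG
                    + " should be greater than " + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG
                    + " and " + ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG);
    }
}
{code}

A set of such validators could live alongside the {{ConfigDef}}, so {{AbstractConfig}} runs them in one place instead of each client constructor doing its own ad-hoc checks.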

> Rationalise client configuration validation 
> 
>
> Key: KAFKA-4531
> URL: https://issues.apache.org/jira/browse/KAFKA-4531
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Edoardo Comar
>Assignee: Vahid Hashemian
>
> The broker-side configuration has a {{validateValues()}} method that could be 
> introduced also in the client-side {{ProducerConfig}} and {{ConsumerConfig}} 
> classes.
> The rationale is to centralise constraints between values, like e.g. this one 
> currently in the {{KafkaConsumer}} constructor:
> {code}
> if (this.requestTimeoutMs <= sessionTimeOutMs || 
> this.requestTimeoutMs <= fetchMaxWaitMs)
> throw new 
> ConfigException(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG + " should be 
> greater than " + ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG + " and " + 
> ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG);
> {code}
> or custom validation of the provided values, e.g. this one in the 
> {{KafkaProducer}} :
> {code}
> private static int parseAcks(String acksString) {
> try {
> return acksString.trim().equalsIgnoreCase("all") ? -1 : 
> Integer.parseInt(acksString.trim());
> } catch (NumberFormatException e) {
> throw new ConfigException("Invalid configuration value for 
> 'acks': " + acksString);
> }
> }
> {code}
> also some new KIPs, e.g. KIP-81 propose constraints among different values,
> so it would be good not to scatter them around.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-30 Thread Jeff Widman (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788686#comment-15788686
 ] 

Jeff Widman commented on KAFKA-3135:


It's not currently a critical issue for my company. Typically when we're 
considering upgrading we look at outstanding bugs to evaluate whether to 
upgrade or wait, so I just wanted the tags to be corrected. Thanks [~ewencp] 
for handling.


> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3856) Move inner classes accessible only functions in TopologyBuilder out of public APIs

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3856:
-
Assignee: Matthias J. Sax  (was: Jeyhun Karimov)

> Move inner classes accessible only functions in TopologyBuilder out of public 
> APIs
> --
>
> Key: KAFKA-3856
> URL: https://issues.apache.org/jira/browse/KAFKA-3856
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>  Labels: api
>
> In {{TopologyBuilder}} there are a couple of public functions that are 
> actually only used in the internal classes such as StreamThread and 
> StreamPartitionAssignor, and some accessible only in high-level DSL inner 
> classes, examples include {{addInternalTopic}}, {{sourceGroups}} and 
> {{copartitionGroups}}, etc. But they are still listed in Javadocs since this 
> class is part of public APIs.
> We should think about moving them out of the public functions. Unfortunately 
> there is no "friend" access mode as in C++, so we need to think of another 
> way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3856) Move inner classes accessible only functions in TopologyBuilder out of public APIs

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788683#comment-15788683
 ] 

Ewen Cheslack-Postava commented on KAFKA-3856:
--

[~mjsax] 6mo seems long enough. I'll reassign to you, but if [~jeyhunkarimov] 
wants to chime back in and take over he can just respond and we'll figure out 
how to proceed.

> Move inner classes accessible only functions in TopologyBuilder out of public 
> APIs
> --
>
> Key: KAFKA-3856
> URL: https://issues.apache.org/jira/browse/KAFKA-3856
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Jeyhun Karimov
>  Labels: api
>
> In {{TopologyBuilder}} there are a couple of public functions that are 
> actually only used in the internal classes such as StreamThread and 
> StreamPartitionAssignor, and some accessible only in high-level DSL inner 
> classes, examples include {{addInternalTopic}}, {{sourceGroups}} and 
> {{copartitionGroups}}, etc. But they are still listed in Javadocs since this 
> class is part of public APIs.
> We should think about moving them out of the public functions. Unfortunately 
> there is no "friend" access mode as in C++, so we need to think of another 
> way.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788679#comment-15788679
 ] 

Ewen Cheslack-Postava commented on KAFKA-3135:
--

Also, /cc [~hachikuji] since he was the one that made the most progress on this.

> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788677#comment-15788677
 ] 

Ewen Cheslack-Postava commented on KAFKA-3135:
--

[~jeffwidman] I've updated the affected versions, and I'll track this for the 
0.10.2.0 release. Do we have an idea of how critical this issue is? It seems 
like it should be hit fairly commonly (based on ideas that it's due to TCP 
persist timer issues), but we've seen few comments since Mar/Apr of earlier 
this year. Is this still a critical issue or should we downgrade it to 
something that would be a good improvement and make it a 
major/minor/nice-to-have?

> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3135) Unexpected delay before fetch response transmission

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava updated KAFKA-3135:
-
Affects Version/s: 0.10.1.0
   0.9.0.1
   0.10.0.0
   0.10.0.1
   0.10.1.1

> Unexpected delay before fetch response transmission
> ---
>
> Key: KAFKA-3135
> URL: https://issues.apache.org/jira/browse/KAFKA-3135
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0, 0.10.1.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.1
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Critical
> Fix For: 0.10.2.0
>
>
> From the user list, Krzysztof Ciesielski reports the following:
> {quote}
> Scenario description:
> First, a producer writes 50 elements into a topic
> Then, a consumer starts to read, polling in a loop.
> When "max.partition.fetch.bytes" is set to a relatively small value, each
> "consumer.poll()" returns a batch of messages.
> If this value is left as default, the output tends to look like this:
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 0 elements
> Poll returned 13793 elements
> Poll returned 13793 elements
> As we can see, there are weird "gaps" when poll returns 0 elements for some
> time. What is the reason for that? Maybe there are some good practices
> about setting "max.partition.fetch.bytes" which I don't follow?
> {quote}
> The gist to reproduce this problem is here: 
> https://gist.github.com/kciesielski/054bb4359a318aa17561.
> After some initial investigation, the delay appears to be in the server's 
> networking layer. Basically I see a delay of 5 seconds from the time that 
> Selector.send() is invoked in SocketServer.Processor with the fetch response 
> to the time that the send is completed. Using netstat in the middle of the 
> delay shows the following output:
> {code}
> tcp4   0  0  10.191.0.30.55455  10.191.0.30.9092   ESTABLISHED
> tcp4   0 102400  10.191.0.30.9092   10.191.0.30.55454  ESTABLISHED
> {code}
> From this, it looks like the data reaches the send buffer, but needs to be 
> flushed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3539) KafkaProducer.send() may block even though it returns the Future

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788664#comment-15788664
 ] 

Ewen Cheslack-Postava commented on KAFKA-3539:
--

http://mail-archives.apache.org/mod_mbox/kafka-users/201604.mbox/%3C01978027-5580-4AAB-A254-3585870FBD69%40hortonworks.com%3E
 is the original message that's the source of this issue.
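
For anyone hitting this in the meantime, a minimal client-side sketch that bounds (but does not remove) the blocking, assuming the {{max.block.ms}} setting available since 0.9.0.0; the broker address and topic name below are placeholders:

{code}
import java.util.Properties;
import java.util.concurrent.Future;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class BoundedBlockingSendExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Cap how long send() may block waiting for metadata or buffer space.
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "5000");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future =
                    producer.send(new ProducerRecord<>("my-topic", "key", "value")); // placeholder topic
            RecordMetadata metadata = future.get(); // completion itself is still asynchronous
            System.out.println("offset=" + metadata.offset());
        }
    }
}
{code}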

> KafkaProducer.send() may block even though it returns the Future
> 
>
> Key: KAFKA-3539
> URL: https://issues.apache.org/jira/browse/KAFKA-3539
> Project: Kafka
>  Issue Type: Bug
>Reporter: Oleg Zhurakousky
>Assignee: Manikumar Reddy
>Priority: Critical
>
> You can get more details from the us...@kafka.apache.org by searching on the 
> thread with the subject "KafkaProducer block on send".
> The bottom line is that method that returns Future must never block, since it 
> essentially violates the Future contract as it was specifically designed to 
> return immediately passing control back to the user to check for completion, 
> cancel etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4550) current trunk unstable

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788627#comment-15788627
 ] 

Ewen Cheslack-Postava commented on KAFKA-4550:
--

One of the failures reported in this issue is already being addressed in 
KAFKA-4528

> current trunk unstable
> --
>
> Key: KAFKA-4550
> URL: https://issues.apache.org/jira/browse/KAFKA-4550
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.10.2.0
>Reporter: radai rosenblatt
> Attachments: run1.log, run2.log, run3.log, run4.log, run5.log
>
>
> on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9)
> when running the exact same build 5 times, I get:
> 3 failures (on 3 separate runs):
>kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: 
> No request is complete
>org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[1] FAILED java.lang.AssertionError: Condition not met within 
> timeout 6. Did not receive 1 number of records
>kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout 
> FAILED java.lang.AssertionError: Message set should have 1 message
> 1 success
> 1 stall (build hung)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4550) current trunk unstable

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788626#comment-15788626
 ] 

Ewen Cheslack-Postava commented on KAFKA-4550:
--

Seems like the second issue is still unresolved and reported as KAFKA-4528, so 
we should defer to that more specific issue. First issue had a JIRA associated 
with it, but it has already been resolved, so it seems this may be a new issue.

> current trunk unstable
> --
>
> Key: KAFKA-4550
> URL: https://issues.apache.org/jira/browse/KAFKA-4550
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.10.2.0
>Reporter: radai rosenblatt
> Attachments: run1.log, run2.log, run3.log, run4.log, run5.log
>
>
> on latest trunk (commit hash 908b6d1148df963d21a70aaa73a7a87571b965a9)
> when running the exact same build 5 times, I get:
> 3 failures (on 3 separate runs):
>kafka.api.SslProducerSendTest > testFlush FAILED java.lang.AssertionError: 
> No request is complete
>org.apache.kafka.streams.integration.QueryableStateIntegrationTest > 
> queryOnRebalance[1] FAILED java.lang.AssertionError: Condition not met within 
> timeout 6. Did not receive 1 number of records
>kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout 
> FAILED java.lang.AssertionError: Message set should have 1 message
> 1 success
> 1 stall (build hung)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4413) Kakfa should support default SSLContext

2016-12-30 Thread Wenjie Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenjie Zhang resolved KAFKA-4413.
-
Resolution: Invalid

It seems like Kafka does support it, closing this ticket as invalid. 

> Kakfa should support default SSLContext
> ---
>
> Key: KAFKA-4413
> URL: https://issues.apache.org/jira/browse/KAFKA-4413
> Project: Kafka
>  Issue Type: Improvement
>  Components: security
>Affects Versions: 0.10.0.1
> Environment: All
>Reporter: Wenjie Zhang
>  Labels: SSLContext, SslFactory, https, ssl
>
> Currently, to enable SSL in either consumer or producer, we have to provide 
> trustStore file and password. Ideally, if the Kafka server configured with CA 
> signed certificate, since JRE includes certain CA ROOT certs inside 
> "cacerts", Kafka should support SSL without any trustStore file, basically, 
> we should update 
> `org.apache.kafka.common.security.ssl.SslFactory.createSSLContext` to use 
> `SSLContext.getDefault()` when trustStore file is not needed, not sure if 
> there are any other places that need to be updated for this enhancement 
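
For reference, a minimal sketch of the fallback described above (illustrative only, not the actual {{SslFactory}} code):

{code}
import javax.net.ssl.SSLContext;

public class DefaultSslContextSketch {
    // Illustrative only: with no truststore configured, use the JRE's default context,
    // which trusts the CA root certificates bundled in the JRE's cacerts file.
    static SSLContext contextFor(String truststoreLocation) throws Exception {
        if (truststoreLocation == null)
            return SSLContext.getDefault();
        // ...otherwise build the context from the configured truststore, as today...
        throw new UnsupportedOperationException("configured-truststore path elided in this sketch");
    }
}
{code}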



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4556) unordered messages when multiple topics are combined in single topic through stream

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788347#comment-15788347
 ] 

Ewen Cheslack-Postava commented on KAFKA-4556:
--

Can you be more specific about what your streams topology looks like and is 
doing? First, timestamps are not strictly guaranteed to be monotonically 
increasing (especially when specified by the producer). But more importantly, 
ordering in Kafka is only guaranteed within a partition. In the output topic, 
are you writing the messages using keys such that the messages would end up on 
the same partition and preserve ordering? If you aren't, then they will be 
produced to different partitions and it is expected that downstream consumers 
might see them in a different order. (Also note that to preserve ordering in 
the face of errors, you'd also need to make sure your producer uses settings to 
avoid reordering due to errors, i.e. max.in.flight.requests.per.connection=1, 
acks=all, retries=infinite.)
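
For concreteness, a sketch of the producer side of that advice (the broker address is a placeholder; {{t3}} is the output topic from the description below):

{code}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
        // A single in-flight request prevents retries from reordering messages.
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records that must stay in order share a key, so they land on the same partition.
            producer.send(new ProducerRecord<>("t3", "same-key", "first"));
            producer.send(new ProducerRecord<>("t3", "same-key", "second"));
        }
    }
}
{code}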

> unordered messages when multiple topics are combined in single topic through 
> stream
> ---
>
> Key: KAFKA-4556
> URL: https://issues.apache.org/jira/browse/KAFKA-4556
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, producer , streams
>Affects Versions: 0.10.0.1
>Reporter: Savdeep Singh
>
> When the builder is bound to multiple topics, the single resultant topic has an 
> unordered set of messages.
> The issue occurs at the millisecond level, when messages with the same 
> millisecond timestamp are added to the topics.
> Scenario :  (1 producer : p1 , 2 topics t1 and t2, streams pick form these 2 
> topics and save in resulting t3 topic, 4 partitions of t3 and 4 consumers of 
> 4 partitions )
> Case: When p1 adds messages with same millisecond timestamp into t1 and t2 . 
> Stream combine and form t3. When this t3 is consumed by consumer, it has 
> different order of same millisecond messages.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Please add my wiki username contributor permission list

2016-12-30 Thread Ewen Cheslack-Postava
I've granted you permissions, you should now be able to edit the main KIP
page and add a new KIP.

-Ewen

On Fri, Dec 30, 2016 at 2:58 PM, Aarti Gupta  wrote:

> Hey devs,
>
> My jira username is aartigupta
> and my wiki username happens to be  aartiguptaa
> (not sure how that happened)
>
> I already have JIRA access, but no permissions to edit the KIP wiki.
>
> https://cwiki.apache.org/confluence/display/KAFKA/
> Kafka+Improvement+Proposals
>
> Can someone grant my username (aartiguptaa) permissions to edit the
> confluence wiki ?
>
> Thanks!
> aarti
>


Re: [DISCUSS] KIP-66 Kafka Connect Transformers for messages

2016-12-30 Thread Ewen Cheslack-Postava
On Thu, Dec 15, 2016 at 7:41 PM, Shikhar Bhushan 
wrote:

> There is no decision being proposed on the final list of transformations
> that will ever be in Kafka :-) Just the initial set we should roll with.
>

I'd second this comment as well. I'm very wary of the slippery slope, which
is why I wasn't in favor of including any connectors except for very simple
demos.

But it might be useful to have some initial guidelines, and might even make
sense to include them in the KIP so they are easy for others to find. I
think both the examples Gwen gave are easily excluded with a simple rule:
SMTs that are shipped with Kafka should be general enough to apply to many
data sources & serialization formats. email is a very specific type of data
(email headers and HL7 are pretty similar) and Avro is a specific
serialization format where, presumably, the Connect data type you'd have to
receive to do this transformation is just a byte array of the original Avro
file. In contrast, the included transformations in the current KIP are
*really* broadly applicable; apart from timestamps, I think they pretty
much all could potentially be applied to *any* stream of data.

I think the more interesting cases that we'll probably end up debating are
around serialization formats that "fit" within other connectors, in
particular I'm thinking of CSV and line-oriented JSON parsing. Individual
connectors may avoid this (or not be aware that the data has this
structure), but users will want that type of transformation to be easy and
baked in.
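
To make "broadly applicable" a bit more concrete, here is a purely illustrative
sketch of the kind of narrow, single-message transformation in this discussion --
hypothetical names, not the interface the KIP actually defines:

import java.util.HashMap;
import java.util.Map;

// Purely illustrative, hypothetical names -- not the KIP's actual API.
// A single-message transform is a pure function over one record's value: no external
// lookups and no format- or domain-specific parsing, so it applies to any source or sink.
interface SingleMessageTransform {
    Map<String, Object> apply(Map<String, Object> value, long timestampMs);
}

class InsertTimestampField implements SingleMessageTransform {
    // Broadly applicable: copies the record timestamp into the value itself.
    public Map<String, Object> apply(Map<String, Object> value, long timestampMs) {
        Map<String, Object> out = new HashMap<>(value);
        out.put("event_ts", timestampMs);
        return out;
    }
}

Something like HL7 or email-header parsing can't take that shape without baking in
knowledge of a specific format, which is exactly the kind of thing I'd exclude from
the core set.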

-Ewen


>
> On Thu, Dec 15, 2016 at 3:34 PM Gwen Shapira  wrote:
>
> You are absolutely right that the vast majority of NiFi's processors are
> not what we would consider SMT.
>
> I went over the list and I think they still contain just short of 50 legit
> SMTs:
> https://cwiki.apache.org/confluence/display/KAFKA/Analyzing+
> NiFi+Transformations
>
> You are right that ExtractHL7 is an extreme that clearly doesn't belong in
> Apache Kafka, but just before that we have ExtractAvroMetadata that may
> fit? and ExtractEmailHeaders doesn't sound totally outlandish either...
>
> Nothing in the baked-in list by Shikhar looks out of place. I am concerned
> about a slippery slope. Or the arbitrariness of the decision if we say that
> this list is final and nothing else will ever make it into Kafka.
>
> Gwen
>
> On Thu, Dec 15, 2016 at 3:00 PM, Ewen Cheslack-Postava 
> wrote:
>
> > I think there are a couple of factors that make transformations and
> > connectors different.
> >
> > First, NiFi's 150 processors is a bit misleading. In NiFi, processors
> cover
> > data sources, data sinks, serialization/deserialization, *and*
> > transformations. I haven't filtered the list to see how many fall into
> the
> > first 3 categories, but it's a *lot* of the processors they have.
> >
> > Second, since transformations only apply to a single message and I'd
> think
> > they generally shouldn't be interacting with external services (i.e. I
> > think trying to do enrichment in SMT is probably a bad idea), the scope
> of
> > possible transformations is reasonably limited and the transformations
> > themselves tend to be small and easily maintainable. I think this is a
> > dramatic difference from connectors, which are each substantial projects
> in
> > their own right.
> >
> > While I get the slippery slope argument re: including specific
> > transformations, I think we can come up with a reasonable policy (and via
> > KIPs we can, as a community, come to an agreement based purely on taste
> if
> > it comes down to that). In particular, I'd say keep the core general
> (i.e.
> > no domain-specific transformations/parsing like HL7), pure data
> > manipulation (i.e. no enrichment), and nothing that could just as well be
> > done as a converter/serializer/deserializer/source connector/sink
> > connector.
> >
> > I was very staunchly against including connectors (aside from a simple
> > example) directly in Kafka, so this may seem like a reversal of position.
> > But I think the % of use cases covered will look very different between
> > connectors and transformations. Sure, some connectors are very popular,
> and
> > moreso right now because they are the most thoroughly developed, tested,
> > etc. But the top 3 most common transformations will probably be used
> across
> > all the top 20 most popular connectors. I have no doubt people will end
> up
> > writing custom ones (which is why it's nice to make them pluggable rather
> > than choosing a fixed set), but they'll either be very niche (like people
> > write custom connectors for their internal systems) or be more broadly
> > applicable but very domain specific such that they are easy to reject for
> > inclusion.
> >
> > @Gwen if we filtered the list of NiFi processors to ones that fit that
> > criteria, would that still be too long a list for your taste? Similarly,
> > let's say we were going to include some 

[jira] [Commented] (KAFKA-4572) Kafka connect for windows is missing

2016-12-30 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788258#comment-15788258
 ] 

Vahid Hashemian commented on KAFKA-4572:


[~ewencp] I'll take a closer look. Thanks.

> Kafka connect for windows is missing
> 
>
> Key: KAFKA-4572
> URL: https://issues.apache.org/jira/browse/KAFKA-4572
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Windows.
>Reporter: Aravind Krishnan
>Assignee: Ewen Cheslack-Postava
>
> Unable to find the connect-standalone for windows. When created as below
> IF [%1] EQU [] (
>   echo USAGE: %0 connect-standalone.properties
>   EXIT /B 1
> )
> SetLocal
> rem Using pushd popd to set BASE_DIR to the absolute path
> pushd %~dp0..\..
> set BASE_DIR=%CD%
> popd
> rem Log4j settings
> IF ["%KAFKA_LOG4J_OPTS%"] EQU [""] (
>   set 
> KAFKA_LOG4J_OPTS=-Dlog4j.configuration=file:%BASE_DIR%/config/tools-log4j.properties
> )
> %~dp0kafka-run-class.bat org.apache.kafka.connect.cli.ConnectStandalone %*
> EndLocal
> Not able to see any of the below logs in FileStreamSourceTask
>   log.trace("Read {} bytes from {}", nread, logFilename());
> (OR)
> log.trace("Read a line from {}", logFilename());



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Please add my wiki username contributor permission list

2016-12-30 Thread Aarti Gupta
Hey devs,

My jira username is aartigupta
and my wiki username happens to be  aartiguptaa
(not sure how that happened)

I already have JIRA access, but no permissions to edit the KIP wiki.

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

Can someone grant my username (aartiguptaa) permissions to edit the
confluence wiki ?

Thanks!
aarti


[jira] [Commented] (KAFKA-4572) Kafka connect for windows is missing

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788234#comment-15788234
 ] 

Ewen Cheslack-Postava commented on KAFKA-4572:
--

[~vahid] The version he included looks basically the same as the one checked 
in, so perhaps there's still some issue with log4j paths?

> Kafka connect for windows is missing
> 
>
> Key: KAFKA-4572
> URL: https://issues.apache.org/jira/browse/KAFKA-4572
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Windows.
>Reporter: Aravind Krishnan
>Assignee: Ewen Cheslack-Postava
>
> Unable to find the connect-standalone for windows. When created as below
> IF [%1] EQU [] (
>   echo USAGE: %0 connect-standalone.properties
>   EXIT /B 1
> )
> SetLocal
> rem Using pushd popd to set BASE_DIR to the absolute path
> pushd %~dp0..\..
> set BASE_DIR=%CD%
> popd
> rem Log4j settings
> IF ["%KAFKA_LOG4J_OPTS%"] EQU [""] (
>   set 
> KAFKA_LOG4J_OPTS=-Dlog4j.configuration=file:%BASE_DIR%/config/tools-log4j.properties
> )
> %~dp0kafka-run-class.bat org.apache.kafka.connect.cli.ConnectStandalone %*
> EndLocal
> Not able to see any of the below logs in FileStreamSourceTask
>   log.trace("Read {} bytes from {}", nread, logFilename());
> (OR)
> log.trace("Read a line from {}", logFilename());



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3297) More optimally balanced partition assignment strategy (new consumer)

2016-12-30 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788229#comment-15788229
 ] 

Ewen Cheslack-Postava commented on KAFKA-3297:
--

Yeah, part of the improvement of KIP-49 seems to be included in step 2 of the 
algorithm (start from the least loaded consumers rather than just using a 
CircularIterator). But there are a few more improvements (specifically in 
optimizing the order of processing the topics) that would still probably need 
to be incorporated.
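
For reference, a rough sketch of the greedy idea described in this issue (not the actual assignor code): order topics by how constrained they are, then always hand the next partition to the subscribed consumer with the fewest partitions assigned so far.

{code}
import java.util.*;

import org.apache.kafka.common.TopicPartition;

public class LeastLoadedAssignmentSketch {
    // consumers: all group members; subscribers: topic -> consumers subscribed to it
    // (assumed non-empty per topic); partitionCounts: topic -> number of partitions.
    static Map<String, List<TopicPartition>> assign(List<String> consumers,
                                                    Map<String, List<String>> subscribers,
                                                    Map<String, Integer> partitionCounts) {
        Map<String, Integer> load = new HashMap<>();
        for (String c : consumers)
            load.put(c, 0);

        List<String> topicOrder = new ArrayList<>(subscribers.keySet());
        topicOrder.sort(Comparator
                .comparingInt((String t) -> subscribers.get(t).size())   // fewer consumers first
                .thenComparingInt(t -> -partitionCounts.get(t)));        // ties: more partitions first

        Map<String, List<TopicPartition>> assignment = new HashMap<>();
        for (String topic : topicOrder) {
            for (int p = 0; p < partitionCounts.get(topic); p++) {
                // Pick the interested consumer with the fewest partitions assigned so far.
                String leastLoaded = subscribers.get(topic).stream()
                        .min(Comparator.comparingInt(load::get)).get();
                assignment.computeIfAbsent(leastLoaded, k -> new ArrayList<>())
                          .add(new TopicPartition(topic, p));
                load.merge(leastLoaded, 1, Integer::sum);
            }
        }
        return assignment;
    }
}
{code}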

> More optimally balanced partition assignment strategy (new consumer)
> 
>
> Key: KAFKA-3297
> URL: https://issues.apache.org/jira/browse/KAFKA-3297
> Project: Kafka
>  Issue Type: Improvement
>  Components: consumer
>Reporter: Andrew Olson
>Assignee: Andrew Olson
> Fix For: 0.10.2.0
>
>
> While the roundrobin partition assignment strategy is an improvement over the 
> range strategy, when the consumer topic subscriptions are not identical 
> (previously disallowed but will be possible as of KAFKA-2172) it can produce 
> heavily skewed assignments. As suggested 
> [here|https://issues.apache.org/jira/browse/KAFKA-2172?focusedCommentId=14530767&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14530767]
>  it would be nice to have a strategy that attempts to assign an equal number 
> of partitions to each consumer in a group, regardless of how similar their 
> individual topic subscriptions are. We can accomplish this by tracking the 
> number of partitions assigned to each consumer, and having the partition 
> assignment loop assign each partition to a consumer interested in that topic 
> with the least number of partitions assigned. 
> Additionally, we can optimize the distribution fairness by adjusting the 
> partition assignment order:
> * Topics with fewer consumers are assigned first.
> * In the event of a tie for least consumers, the topic with more partitions 
> is assigned first.
> The general idea behind these two rules is to keep the most flexible 
> assignment choices available as long as possible by starting with the most 
> constrained partitions/consumers.
> This JIRA addresses the new consumer. For the original high-level consumer, 
> see KAFKA-2435.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4577) NPE in ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers

2016-12-30 Thread Scott Reynolds (JIRA)
Scott Reynolds created KAFKA-4577:
-

 Summary: NPE in 
ControllerChannelManager.scala::addUpdateMetadataRequestForBrokers 
 Key: KAFKA-4577
 URL: https://issues.apache.org/jira/browse/KAFKA-4577
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.1.0
Reporter: Scott Reynolds


Seems as if either deleteTopicManager or deleteTopicManager.partitionsToBeDeleted 
wasn't set?
{code}
java.lang.NullPointerException
at 
kafka.controller.ControllerBrokerRequestBatch.addUpdateMetadataRequestForBrokers
 (ControllerChannelManager.scala:331)
at kafka.controller.KafkaController.sendUpdateMetadataRequest 
(KafkaController.scala:1023)
at 
kafka.controller.IsrChangeNotificationListener.kafka$controller$IsrChangeNotificationListener$$processUpdateNotifications
 (KafkaController.scala:1371)
at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply$mcV$sp
 (KafkaController.scala:1358)
at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
 (KafkaController.scala:1351)
at 
kafka.controller.IsrChangeNotificationListener$$anonfun$handleChildChange$1.apply
 (KafkaController.scala:1351)
at kafka.utils.CoreUtils$.inLock (CoreUtils.scala:234)
at kafka.controller.IsrChangeNotificationListener.handleChildChange 
(KafkaController.scala:1351)
at org.I0Itec.zkclient.ZkClient$10.run (ZkClient.java:843)
at org.I0Itec.zkclient.ZkEventThread.run (ZkEventThread.java:71)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4572) Kafka connect for windows is missing

2016-12-30 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15788100#comment-15788100
 ] 

Vahid Hashemian commented on KAFKA-4572:


[~aravindk47] This is already addressed in 
[KAFKA-4272|https://issues.apache.org/jira/browse/KAFKA-4272]. The scripts now 
exist in trunk 
([here|https://github.com/apache/kafka/blob/trunk/bin/windows/connect-distributed.bat]
 and 
[here|https://github.com/apache/kafka/blob/trunk/bin/windows/connect-standalone.bat]).
 Should we mark this as a duplicate?

> Kafka connect for windows is missing
> 
>
> Key: KAFKA-4572
> URL: https://issues.apache.org/jira/browse/KAFKA-4572
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.10.1.1
> Environment: Windows.
>Reporter: Aravind Krishnan
>Assignee: Ewen Cheslack-Postava
>
> Unable to find the connect-standalone for windows. When created as below
> IF [%1] EQU [] (
>   echo USAGE: %0 connect-standalone.properties
>   EXIT /B 1
> )
> SetLocal
> rem Using pushd popd to set BASE_DIR to the absolute path
> pushd %~dp0..\..
> set BASE_DIR=%CD%
> popd
> rem Log4j settings
> IF ["%KAFKA_LOG4J_OPTS%"] EQU [""] (
>   set 
> KAFKA_LOG4J_OPTS=-Dlog4j.configuration=file:%BASE_DIR%/config/tools-log4j.properties
> )
> %~dp0kafka-run-class.bat org.apache.kafka.connect.cli.ConnectStandalone %*
> EndLocal
> Not able to see any of the below logs in FileStreamSourceTask
>   log.trace("Read {} bytes from {}", nread, logFilename());
> (OR)
> log.trace("Read a line from {}", logFilename());



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4576) Log segments close to max size break on fetch

2016-12-30 Thread Ivan Babrou (JIRA)
Ivan Babrou created KAFKA-4576:
--

 Summary: Log segments close to max size break on fetch
 Key: KAFKA-4576
 URL: https://issues.apache.org/jira/browse/KAFKA-4576
 Project: Kafka
  Issue Type: Bug
  Components: log
Affects Versions: 0.10.1.1
Reporter: Ivan Babrou


We are running Kafka 0.10.1.1~rc1 (it's the same as 0.10.1.1).

Max segment size is set to 2147483647 globally, which is the maximum value of a 
signed int32.

Every now and then we see failures like this:

{noformat}
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: ERROR [Replica Manager on 
Broker 1006]: Error processing fetch operation on partition [mytopic,11], 
offset 483579108587 (kafka.server.ReplicaManager)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: 
java.lang.IllegalStateException: Failed to read complete buffer for 
targetOffset 483686627237 startPosition 2145701130 in 
/disk/data0/kafka-logs/mytopic-11/483571890786.log
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.log.FileMessageSet.searchForOffsetWithSize(FileMessageSet.scala:145)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.log.LogSegment.translateOffset(LogSegment.scala:128)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.log.LogSegment.read(LogSegment.scala:180)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.log.Log.read(Log.scala:563)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:567)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:606)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:605)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
scala.collection.Iterator$class.foreach(Iterator.scala:893)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:605)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:469)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:534)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.KafkaApis.handle(KafkaApis.scala:79)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
Dec 25 18:20:47 myhost kafka-run-class.sh[2054]: at 
java.lang.Thread.run(Thread.java:745)
{noformat}

{noformat}
...
-rw-r--r-- 1 kafka kafka  0 Dec 25 15:15 483557418204.timeindex
-rw-r--r-- 1 kafka kafka   9496 Dec 25 15:26 483564654488.index
-rw-r--r-- 1 kafka kafka 2145763964 Dec 25 15:26 483564654488.log
-rw-r--r-- 1 kafka kafka  0 Dec 25 15:26 483564654488.timeindex
-rw-r--r-- 1 kafka kafka   9576 Dec 25 15:37 483571890786.index
-rw-r--r-- 1 kafka kafka 2147483644 Dec 25 15:37 483571890786.log
-rw-r--r-- 1 kafka kafka  0 Dec 25 15:37 483571890786.timeindex
-rw-r--r-- 1 kafka kafka   9568 Dec 25 15:48 483579135712.index
-rw-r--r-- 1 kafka kafka 2146791360 Dec 25 15:48 483579135712.log
-rw-r--r-- 1 kafka kafka  0 Dec 25 15:48 483579135712.timeindex
-rw-r--r-- 1 kafka kafka   9408 Dec 25 15:59 483586374164.index
...
{noformat}

Here 483571890786.log is just 3 bytes below the max size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: custom offsets in ProduceRequest

2016-12-30 Thread Andrey L. Neporada
Hi!

> On 30 Dec 2016, at 09:52, radai  wrote:
> 
> or even better - if topic creation is done dynamically by the replicator,
> setting the initial offsets for partitions could be made part of topic
> creation ? even less API changes this way
> 
> On Thu, Dec 29, 2016 at 10:49 PM, radai  wrote:
> 
>> ah, I didnt realize we are limiting the discussion to master --> slave.
>> 
>> but - if we're talking about master-slave replication, and under the
>> conditions i outlined earlier (src and dest match in #partitions, no
>> foreign writes to dest) it "just works", seems to me the only thing youre
>> really missing is not an explicit desired offset param on each and every
>> request, but just the ability to "reset" the starting offset on the dest
>> cluster at topic creation.
>> 

Unfortunately, resetting starting offset at topic creation is not enough.
Currently we do support log compaction - in other words, message offsets can be 
non-contiguous inside a partition.

Adding a new ResetNextOffset(partition, offset) request would do the job; however, 
I think it would be easier to extend ProduceRequest than to introduce 
ResetNextOffset.
Let me try to explain it in more detail.

Current ProduceRequest works as follows:

1) get messageSet from client
2) locate the offset of the last message in log (say N)
3) validate messageSet and assign the offsets to messageSet starting from N+1
4) save messageSet to log

Proposed ProduceRequest should work this way:

1) get messageSet along with messageSetStartOffset (say S) from client
2) locate the offset of the last message in log (say N)
3) if S < N + 1, report error to client
4) otherwise validate messageSet and assign the offsets to messageSet starting 
from S
5) save messageSet to log
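
In rough pseudo-Java, the leader-side decision in steps 2-4 could look like this
(illustrative only; a real implementation would return a proper error code to the
client rather than throw):

// Illustrative only. s is the client-supplied messageSetStartOffset
// (-1 = unchanged behaviour, see below), logEndOffset is N + 1 from step 2.
static long resolveStartOffset(long s, long logEndOffset) {
    if (s == -1)
        return logEndOffset;              // assign the next available offsets, as today
    if (s < logEndOffset)
        throw new IllegalStateException("requested start offset " + s
                + " is behind the log end offset " + logEndOffset);
    return s;                             // gaps are allowed, e.g. the source log was compacted
}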


The other alternative is to add ResetNextOffset(partition, nextOffset) API.
However, since ResetNextOffset changes partition state, it should be somehow 
persisted on disk and replicated from leader to followers.
This would require changes to the on-disk partition format - for 
example we could probably add special “marker” messages with the meaning “next 
message should get offset S”, etc.

So I believe that extending ProduceRequest to accept a messageSetStartOffset is 
easier than introducing ResetNextOffset.


>> let me try and run through a more detailed scenario:
>> 
>> 1. suppose i set up the original cluster (src). no remote cluster yet.
>> lets say over some period of time i produce 1 million msgs to topic X on
>> this src cluster.
>> 2. company grows, 2nd site is opened, dest cluster is created, topic X is
>> created on (brand new) dest cluster.
>> 3. offsets are manually set on every partition of X on the dest cluster to
>> match either the oldest retained or current offset of the matching
>> partition of X in src. in pseudo code:
>> 
>> for (partI in numPartitions) {
>>partIOffset
>>if (replicateAllRetainedHistory) {
>>   partIOffset = src.getOldestRetained(partI)
>>} else {
>>   partIOffset = src.getCurrent(partI) //will not copy over history
>>}
>>dest.resetStartingOffset(partI, partIOffset)   < new mgmt API
>> }
>> 
>> 4. now you are free to start replicating. under master --> slave
>> assumptions offsets will match from this point forward
>> 
>> seems to me something like this could be made part of the replicator
>> component (mirror maker, or whatever else you want to use) - if topic X
>> does not exist in destination, create it, reset initial offsets to match
>> source, start replication
>> 
>> On Thu, Dec 29, 2016 at 12:41 PM, Andrey L. Neporada <
>> anepor...@yandex-team.ru> wrote:
>> 
>>> 
 On 29 Dec 2016, at 20:43, radai  wrote:
 
 so, if i follow your suggested logic correctly, there would be some
>>> sort of
 :
 
 produce(partition, msg, requestedOffset)
 
>>> 
 which would fail if requestedOffset is already taken (by another
>>> previous
 such explicit call or by another regular call that just happened to get
 assigned that offset by the partition leader on the target cluster).
 
>>> 
>>> Yes. More formally, my proposal is to extend ProduceRequest by adding
>>> MessageSetStartOffset:
>>> 
>>> ProduceRequest => RequiredAcks Timeout [TopicName [Partition
>>> MessageSetStartOffset MessageSetSize MessageSet]]
>>>  RequiredAcks => int16
>>>  Timeout => int32
>>>  Partition => int32
>>>  MessageSetSize => int32
>>>  MessageSetStartOffset => int64
>>> 
>>> If MessageSetStartOffset is -1, ProduceRequest should work exactly as
>>> before - i.e. assign next available offset to given MessageSet.
>>> 
>>> 
 how would you meaningfully handle this failure?
 
 suppose this happens to some cross-cluster replicator (like mirror
>>> maker).
 there is no use in retrying. the options would be:
 
 1. get the next available offset - which would violate what youre
>>>