Re: Help on 'Error while writing to checkpoint file' Issue

2018-10-30 Thread Dasun Nirmitha
Hello Dhruvil
Ok sure, I opened a Jira and attached a rar containing the error
screenshots.
Would you please look into it:
https://issues.apache.org/jira/browse/KAFKA-7575

On Wed, Oct 31, 2018 at 11:06 AM Dhruvil Shah  wrote:

> Hi Dasun, seems like the screenshots were not attached. Could you please
> open a Jira here: https://issues.apache.org/jira/projects/KAFKA
>
> Thanks,
> Dhruvil
>
> On Tue, Oct 30, 2018 at 10:29 PM Dasun Nirmitha 
> wrote:
>
> > Hello Guys
> > I'm currently testing a Java Kafka producer application coded to retrieve
> > a db value from a local mysql db and produce to a single topic. Locally
> > I've got a Zookeeper server and a Kafka single broker running.
> > My issue is I need to produce this from the Kafka producer each second,
> > and that works for around 2 hours until broker throws an 'Error while
> > writing to checkpoint file' and shuts down. Producing with a 1 minute
> > interval works with no issues but unfortunately I need the produce
> interval
> > to be 1 second.
> > I have attached screenshots of the Errors thrown from the Broker and my
> > application.
> > Any help would be really appreciated.
> >
> > Best Regards
> > Dasun
> >
>


[jira] [Created] (KAFKA-7575) 'Error while writing to checkpoint file' Issue

2018-10-30 Thread Dasun Nirmitha (JIRA)
Dasun Nirmitha created KAFKA-7575:
-

 Summary: 'Error while writing to checkpoint file' Issue
 Key: KAFKA-7575
 URL: https://issues.apache.org/jira/browse/KAFKA-7575
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 1.1.1
 Environment: Windows 10, Kafka 1.1.1
Reporter: Dasun Nirmitha
 Attachments: Dry run error.rar

I'm currently testing a Java Kafka producer application coded to retrieve a DB 
value from a local MySQL DB and produce it to a single topic. Locally I've got a 
Zookeeper server and a single Kafka broker running.
My issue is that I need to produce this from the Kafka producer each second, and 
that works for around 2 hours until the broker throws an 'Error while writing to 
checkpoint file' and shuts down. Producing with a 1-minute interval works with 
no issues, but unfortunately I need the produce interval to be 1 second.
I have attached a rar containing screenshots of the errors thrown from the 
broker and my application.
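For context, a minimal sketch of the kind of once-per-second producer loop described 
above (the topic name, bootstrap address, and the readValueFromDb() placeholder are 
illustrative assumptions, not taken from the report; the real application would query 
MySQL via JDBC at that point):

{code}
import java.util.Properties;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OncePerSecondProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Produce one record per second; send() errors surface in the callback.
        scheduler.scheduleAtFixedRate(() -> {
            String value = readValueFromDb(); // stand-in for the JDBC/MySQL lookup
            producer.send(new ProducerRecord<>("db-values", value), (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                }
            });
        }, 0, 1, TimeUnit.SECONDS);

        // Flush and close cleanly on shutdown.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            scheduler.shutdown();
            producer.close();
        }));
    }

    private static String readValueFromDb() {
        return "42"; // placeholder; the real application queries MySQL here
    }
}
{code}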





Re: Help on 'Error while writing to checkpoint file' Issue

2018-10-30 Thread Dhruvil Shah
Hi Dasun, seems like the screenshots were not attached. Could you please
open a Jira here: https://issues.apache.org/jira/projects/KAFKA

Thanks,
Dhruvil

On Tue, Oct 30, 2018 at 10:29 PM Dasun Nirmitha 
wrote:

> Hello Guys
> I'm currently testing a Java Kafka producer application coded to retrieve
> a db value from a local mysql db and produce to a single topic. Locally
> I've got a Zookeeper server and a Kafka single broker running.
> My issue is I need to produce this from the Kafka producer each second,
> and that works for around 2 hours until broker throws an 'Error while
> writing to checkpoint file' and shuts down. Producing with a 1 minute
> interval works with no issues but unfortunately I need the produce interval
> to be 1 second.
> I have attached screenshots of the Errors thrown from the Broker and my
> application.
> Any help would be really appreciated.
>
> Best Regards
> Dasun
>


Help on 'Error while writing to checkpoint file' Issue

2018-10-30 Thread Dasun Nirmitha
Hello Guys
I'm currently testing a Java Kafka producer application coded to retrieve a
DB value from a local MySQL DB and produce it to a single topic. Locally I've
got a Zookeeper server and a single Kafka broker running.
My issue is that I need to produce this from the Kafka producer each second, and
that works for around 2 hours until the broker throws an 'Error while writing
to checkpoint file' and shuts down. Producing with a 1-minute interval
works with no issues, but unfortunately I need the produce interval to be 1
second.
I have attached screenshots of the errors thrown from the broker and my
application.
Any help would be really appreciated.

Best Regards
Dasun


Jenkins build is back to normal : kafka-trunk-jdk11 #71

2018-10-30 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-7574) Kafka unable to delete segment file in Linux

2018-10-30 Thread Ben Maas (JIRA)
Ben Maas created KAFKA-7574:
---

 Summary: Kafka unable to delete segment file in Linux
 Key: KAFKA-7574
 URL: https://issues.apache.org/jira/browse/KAFKA-7574
 Project: Kafka
  Issue Type: Bug
  Components: core, log
Affects Versions: 1.0.0
 Environment: Debian Stretch
Reporter: Ben Maas
 Attachments: kafka-config.log, kafka-delete_error_pattern.log

The initial error is that Kafka believes it is unable to delete a log segment 
file. This causes Kafka to mark the log directory as unavailable and eventually 
shut down without flushing data to disk. There are no indications in the OS 
logs of a failed filesystem or any other OS-level issue. We have verified that 
the filesystem is consistent.

Is there an I/O timeout of some kind that can be adjusted, or is something else 
happening? This is a potential duplicate of the race condition seen in KAFKA-6194.

See attached files for config and example log pattern.





Re: [VOTE] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-30 Thread Patrick Huang
Hi,

Since we have three binding votes (Jun, Dong, Harsha), if there are no other 
concerns, I will go ahead and conclude this voting thread and mark the KIP as 
accepted.

For implementation review and discussion, we can go to the PR for this KIP: 
https://github.com/apache/kafka/pull/5821

Thanks all!


Best,
Zhanxiang (Patrick) Huang


From: Harsha Chintalapani 
Sent: Tuesday, October 30, 2018 16:10
To: Patrick Huang; dev@kafka.apache.org
Cc: dev@kafka.apache.org
Subject: Re: [VOTE] KIP-380: Detect outdated control requests and bounced 
brokers using broker generation

Thanks for the KIP. +1 (binding)

-Harsha
On Oct 29, 2018, 5:08 PM -0700, Jun Rao , wrote:
Hi, Patrick,

Thanks for the updated KIP. +1

Jun

On Wed, Oct 24, 2018 at 4:52 PM, Patrick Huang  wrote:

Hi Jun,

Sure. I already updated the KIP. Thanks!

Best,
Zhanxiang (Patrick) Huang

--
*From:* Jun Rao 
*Sent:* Wednesday, October 24, 2018 14:17
*To:* dev
*Subject:* Re: [VOTE] KIP-380: Detect outdated control requests and
bounced brokers using broker generation

Hi, Patrick,

Could you update the KIP with the changes to ControlledShutdownRequest
based on the discussion thread?

Thanks,

Jun


On Sun, Oct 21, 2018 at 2:25 PM, Mickael Maison 
wrote:

+1( non-binding)
Thanks for the KIP!

On Sun, Oct 21, 2018, 03:31 Harsha Chintalapani  wrote:

+1(binding). LGTM.
-Harsha
On Oct 20, 2018, 4:49 PM -0700, Dong Lin , wrote:
Thanks much for the KIP Patrick. Looks pretty good.

+1 (binding)

On Fri, Oct 19, 2018 at 10:17 AM Patrick Huang 
wrote:

Hi All,

I would like to call for a vote on KIP-380:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation

Here is the discussion thread:
https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14bf8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3E

Note: Normalizing the schema is a good-to-have optimization because the
memory footprint for the control requests hinders the controller from
scaling up if we have many topics with large partition counts.



Thanks,
Zhanxiang (Patrick) Huang






[jira] [Resolved] (KAFKA-7567) Clean up internal metadata usage for consistency and extensibility

2018-10-30 Thread Jason Gustafson (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-7567.

   Resolution: Fixed
Fix Version/s: 2.2.0

> Clean up internal metadata usage for consistency and extensibility
> --
>
> Key: KAFKA-7567
> URL: https://issues.apache.org/jira/browse/KAFKA-7567
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 2.2.0
>
>
> This refactor has two objectives to improve metadata handling logic and 
> testing:
> 1. We want to reduce dependence on the public object `Cluster` for internal 
> metadata propagation since it is not easy to evolve. As an example, we need 
> to propagate leader epochs from the metadata response to `Metadata`, but it 
> is not straightforward to do this without exposing it in `PartitionInfo` 
> since that is what `Cluster` uses internally. By doing this change, we are 
> able to remove some redundant `Cluster` building logic. 
> 2. We want to make the metadata handling in `MockClient` simpler and more 
> consistent. Currently we have mix of metadata update mechanisms which are 
> internally inconsistent with each other and also do not match the 
> implementation in `NetworkClient`.





[jira] [Created] (KAFKA-7573) Add an interface that allows broker to intercept every request/response pair

2018-10-30 Thread Lincong Li (JIRA)
Lincong Li created KAFKA-7573:
-

 Summary: Add an interface that allows broker to intercept every 
request/response pair
 Key: KAFKA-7573
 URL: https://issues.apache.org/jira/browse/KAFKA-7573
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.0.0
Reporter: Lincong Li
Assignee: Lincong Li


This interface is called "observer" and it opens up several opportunities. One 
major opportunity is that it enables an auditing system to be built for a Kafka 
deployment. Details are discussed in a KIP.





[jira] [Created] (KAFKA-7572) Producer should not send requests with negative partition id

2018-10-30 Thread Yaodong Yang (JIRA)
Yaodong Yang created KAFKA-7572:
---

 Summary: Producer should not send requests with negative partition 
id
 Key: KAFKA-7572
 URL: https://issues.apache.org/jira/browse/KAFKA-7572
 Project: Kafka
  Issue Type: Bug
  Components: clients
Affects Versions: 1.0.1
Reporter: Yaodong Yang


h3. Issue:

In one Kafka producer log from our users, we found the following odd error:

{code}
timestamp="2018-10-09T17:37:41,237-0700", level="ERROR", Message="Write to Kafka failed with: ",
exception="java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException:
Expiring 1 record(s) for topicName--2: 30042 ms has passed since batch creation plus linger time
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.valueOrError(FutureRecordMetadata.java:94)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:64)
    at org.apache.kafka.clients.producer.internals.FutureRecordMetadata.get(FutureRecordMetadata.java:29)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for topicName--2:
30042 ms has passed since batch creation plus linger time"
{code}

After a few hours of debugging, we finally understood the root cause of this issue:
 # The producer used a buggy custom Partitioner, which sometimes generated 
negative partition ids for new records.
 # The corresponding produce requests were rejected by brokers, because it's 
illegal to have a partition with a negative id.
 # The client kept refreshing its local cluster metadata, but could not send 
produce requests successfully.
 # In the above log, we found a suspicious string, "topicName--2".
 # According to the source code, the format of this string in the log is 
TopicName+"-"+PartitionId.
 # It's not easy to notice that there were two consecutive dashes in the above log.
 # Eventually, we found that the second dash was a negative sign. Therefore, 
the partition id is -2, rather than 2.
 # The bug was in the custom Partitioner.

h3. Proposal:
 # Producer code should check the partition id before sending requests to 
brokers (a sketch of such a guard follows below).
 # If there is a negative partition id, just throw an IllegalStateException.
 # Such a quick check can save lots of time for people debugging their producer 
code.
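Until the producer itself performs such a validation, a minimal sketch of the kind of 
guard proposed above, wrapped around a custom Partitioner (the class name and the 
injected delegate are hypothetical; the point is simply to fail fast instead of letting 
a negative partition id reach the broker):

{code}
import java.util.Map;

import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical wrapper: validates whatever partition id the delegate computes.
public class ValidatingPartitioner implements Partitioner {

    private final Partitioner delegate;

    // Note: to be loaded via the partitioner.class config this would need a
    // no-arg constructor; the injected delegate just keeps the sketch self-contained.
    public ValidatingPartitioner(Partitioner delegate) {
        this.delegate = delegate;
    }

    @Override
    public void configure(Map<String, ?> configs) {
        delegate.configure(configs);
    }

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int partition = delegate.partition(topic, key, keyBytes, value, valueBytes, cluster);
        if (partition < 0) {
            // Fail fast with a clear message instead of a confusing "topicName--2" timeout later.
            throw new IllegalStateException("Custom partitioner returned negative partition "
                    + partition + " for topic " + topic);
        }
        return partition;
    }

    @Override
    public void close() {
        delegate.close();
    }
}
{code}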





Re: [VOTE] KIP-380: Detect outdated control requests and bounced brokers using broker generation

2018-10-30 Thread Harsha Chintalapani
Thanks for the KIP. +1 (binding)

-Harsha
On Oct 29, 2018, 5:08 PM -0700, Jun Rao , wrote:
> Hi, Patrick,
>
> Thanks for the updated KIP. +1
>
> Jun
>
> On Wed, Oct 24, 2018 at 4:52 PM, Patrick Huang  wrote:
>
> > Hi Jun,
> >
> > Sure. I already updated the KIP. Thanks!
> >
> > Best,
> > Zhanxiang (Patrick) Huang
> >
> > --
> > *From:* Jun Rao 
> > *Sent:* Wednesday, October 24, 2018 14:17
> > *To:* dev
> > *Subject:* Re: [VOTE] KIP-380: Detect outdated control requests and
> > bounced brokers using broker generation
> >
> > Hi, Patrick,
> >
> > Could you update the KIP with the changes to ControlledShutdownRequest
> > based on the discussion thread?
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Sun, Oct 21, 2018 at 2:25 PM, Mickael Maison 
> > wrote:
> >
> > > +1( non-binding)
> > > Thanks for the KIP!
> > >
> > > On Sun, Oct 21, 2018, 03:31 Harsha Chintalapani  wrote:
> > >
> > > > +1(binding). LGTM.
> > > > -Harsha
> > > > On Oct 20, 2018, 4:49 PM -0700, Dong Lin , wrote:
> > > > > Thanks much for the KIP Patrick. Looks pretty good.
> > > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Fri, Oct 19, 2018 at 10:17 AM Patrick Huang 
> > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to call for a vote on KIP-380:
> > > > > >
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-380%3A+Detect+outdated+control+requests+and+bounced+brokers+using+broker+generation
> > > > > >
> > > > > > Here is the discussion thread:
> > > > > > https://lists.apache.org/thread.html/2497114df64993342eaf9c78c0f14bf8c1795bc3305f13b03dd39afd@%3Cdev.kafka.apache.org%3E
> > > > > >
> > > > > > Note: Normalizing the schema is a good-to-have optimization because
> > > > > > the memory footprint for the control requests hinders the controller
> > > > > > from scaling up if we have many topics with large partition counts.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > > Zhanxiang (Patrick) Huang
> > > > > >
> > > >
> > >
> >


Re: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round robin fashion

2018-10-30 Thread Matthias J. Sax
I added you to the list of contributors. You can now assign tickets
to yourself.

Before you start working on this, we need to understand what the actual
issue is in detail. Note that sending fetch requests to partitions and
what poll() returns are two different things by design.

You might also want to read
https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability

I am not sure if KAFKA-3932 asks to change the fetch request logic
(which was already done) or what poll() returns from the buffer, or maybe
both. It would be good to ask for clarification on the ticket to see
what the reporter actually means, whether the current behavior meets
their requirements, and if not, why.

Overall, this change seems to require a KIP anyway. Hope this helps.


-Matthias


On 10/30/18 9:38 AM, ChienHsing Wu wrote:
> I just looked at the release schedule. I guess the 2.2 is around Feb/2019, 
> right?  --CH
> 
> -Original Message-
> From: ChienHsing Wu  
> Sent: Tuesday, October 30, 2018 10:56 AM
> To: dev@kafka.apache.org
> Subject: RE: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a 
> round robin fashion
> 
> Hi Matthias,
> 
> Sorry about the late reply.
> 
> I have a Jira account. It's chienhsw. I am using the latest version 2.0. 
> Would it be possible to add that to 2.0 as a minor release?
> 
> Thanks, ChienHsing
> 
> -Original Message-
> From: Matthias J. Sax 
> Sent: Wednesday, October 24, 2018 6:41 PM
> To: dev@kafka.apache.org
> Subject: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round 
> robin fashion
> 
> CH,
> 
> Thanks for contributing to Kafka. Do you have a Jira account already? If yes, 
> what is your account id? If not, you need to create one first and share your 
> id so we can grant permission to self-assign tickets.
> 
> I was just looking into the ticket itself, and it's marked as 0.10.0.0.
> You say you encountered this issue. Do you use a 0.10.0.x version? AFAIK, the 
> consumer was updated in later versions, and the behavior should be different. 
> Before you start working on the ticket, we should verify that it is not 
> already fixed. For this case, we would just resolve the ticket with 
> corresponding fixed version.
> 
> Note, that the behavior (at least from my point of view) is not a bug, but 
> addressing it would be an improvement. Thus, if you work on it, the patch 
> would be released with 2.2.0 version, but _not_ with a potential
> 0.10.0.2 release.
> 
> Does this make sense?
> 
> 
> -Matthias
> 
> On 10/24/18 6:27 AM, ChienHsing Wu wrote:
>> I don't see any comments/concerns. I would like to implement and commit to 
>> this ticket. Could anyone let me know how to request for the permission to 
>> assign that ticket to me?
>>
>> Thanks, CH
>>
>> From: ChienHsing Wu
>> Sent: Monday, October 22, 2018 1:40 PM
>> To: 'dev@kafka.apache.org' 
>> Subject: KAFKA-3932 - Consumer fails to consume in a round robin 
>> fashion
>>
>>
>> Hi,
>>
>>
>>
>> I encountered the issue documented in the jira 
>> KAFKA-3932.
>>  Upon studying the source code and the 
>> PIP,
>>  I think the issue is the statement in PIP: "As before, we'd keep track of 
>> which partition we left off at so that the next iteration would begin 
>> there." I think it should NOT use the last partition in the next iteration; 
>> it should pick the next one instead.
>>
>> If this behavior is agreeable, the simplest solution to impose the order to 
>> pick the next one is to use the order the consumer.internals.Fetcher 
>> receives the partition messages, as determined by completedFetches queue in 
>> that class. To avoid parsing the partition messages repeatedly, we can save 
>> those parsed fetches to a list and maintain the next partition to get 
>> messages there.
>>
>> Does it sound like a good approach? If this is not the right place to 
>> discuss the design please let me know where to engage. If this is agreeable 
>> I can contribute the implementation.
>>
>>
>>
>> Thanks, CH
>>
>>
> 



signature.asc
Description: OpenPGP digital signature
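As a purely illustrative aside (this is not the actual consumer.internals.Fetcher code), 
the round-robin draining proposed in the quoted message above could look roughly like 
the sketch below: buffered records are kept per partition in arrival order, and each 
poll serves one record per partition at a time, continuing from where the previous poll 
left off.

{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.common.TopicPartition;

// Simplified illustration of round-robin draining across buffered partitions.
public class RoundRobinBuffer<V> {

    // Insertion order mirrors the order completed fetches arrived.
    private final Map<TopicPartition, Deque<V>> buffered = new LinkedHashMap<>();
    private int nextIndex = 0;

    public void add(TopicPartition tp, List<V> records) {
        buffered.computeIfAbsent(tp, k -> new ArrayDeque<>()).addAll(records);
    }

    // Drain up to maxRecords, rotating across partitions instead of exhausting one first.
    public List<V> poll(int maxRecords) {
        List<V> out = new ArrayList<>();
        List<TopicPartition> partitions = new ArrayList<>(buffered.keySet());
        if (partitions.isEmpty()) {
            return out;
        }
        int emptyInARow = 0;
        while (out.size() < maxRecords && emptyInARow < partitions.size()) {
            TopicPartition tp = partitions.get(nextIndex % partitions.size());
            nextIndex = (nextIndex + 1) % partitions.size();
            Deque<V> queue = buffered.get(tp);
            if (queue == null || queue.isEmpty()) {
                emptyInARow++;              // nothing left on this partition
                continue;
            }
            out.add(queue.pollFirst());     // take one record, then move to the next partition
            emptyInARow = 0;
        }
        return out;
    }
}
{code}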


[jira] [Created] (KAFKA-7571) Add system tests for downgrading Kafka

2018-10-30 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7571:
--

 Summary: Add system tests for downgrading Kafka
 Key: KAFKA-7571
 URL: https://issues.apache.org/jira/browse/KAFKA-7571
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Jason Gustafson


We have system tests which verify client behavior when upgrading Kafka. We 
should add similar tests for supported downgrades.





[jira] [Created] (KAFKA-7570) Make internal offsets/transaction schemas forward compatible

2018-10-30 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-7570:
--

 Summary: Make internal offsets/transaction schemas forward 
compatible
 Key: KAFKA-7570
 URL: https://issues.apache.org/jira/browse/KAFKA-7570
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson


Currently, changes to the data stored in the internal topics (__consumer_offsets 
and __transaction_state) are not forward compatible. This means that once users 
have upgraded to a Kafka version which includes a bumped schema, it is no 
longer possible to downgrade. The changes to these schemas tend to be 
incremental, so we should consider options that at least allow new fields to be 
added without breaking downgrades.
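As a generic illustration of one such option (this is not Kafka's internal schema code, 
just a sketch of the idea), a newer writer can append its extra fields in a 
length-prefixed block that an older reader skips instead of failing on:

{code}
import java.nio.ByteBuffer;

// Sketch of a forward-compatible record layout: known fields first, newer fields
// in a sized tail that older readers can skip without understanding them.
public class ForwardCompatibleRecord {

    // Newer writer: known fields, then the size of the tail, then the new field.
    public static ByteBuffer write(long offset, long timestamp, int newField) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 + 4 + 4);
        buf.putLong(offset);
        buf.putLong(timestamp);
        buf.putInt(4);          // size in bytes of the fields this version added
        buf.putInt(newField);   // field unknown to older readers
        buf.flip();
        return buf;
    }

    // Older reader: consumes the fields it knows and skips the rest.
    public static long[] readOldVersion(ByteBuffer buf) {
        long offset = buf.getLong();
        long timestamp = buf.getLong();
        int tailSize = buf.getInt();
        buf.position(buf.position() + tailSize); // skip fields added by newer versions
        return new long[] {offset, timestamp};
    }
}
{code}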





RE: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round robin fashion

2018-10-30 Thread ChienHsing Wu
I just looked at the release schedule. I guess the 2.2 is around Feb/2019, 
right?  --CH

-Original Message-
From: ChienHsing Wu  
Sent: Tuesday, October 30, 2018 10:56 AM
To: dev@kafka.apache.org
Subject: RE: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round 
robin fashion

Hi Matthias,

Sorry about the late reply.

I have a Jira account. It's chienhsw. I am using the latest version 2.0. Would 
it be possible to add that to 2.0 as a minor release?

Thanks, ChienHsing

-Original Message-
From: Matthias J. Sax 
Sent: Wednesday, October 24, 2018 6:41 PM
To: dev@kafka.apache.org
Subject: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round 
robin fashion

CH,

Thanks for contributing to Kafka. Do you have a Jira account already? If yes, 
what is your account id? If not, you need to create one first and share your id 
so we can grant permission to self-assign tickets.

I was just looking into the ticket itself, and it's marked as 0.10.0.0.
You say you encountered this issue. Do you use a 0.10.0.x version? AFAIK, the 
consumer was updated in later versions, and the behavior should be different. 
Before you start working on the ticket, we should verify that it is not already 
fixed. For this case, we would just resolve the ticket with corresponding fixed 
version.

Note, that the behavior (at least from my point of view) is not a bug, but 
addressing it would be an improvement. Thus, if you work on it, the patch would 
be released with 2.2.0 version, but _not_ with a potential
0.10.0.2 release.

Does this make sense?


-Matthias

On 10/24/18 6:27 AM, ChienHsing Wu wrote:
> I don't see any comments/concerns. I would like to implement and commit to 
> this ticket. Could anyone let me know how to request for the permission to 
> assign that ticket to me?
> 
> Thanks, CH
> 
> From: ChienHsing Wu
> Sent: Monday, October 22, 2018 1:40 PM
> To: 'dev@kafka.apache.org' 
> Subject: KAFKA-3932 - Consumer fails to consume in a round robin 
> fashion
> 
> 
> Hi,
> 
> 
> 
> I encountered the issue documented in the jira 
> KAFKA-3932.
>  Upon studying the source code and the 
> PIP,
>  I think the issue is the statement in PIP: "As before, we'd keep track of 
> which partition we left off at so that the next iteration would begin there." 
> I think it should NOT use the last partition in the next iteration; it should 
> pick the next one instead.
> 
> If this behavior is agreeable, the simplest solution to impose the order to 
> pick the next one is to use the order the consumer.internals.Fetcher receives 
> the partition messages, as determined by completedFetches queue in that 
> class. To avoid parsing the partition messages repeatedly, we can save those 
> parsed fetches to a list and maintain the next partition to get messages 
> there.
> 
> Does it sound like a good approach? If this is not the right place to discuss 
> the design please let me know where to engage. If this is agreeable I can 
> contribute the implementation.
> 
> 
> 
> Thanks, CH
> 
> 



RE: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round robin fashion

2018-10-30 Thread ChienHsing Wu
Hi Matthias,

Sorry about the late reply.

I have a Jira account. It's chienhsw. I am using the latest version 2.0. Would 
it be possible to add that to 2.0 as a minor release?

Thanks, ChienHsing

-Original Message-
From: Matthias J. Sax  
Sent: Wednesday, October 24, 2018 6:41 PM
To: dev@kafka.apache.org
Subject: [EXTERNAL] - Re: KAFKA-3932 - Consumer fails to consume in a round 
robin fashion

CH,

Thanks for contributing to Kafka. Do you have a Jira account already? If yes, 
what is your account id? If not, you need to create one first and share your id 
so we can grant permission to self-assign tickets.

I was just looking into the ticket itself, and it's marked as 0.10.0.0.
You say you encountered this issue. Do you use a 0.10.0.x version? AFAIK, the 
consumer was updated in later versions, and the behavior should be different. 
Before you start working on the ticket, we should verify that it is not already 
fixed. For this case, we would just resolve the ticket with corresponding fixed 
version.

Note, that the behavior (at least from my point of view) is not a bug, but 
addressing it would be an improvement. Thus, if you work on it, the patch would 
be released with 2.2.0 version, but _not_ with a potential
0.10.0.2 release.

Does this make sense?


-Matthias

On 10/24/18 6:27 AM, ChienHsing Wu wrote:
> I don't see any comments/concerns. I would like to implement and commit to 
> this ticket. Could anyone let me know how to request for the permission to 
> assign that ticket to me?
> 
> Thanks, CH
> 
> From: ChienHsing Wu
> Sent: Monday, October 22, 2018 1:40 PM
> To: 'dev@kafka.apache.org' 
> Subject: KAFKA-3932 - Consumer fails to consume in a round robin 
> fashion
> 
> 
> Hi,
> 
> 
> 
> I encountered the issue documented in the jira 
> KAFKA-3932.
>  Upon studying the source code and the 
> PIP,
>  I think the issue is the statement in PIP: "As before, we'd keep track of 
> which partition we left off at so that the next iteration would begin there." 
> I think it should NOT use the last partition in the next iteration; it should 
> pick the next one instead.
> 
> If this behavior is agreeable, the simplest solution to impose the order to 
> pick the next one is to use the order the consumer.internals.Fetcher receives 
> the partition messages, as determined by completedFetches queue in that 
> class. To avoid parsing the partition messages repeatedly, we can save those 
> parsed fetches to a list and maintain the next partition to get messages 
> there.
> 
> Does it sound like a good approach? If this is not the right place to discuss 
> the design please let me know where to engage. If this is agreeable I can 
> contribute the implementation.
> 
> 
> 
> Thanks, CH
> 
> 



[VOTE] - KIP-213 Support non-key joining in KTable

2018-10-30 Thread Adam Bellemare
Hi All

I would like to call a vote on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable.
This allows a Kafka Streams DSL user to perform KTable to KTable
foreign-key joins on their data. I have been using this in production for
some time and I have composed a PR that enables this. It is a fairly
extensive PR, but I believe it will add considerable value to the Kafka
Streams DSL.

The PR can be found here:
https://github.com/apache/kafka/pull/5527

See http://mail-archives.apache.org/mod_mbox/kafka-dev/201810.mbox/browser
for previous discussion thread.

I would also like to give a shout-out to Jan Filipiak, who helped me out
greatly in this project and who led the initial work on this problem.
Without Jan's help and insight I do not think it would have been possible
to get to this point.

Adam
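For readers unfamiliar with the proposal, roughly the kind of usage it enables is 
sketched below. The join method shape shown is an assumption based on the KIP 
discussion and may differ from the final API; serdes, state store configuration, and 
KafkaStreams startup are omitted.

{code}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;

public class ForeignKeyJoinExample {

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Orders keyed by orderId, each carrying a productId foreign key.
        KTable<String, Order> orders = builder.table("orders");
        // Products keyed by productId.
        KTable<String, Product> products = builder.table("products");

        // Foreign-key join: for every order, look up the product it references.
        KTable<String, EnrichedOrder> enriched =
                orders.join(products,
                            order -> order.productId,                      // extract the foreign key
                            (order, product) -> new EnrichedOrder(order, product));

        enriched.toStream().to("enriched-orders");
        // Topology definition only; building and starting KafkaStreams is omitted.
    }

    static class Order { String productId; }
    static class Product { String name; }
    static class EnrichedOrder {
        EnrichedOrder(Order order, Product product) { /* combine fields as needed */ }
    }
}
{code}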


Re: [VOTE] 2.1.0 RC0

2018-10-30 Thread Eno Thereska
2 tests failed for me and 4 were skipped when doing ./gradlew test:

Failed tests:
DeleteTopicTest.testAddPartitionDuringDeleteTopic
SaslOAuthBearerSslEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaSubscribe

Ignored tests:
ConsumerBounceTest.testConsumptionWithBrokerFailures
UncleanLeaderElectionTest.testCleanLeaderElectionDisabledByTopicOverride
UncleanLeaderElectionTest.testUncleanLeaderElectionDisabled
DynamicBrokerReconfigurationTest.testAddRemoveSslListener

Thanks
Eno

On Mon, Oct 29, 2018 at 8:49 AM Magnus Edenhill  wrote:

> +1 (non-binding)
>
> passes librdkafka integration test suite
>
> On Fri, Oct 26, 2018 at 15:58, Manikumar  wrote:
>
> > minor observation: config sections are empty in the documentation page.
> > http://kafka.apache.org/21/documentation.html#producerconfigs
> >
> > On Wed, Oct 24, 2018 at 10:49 PM Ted Yu  wrote:
> >
> > > +1
> > >
> > > InternalTopicIntegrationTest failed during test suite run but passed
> with
> > > rerun.
> > >
> > > On Wed, Oct 24, 2018 at 3:48 AM Andras Beni  > > .invalid>
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Verified signatures and checksums of release artifacts
> > > > Performed quickstart steps on rc artifacts (both scala 2.11 and 2.12)
> > and
> > > > one built from tag 2.1.0-rc0
> > > >
> > > > Andras
> > > >
> > > > On Wed, Oct 24, 2018 at 10:17 AM Dong Lin 
> wrote:
> > > >
> > > > > Hello Kafka users, developers and client-developers,
> > > > >
> > > > > This is the first candidate for feature release of Apache Kafka
> > 2.1.0.
> > > > >
> > > > > This is a major version release of Apache Kafka. It includes 28 new KIPs
> > > > > and critical bug fixes. Please see the Kafka 2.1.0 release plan for more
> > > > > details:
> > > > >
> > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=91554044
> > > > >
> > > > > Here are a few notable highlights:
> > > > >
> > > > > - Java 11 support
> > > > > - Support for Zstandard, which achieves compression comparable to
> > gzip
> > > > with
> > > > > higher compression and especially decompression speeds(KIP-110)
> > > > > - Avoid expiring committed offsets for active consumer group
> > (KIP-211)
> > > > > - Provide Intuitive User Timeouts in The Producer (KIP-91)
> > > > > - Kafka's replication protocol now supports improved fencing of
> > > zombies.
> > > > > Previously, under certain rare conditions, if a broker became
> > > partitioned
> > > > > from Zookeeper but not the rest of the cluster, then the logs of
> > > > replicated
> > > > > partitions could diverge and cause data loss in the worst case
> > > (KIP-320)
> > > > > - Streams API improvements (KIP-319, KIP-321, KIP-330, KIP-353,
> > > KIP-356)
> > > > > - Admin script and admin client API improvements to simplify admin
> > > > > operation (KIP-231, KIP-308, KIP-322, KIP-324, KIP-338, KIP-340)
> > > > > - DNS handling improvements (KIP-235, KIP-302)
> > > > >
> > > > > Release notes for the 2.1.0 release:
> > > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/RELEASE_NOTES.html
> > > > >
> > > > > *** Please download, test and vote ***
> > > > >
> > > > > * Kafka's KEYS file containing PGP keys we use to sign the release:
> > > > > http://kafka.apache.org/KEYS
> > > > >
> > > > > * Release artifacts to be voted upon (source and binary):
> > > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/
> > > > >
> > > > > * Maven artifacts to be voted upon:
> > > > > https://repository.apache.org/content/groups/staging/
> > > > >
> > > > > * Javadoc:
> > > > > http://home.apache.org/~lindong/kafka-2.1.0-rc0/javadoc/
> > > > >
> > > > > * Tag to be voted upon (off 2.1 branch) is the 2.1.0-rc0 tag:
> > > > > https://github.com/apache/kafka/tree/2.1.0-rc0
> > > > >
> > > > > * Documentation:
> > > > > http://kafka.apache.org/21/documentation.html
> > > > > 
> > > > >
> > > > > * Protocol:
> > > > > http://kafka.apache.org/21/protocol.html
> > > > >
> > > > > * Successful Jenkins builds for the 2.1 branch:
> > > > > Unit/integration tests: https://builds.apache.org/job/kafka-2.1-jdk8/38/
> > > > >
> > > > > Please test and verify the release artifacts and submit a vote for
> > this
> > > > RC,
> > > > > or report any issues so we can fix them and get a new RC out ASAP.
> > > > Although
> > > > > this release vote requires PMC votes to pass, testing, votes, and
> bug
> > > > > reports are valuable and appreciated from everyone.
> > > > >
> > > > > Cheers,
> > > > > Dong
> > > > >
> > > >
> > >
> >
>


[jira] [Created] (KAFKA-7569) Kafka doesnt appear to cleanup dangling partitions

2018-10-30 Thread Dao Quang Minh (JIRA)
Dao Quang Minh created KAFKA-7569:
-

 Summary: Kafka doesnt appear to cleanup dangling partitions
 Key: KAFKA-7569
 URL: https://issues.apache.org/jira/browse/KAFKA-7569
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Dao Quang Minh


In our current cluster running Kafka 1.0.0, we recently observed that Kafka 
does not clean up dangling partitions (i.e., partition data is on disk, but the 
partition is not assigned to the current broker anymore).

As an example of the dangling partition data, we have:

{code}
total 26G
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:19 352433304663.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:16 352414164340.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:24 352466972892.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:23 352457368236.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:17 352423709566.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:21 352447702369.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:20 352442921890.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:22 352452551548.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:17 352418945305.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:18 352428477361.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:15 352409416538.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:24 352462192103.log
-rw-r--r-- 1 kafka kafka 1.9G Aug 6 16:20 352438136012.log
-rw-r--r-- 1 kafka kafka 1.8G Aug 6 17:43 352471757311.log
-rw-r--r-- 1 kafka kafka 10M Oct 16 21:44 352471757311.index
-rw-r--r-- 1 kafka kafka 10M Oct 16 21:44 352471757311.timeindex
drwxr-xr-x 2 kafka kafka 4.0K Oct 8 15:27 .
drwxr-xr-x 49 kafka kafka 4.0K Oct 30 11:21 ..
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352414164340.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352423709566.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352433304663.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352447702369.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352457368236.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352466972892.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352409416538.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352418945305.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352428477361.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352438136012.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352442921890.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352452551548.timeindex
-rw-r--r-- 1 kafka kafka 2.3K Oct 16 21:44 352462192103.timeindex
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352414164340.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352423709566.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352433304663.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352447702369.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352457368236.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352466972892.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352409416538.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352418945305.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352428477361.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352438136012.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352442921890.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352452551548.index
-rw-r--r-- 1 kafka kafka 1.5K Oct 16 21:44 352462192103.index
-rw-r--r-- 1 kafka kafka 20 Aug 6 16:23 leader-epoch-checkpoint
-rw-r--r-- 1 kafka kafka 10 Aug 6 16:24 352466972892.snapshot
-rw-r--r-- 1 kafka kafka 10 Aug 6 16:24 352471757311.snapshot
-rw-r--r-- 1 kafka kafka 10 Oct 8 15:27 352476186724.snapshot
{code}

I'm unsure how we ended up in this situation, as partition data should be marked 
as removed and eventually deleted when it's not assigned to the broker anymore. 
But in this edge case, should Kafka detect that automatically when it loads the 
partition and re-mark it as to-be-deleted again?





Re: [VOTE] 2.0.1 RC0

2018-10-30 Thread Eno Thereska
Ok, +1 (non-binding)

Eno

On Mon, Oct 29, 2018 at 5:09 PM Manikumar  wrote:

> Hi Eno,
>
> This looks like an existing issue occurring only on source artifacts. We
> are able to generate aggregate docs on a cloned repo.
> I am getting a similar error on the previous release and 2.1.0 RC0 src artifacts,
> maybe related to gradle task ordering.
> I will look into it and try to fix it on trunk.
>
> Similar issue reported here:
> https://jira.apache.org/jira/browse/KAFKA-6500
>
> Thanks,
>
>
> On Mon, Oct 29, 2018 at 5:28 PM Eno Thereska 
> wrote:
>
> > Thanks. Tested basic building and running of unit and integration tests.
> > They work.
> > Tested docs. The following fails. Is it a known issue?
> >
> > "
> > ./gradlew aggregatedJavadoc
> > with info:
> > > Configure project :
> > Building project 'core' with Scala version 2.11.12
> > Building project 'streams-scala' with Scala version 2.11.12
> >
> > > Task :aggregatedJavadoc FAILED
> >
> > FAILURE: Build failed with an exception.
> >
> > * What went wrong:
> > A problem was found with the configuration of task ':aggregatedJavadoc'.
> > > No value has been specified for property 'outputDirectory'.
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full
> > insights.
> >
> > * Get more help at https://help.gradle.org
> >
> > BUILD FAILED in 3s
> > "
> > Eno
> >
> > On Fri, Oct 26, 2018 at 3:29 AM Manikumar 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 2.0.1.
> > >
> > > This is a bug fix release closing 49 tickets:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> > >
> > > Release notes for the 2.0.1 release:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by  Tuesday, October 30, end of day
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> > >
> > > * Documentation:
> > > http://kafka.apache.org/20/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/20/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.0 branch:
> > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.0-jdk8/177/
> > >
> > > /**
> > >
> > > Thanks,
> > > Manikumar
> > >
> >
>


Re: [VOTE] 2.0.1 RC0

2018-10-30 Thread Magnus Edenhill
+1 (non-binding)
Passes librdkafka integration test suite

On Mon, Oct 29, 2018 at 18:08, Manikumar  wrote:

> Hi Eno,
>
> This looks like an existing issue occurring only on source artifacts. We
> are able to generate aggregate docs on a cloned repo.
> I am getting a similar error on the previous release and 2.1.0 RC0 src artifacts,
> maybe related to gradle task ordering.
> I will look into it and try to fix it on trunk.
>
> Similar issue reported here:
> https://jira.apache.org/jira/browse/KAFKA-6500
>
> Thanks,
>
>
> On Mon, Oct 29, 2018 at 5:28 PM Eno Thereska 
> wrote:
>
> > Thanks. Tested basic building and running of unit and integration tests.
> > They work.
> > Tested docs. The following fails. Is it a known issue?
> >
> > "
> > ./gradlew aggregatedJavadoc
> > with info:
> > > Configure project :
> > Building project 'core' with Scala version 2.11.12
> > Building project 'streams-scala' with Scala version 2.11.12
> >
> > > Task :aggregatedJavadoc FAILED
> >
> > FAILURE: Build failed with an exception.
> >
> > * What went wrong:
> > A problem was found with the configuration of task ':aggregatedJavadoc'.
> > > No value has been specified for property 'outputDirectory'.
> >
> > * Try:
> > Run with --stacktrace option to get the stack trace. Run with --info or
> > --debug option to get more log output. Run with --scan to get full
> > insights.
> >
> > * Get more help at https://help.gradle.org
> >
> > BUILD FAILED in 3s
> > "
> > Eno
> >
> > On Fri, Oct 26, 2018 at 3:29 AM Manikumar 
> > wrote:
> >
> > > Hello Kafka users, developers and client-developers,
> > >
> > > This is the first candidate for release of Apache Kafka 2.0.1.
> > >
> > > This is a bug fix release closing 49 tickets:
> > > https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+2.0.1
> > >
> > > Release notes for the 2.0.1 release:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/RELEASE_NOTES.html
> > >
> > > *** Please download, test and vote by  Tuesday, October 30, end of day
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > http://kafka.apache.org/KEYS
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/
> > >
> > > * Maven artifacts to be voted upon:
> > > https://repository.apache.org/content/groups/staging/
> > >
> > > * Javadoc:
> > > http://home.apache.org/~manikumar/kafka-2.0.1-rc0/javadoc/
> > >
> > > * Tag to be voted upon (off 2.0 branch) is the 2.0.1 tag:
> > > https://github.com/apache/kafka/releases/tag/2.0.1-rc0
> > >
> > > * Documentation:
> > > http://kafka.apache.org/20/documentation.html
> > >
> > > * Protocol:
> > > http://kafka.apache.org/20/protocol.html
> > >
> > > * Successful Jenkins builds for the 2.0 branch:
> > > Unit/integration tests:
> > https://builds.apache.org/job/kafka-2.0-jdk8/177/
> > >
> > > /**
> > >
> > > Thanks,
> > > Manikumar
> > >
> >
>


Re: [VOTE] KIP-377: TopicCommand to use AdminClient

2018-10-30 Thread Viktor Somogyi-Vass
Hi All,

I'm closing the vote as it has been a few days without any more
feedback and the KIP collected 3 binding votes (Gwen, Harsha and
Colin) and 2 non-binding (Mickael and Kevin) - therefore it is
accepted.
I'll update its status and work on the implementation (I have a WIP PR
that still needs some changes). As soon as it's ready
I'll ping this thread in case anyone is interested in the review.

Thank you all for participating in the discussion and voting.

Viktor
On Thu, Oct 25, 2018 at 10:53 PM Gwen Shapira  wrote:
>
> +1 (binding)
> Thanks for working on this.
>
> On Wed, Oct 24, 2018, 7:28 AM Viktor Somogyi-Vass 
> wrote:
>
> > Hi All,
> >
> > I'd like to start a vote on KIP-377:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-377%3A+TopicCommand+to+use+AdminClient
> > .
> >
> > Summary:
> > The KIP basically proposes to add --bootstrap-server and
> > --command-config option to TopicsCommand and implement topic
> > administration with AdminClient in a backwards compatible way (so
> > wouldn't drop or change the --zookeeper option usage).
> >
> > I'd appreciate any votes or feedback.
> >
> > Viktor
> >