Build failed in Jenkins: kafka-trunk-jdk8 #2845

2018-07-27 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-7192 Follow-up: update checkpoint to the reset beginning offset

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-2 (ubuntu trusty) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision c8c3a7dc48dc1ec6220798294da123bb617a6cd7 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c8c3a7dc48dc1ec6220798294da123bb617a6cd7
Commit message: "KAFKA-7192 Follow-up: update checkpoint to the reset beginning 
offset (#5430)"
 > git rev-list --no-walk a61594dee143ba089300546d1994b875a16ba521 # timeout=10
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/jenkins8457133795257669593.sh
+ rm -rf 
+ /home/jenkins/tools/gradle/4.8.1/bin/gradle
/tmp/jenkins8457133795257669593.sh: line 4: 
/home/jenkins/tools/gradle/4.8.1/bin/gradle: No such file or directory
Build step 'Execute shell' marked build as failure
[FINDBUGS] Collecting findbugs analysis files...
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
[FINDBUGS] Searching for all files in 
 that match the pattern 
**/build/reports/findbugs/*.xml
[FINDBUGS] No files found. Configuration error?
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
 Using GitBlamer to create author and commit information for all 
warnings.
 GIT_COMMIT=c8c3a7dc48dc1ec6220798294da123bb617a6cd7, 
workspace=
[FINDBUGS] Computing warning deltas based on reference build #2844
Recording test results
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Setting GRADLE_4_8_1_HOME=/home/jenkins/tools/gradle/4.8.1
Not sending mail to unregistered user nore...@github.com


Build failed in Jenkins: kafka-trunk-jdk10 #331

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Re: Discussion: New components in JIRA?

2018-07-27 Thread Guozhang Wang
Hello Ray,

Any PMC member of the project can add more components in the JIRA system.
If there is no objection in the next 72 hours I can just go ahead and add
them.


Guozhang


On Thu, Jul 26, 2018 at 1:50 PM, Ray Chiang  wrote:

> Thanks Guozhang.  I'm good with the way the documentation is now.
>
> Is there any other procedure to follow to get "logging" and "mirrormaker"
> added as components or can we just request a JIRA admin to do that on this
> list?
>
> -Ray
>
>
> On 7/23/18 4:56 PM, Guozhang Wang wrote:
>
>> I've just updated the web docs on http://kafka.apache.org/contributing
>> accordingly.
>>
>> On Mon, Jul 23, 2018 at 3:30 PM, khaireddine Rezgui <
>> khaireddine...@gmail.com> wrote:
>>
>> Good job Ray for the wiki, it's clear enough.
>>>
>>> On Jul 23, 2018, 10:17 PM, "Ray Chiang"  wrote:
>>>
>>> Okay, I've created a wiki page Reporting Issues in Apache Kafka
>>> <
>>> https://cwiki.apache.org/confluence/display/KAFKA/
>>> Reporting+Issues+in+Apache+Kafka>.
>>>
>>> I'd appreciate any feedback.  If this is good enough, I can file a JIRA
>>> to change the link under "Bugs" in the "Project information" page.
>>>
>>>
>>> -Ray
>>>
>>>
>>> On 7/23/18 11:28 AM, Ray Chiang wrote:
>>>
 Good point.  I'll look into adding some JIRA guidelines to the
 documentation/wiki.

 -Ray

 On 7/22/18 10:23 AM, Guozhang Wang wrote:

> Hello Ray,
>
> Thanks for bringing this up. I'm generally +1 on the first two, while
> for
> the last category, personally I felt leaving them as part of `tools` is
> fine, but I'm also open for other opinions.
>
> A more general question, though: today we do not have any guidelines
> asking JIRA reporters to set the right component, i.e. it is purely
> best-effort, and we cannot prevent reporters from adding new component
> names. So far the project also does not really have a tradition of
> managing JIRA reports per-component, as the goal is not to "separate"
> the project into silos but to encourage everyone to get hands-on with
> every aspect of the project.
>
>
> Guozhang
>
>
> On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang 
> wrote:
>
> I've been doing a little bit of component cleanup in JIRA.  What do
>> people
>> think of adding
>> one or more of the following components?
>>
>> - logging: For any consumer/producer/broker logging (i.e. log4j). This
>> should help disambiguate from the "log" component (i.e. Kafka
>> messages).
>>
>> - mirrormaker: There are enough requests specific to MirrorMaker
>> that it
>> could be put into its own component.
>>
>> - scripts: I'm a little more ambivalent about this one, but any of the
>> bin/*.sh script fixes could belong in their own category.  I'm not
>> sure if
>> other people feel strongly for how the "tools" component should be
>> used
>> w.r.t. the run scripts.
>>
>> Any thoughts?
>>
>> -Ray
>>
>>
>>
>>
>>
>


-- 
-- Guozhang


Build failed in Jenkins: kafka-trunk-jdk10 #330

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Build failed in Jenkins: kafka-trunk-jdk10 #329

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-27 Thread Dong Lin
Hey Lucas,

Thanks for the update.

The current KIP proposes new broker configs "listeners.for.controller" and
"advertised.listeners.for.controller". This is going to be a big change,
since listeners are among the most important configs that every user needs
to change. According to the rejected alternatives section, it seems that the
reason for adding these two configs is to improve performance when the data
request queue is full, rather than for correctness. That should be a very rare
scenario, and I am not sure we should add configs for all users just to
improve performance in such a rare scenario.

Also, if the new design is based on issues discovered in the recent
discussion, e.g. out-of-order processing if we don't use a dedicated
thread for controller requests, it may be useful to explain the problem in
the motivation section.

Thanks,
Dong

On Fri, Jul 27, 2018 at 1:28 PM, Lucas Wang  wrote:

> A kind reminder for review of this KIP.
>
> Thank you very much!
> Lucas
>
> On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang 
> wrote:
>
> > Hi All,
> >
> > I've updated the KIP by adding the dedicated endpoints for controller
> > connections,
> > and pinning threads for controller requests.
> > Also I've updated the title of this KIP. Please take a look and let me
> > know your feedback.
> >
> > Thanks a lot for your time!
> > Lucas
> >
> > On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com> wrote:
> >
> >> Hi Lucas,
> >> I agree, if we want to go forward with a separate controller plane and
> >> data
> >> plane and completely isolate them, having a separate port for controller
> >> with a separate Acceptor and a Processor sounds ideal to me.
> >>
> >> Thanks,
> >>
> >> Mayuresh
> >>
> >>
> >> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin 
> wrote:
> >>
> >> > Hi Lucas,
> >> >
> >> > Yes, I agree that a dedicated end to end control flow would be ideal.
> >> >
> >> > Thanks,
> >> >
> >> > Jiangjie (Becket) Qin
> >> >
> >> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang 
> >> wrote:
> >> >
> >> > > Thanks for the comment, Becket.
> >> > > So far, we've been trying to avoid making any request handler thread
> >> > > special.
> >> > > But if we were to follow that path in order to make the two planes
> >> more
> >> > > isolated,
> >> > > what do you think about also having a dedicated processor thread,
> >> > > and dedicated port for the controller?
> >> > >
> >> > > Today one processor thread can handle multiple connections, let's
> say
> >> 100
> >> > > connections
> >> > >
> >> > > represented by connection0, ... connection99, among which
> >> connection0-98
> >> > > are from clients, while connection99 is from
> >> > >
> >> > > the controller. Further let's say after one selector polling, there
> >> are
> >> > > incoming requests on all connections.
> >> > >
> >> > > When the request queue is full, (either the data request being full
> in
> >> > the
> >> > > two queue design, or
> >> > >
> >> > > the one single queue being full in the deque design), the processor
> >> > thread
> >> > > will be blocked first
> >> > >
> >> > > when trying to enqueue the data request from connection0, then
> >> possibly
> >> > > blocked for the data request
> >> > >
> >> > > from connection1, ... etc even though the controller request is
> ready
> >> to
> >> > be
> >> > > enqueued.
> >> > >
> >> > > To solve this problem, it seems we would need to have a separate
> port
> >> > > dedicated to
> >> > >
> >> > > the controller, a dedicated processor thread, a dedicated controller
> >> > > request queue,
> >> > >
> >> > > and pinning of one request handler thread for controller requests.
> >> > >
> >> > > Thanks,
> >> > > Lucas
> >> > >
> >> > >
> >> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin 
> >> > wrote:
> >> > >
> >> > > > Personally I am not fond of the dequeue approach simply because it
> >> is
> >> > > > against the basic idea of isolating the controller plane and data
> >> > plane.
> >> > > > With a single dequeue, theoretically speaking the controller
> >> requests
> >> > can
> >> > > > starve the clients requests. I would prefer the approach with a
> >> > separate
> >> > > > controller request queue and a dedicated controller request
> handler
> >> > > thread.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Jiangjie (Becket) Qin
> >> > > >
> >> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang <
> lucasatu...@gmail.com>
> >> > > wrote:
> >> > > >
> >> > > > > Sure, I can summarize the usage of correlation id. But before I
> do
> >> > > that,
> >> > > > it
> >> > > > > seems
> >> > > > > the same out-of-order processing can also happen to Produce
> >> requests
> >> > > sent
> >> > > > > by producers,
> >> > > > > following the same example you described earlier.
> >> > > > > If that's the case, I think this probably deserves a separate
> doc
> >> and
> >> > > > > design independent of this KIP.
> >> > > > >
> >> > > > > Lucas
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Mon, 
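To make the isolation discussed above concrete, below is a minimal sketch of the two-queue idea: a dedicated controller queue whose requests can never be blocked behind a full data request queue, polled by a pinned handler. All class and method names here are illustrative placeholders, not the KIP's or Kafka's actual implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative only: the request type and queue capacities are made-up placeholders.
class TwoPlaneRequestChannel {
    private final BlockingQueue<Object> controllerQueue = new LinkedBlockingQueue<>(); // unbounded
    private final BlockingQueue<Object> dataQueue = new LinkedBlockingQueue<>(500);    // bounded

    // Controller requests are enqueued without blocking, even when the data queue is full.
    void sendControllerRequest(Object request) {
        controllerQueue.offer(request);
    }

    // Data requests block when the data queue is full, as they do today.
    void sendDataRequest(Object request) throws InterruptedException {
        dataQueue.put(request);
    }

    // A handler pinned to controller requests polls only the controller queue.
    Object receiveControllerRequest() throws InterruptedException {
        return controllerQueue.take();
    }

    // Ordinary request handlers poll only the data queue.
    Object receiveDataRequest(long timeoutMs) throws InterruptedException {
        return dataQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```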

[DISCUSS] add connect-related packages to "What is considered a "major change" that needs a KIP?"

2018-07-27 Thread Chia-Ping Tsai
hi Kafka

There is a section[1] listing the packages which have public interfaces.
However, it doesn't include any connect-related packages. It would be better to
have a list that highlights the public interfaces for Connect. Otherwise,
connector users may misuse internal interfaces in their code bases. For
example, TopicAdmin[2] is a nice wrapper around AdminClient[3], so I have used
TopicAdmin in production code. However, I'm not sure whether TopicAdmin is a
public interface. If it is not, I should remove it from my code base, since the
compatibility of TopicAdmin may be broken in a minor release.

I'm not familiar with the process for updating "Kafka Improvement Proposals",
so I'm starting this thread. All suggestions are welcome!

[1] 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Whatisconsidereda%22majorchange%22thatneedsaKIP?

[2] 
https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/util/TopicAdmin.java

[3] 
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/admin/AdminClient.java

--
Chia-Ping
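As a point of reference, topic management through the public API goes via AdminClient directly rather than the internal TopicAdmin wrapper. A minimal sketch is below; the bootstrap servers, topic name, partition count, and replication factor are placeholders.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Placeholder bootstrap servers; replace with your cluster.
        props.put("bootstrap.servers", "localhost:9092");

        // AdminClient is part of Kafka's public API, unlike Connect's internal TopicAdmin.
        try (AdminClient admin = AdminClient.create(props)) {
            NewTopic topic = new NewTopic("example-topic", 3, (short) 1);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```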


Build failed in Jenkins: kafka-trunk-jdk10 #328

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Jenkins build is back to normal : kafka-trunk-jdk8 #2844

2018-07-27 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-27 Thread Dong Lin
Hey Jason,

Thanks for the update! I agree with the current proposal overall. I have
some minor comments related to naming etc.

1) I don't feel strongly about this and will just leave it here for discussion.
Would it be better to rename "CurrentLeaderEpoch" to "ExpectedLeaderEpoch" for
the new field in the OffsetsForLeaderEpochRequest? The reason is that
"CurrentLeaderEpoch" may not necessarily be the true current leader epoch if
the consumer has stale metadata. "ExpectedLeaderEpoch" conveys that this
epoch is what the consumer expects on the broker, which may or may not be the
true value.

2) Currently we add the field "LeaderEpoch" to FetchRequest and the field
"CurrentLeaderEpoch" to OffsetsForLeaderEpochRequest. Given that both
fields are compared with the leaderEpoch in the broker, would it be better
to give them the same name?

3) Currently LogTruncationException.truncationOffset() returns
Optional<OffsetMetadata> to the user. Should it return
Optional<Map<TopicPartition, OffsetMetadata>> to handle the scenario
where the leaderEpoch of multiple partitions differs from the leaderEpoch
on the broker?

4) Currently LogTruncationException.truncationOffset() returns an Optional
value. Could you explain a bit more when it will return Optional.empty()? I
am trying to understand whether it is simpler and reasonable to
replace Optional.empty()
with OffsetMetadata(offset=last_fetched_offset, leaderEpoch=-1).

5) Do we also need to add a new retriable exception for the error code
FENCED_LEADER_EPOCH? And do we need to define both FENCED_LEADER_EPOCH
and UNKNOWN_LEADER_EPOCH?
It seems that the current KIP uses these two error codes in the same way,
and the exception for these two error codes is not exposed to the user.
Maybe we should combine them into one error, e.g. INVALID_LEADER_EPOCH?

6) For users who have turned off auto offset reset, when consumer.poll()
throws LogTruncationException, it seems that the user will most likely call
seekToCommitted(offset, leaderEpoch), where offset and leaderEpoch are obtained
from LogTruncationException.truncationOffset(). In this case, the offset used
here is not committed, which is inconsistent with the method name
seekToCommitted(...). Would it be better to rename the method to, e.g.,
seekToLastConsumedMessage()?

7) Per point 3 in Jun's comment, would it be useful to explicitly specify
in the KIP that we will log the truncation event if the user has turned on the
auto offset reset policy?


Thanks,
Dong
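To illustrate the control flow behind points 3, 4, and 6 above, here is a self-contained sketch using stand-in types. LogTruncationException, truncationOffset(), and OffsetMetadata mirror the names under discussion in the KIP; they are not a shipped Kafka API, and the classes below are placeholders defined only for this example.

```java
import java.util.Optional;

// Stand-in types mirroring names from the KIP discussion; not real Kafka classes.
class OffsetMetadata {
    final long offset;
    final int leaderEpoch;
    OffsetMetadata(long offset, int leaderEpoch) { this.offset = offset; this.leaderEpoch = leaderEpoch; }
}

class LogTruncationException extends RuntimeException {
    private final Optional<OffsetMetadata> truncationOffset;
    LogTruncationException(Optional<OffsetMetadata> truncationOffset) { this.truncationOffset = truncationOffset; }
    // Per point 4: may be empty when the point of divergence could not be determined.
    Optional<OffsetMetadata> truncationOffset() { return truncationOffset; }
}

public class TruncationHandlingSketch {
    public static void main(String[] args) {
        try {
            // Imagine consumer.poll(...) throwing this when log truncation is detected.
            throw new LogTruncationException(Optional.of(new OffsetMetadata(42L, 7)));
        } catch (LogTruncationException e) {
            // The application decides where to resume: seek to the offset of divergence,
            // which is what the thread refers to as seekToCommitted(offset, leaderEpoch).
            e.truncationOffset().ifPresentOrElse(
                m -> System.out.println("seek to offset " + m.offset + ", epoch " + m.leaderEpoch),
                () -> System.out.println("divergence unknown; fall back to the offset reset policy"));
        }
    }
}
```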


On Fri, Jul 27, 2018 at 12:39 PM, Jason Gustafson 
wrote:

> Thanks Anna, you are right on both points. I updated the KIP.
>
> -Jason
>
> On Thu, Jul 26, 2018 at 2:08 PM, Anna Povzner  wrote:
>
> > Hi Jason,
> >
> > Thanks for the update. I agree with the current proposal.
> >
> > Two minor comments:
> > 1) In “API Changes” section, first paragraph says that “users can catch
> the
> > more specific exception type and use the new `seekToNearest()` API
> defined
> > below.”. Since LogTruncationException “will include the partitions that
> > were truncated and the offset of divergence”., shouldn’t the client use
> > seek(offset) to seek to the offset of divergence in response to the
> > exception?
> > 2) In “Protocol Changes” section, OffsetsForLeaderEpoch subsection says
> > “Note
> > that consumers will send a sentinel value (-1) for the current epoch and
> > the broker will simply disregard that validation.”. Is that still true
> with
> > MetadataResponse containing leader epoch?
> >
> > Thanks,
> > Anna
> >
> > On Wed, Jul 25, 2018 at 1:44 PM Jason Gustafson 
> > wrote:
> >
> > > Hi All,
> > >
> > > I have made some updates to the KIP. As many of you know, a side
> project
> > of
> > > mine has been specifying the Kafka replication protocol in TLA. You can
> > > check out the code here if you are interested:
> > > https://github.com/hachikuji/kafka-specification. In addition to
> > > uncovering
> > > a couple unknown bugs in the replication protocol (e.g.
> > > https://issues.apache.org/jira/browse/KAFKA-7128), this has helped me
> > > validate the behavior in this KIP. In fact, the original version I
> > proposed
> > > had a weakness. I initially suggested letting the leader validate the
> > > expected epoch at the fetch offset. This made sense for the consumer in
> > the
> > > handling of unclean leader election, but it was not strong enough to
> > > protect the follower in all cases. In order to make advancement of the
> > high
> > > watermark safe, for example, the leader actually needs to be sure that
> > > every follower in the ISR matches its own epoch.
> > >
> > > I attempted to fix this problem by treating the epoch in the fetch
> > request
> > > slightly differently for consumers and followers. For consumers, it
> would
> > > be the expected epoch of the record at the fetch offset, and the leader
> > > would raise a LOG_TRUNCATION error if the expectation failed. For
> > > followers, it would be the current epoch and the leader would require
> > that
> > > it match its own epoch. This was unsatisfying both because of the
> > > inconsistency in behavior and 
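A rough sketch of the validation split described above, distinguishing consumer fetches (expected epoch of the record at the fetch offset) from follower fetches (the follower's current epoch must match the leader's own); all names and types here are invented for illustration and are not Kafka's code.

```java
// Illustrative pseudologic only; field and error names are placeholders.
enum Errors { NONE, LOG_TRUNCATION, FENCED_LEADER_EPOCH }

class EpochValidationSketch {
    int leaderEpoch; // the leader's own current epoch

    // Consumer fetch: the request carries the expected epoch of the record at the fetch offset.
    Errors validateConsumerFetch(int expectedEpochAtOffset, int actualEpochAtOffset) {
        return expectedEpochAtOffset == actualEpochAtOffset ? Errors.NONE : Errors.LOG_TRUNCATION;
    }

    // Follower fetch: the follower's current epoch must match the leader's own epoch,
    // otherwise advancing the high watermark based on this follower would be unsafe.
    Errors validateFollowerFetch(int followerCurrentEpoch) {
        return followerCurrentEpoch == leaderEpoch ? Errors.NONE : Errors.FENCED_LEADER_EPOCH;
    }
}
```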

Build failed in Jenkins: kafka-trunk-jdk10 #327

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Build failed in Jenkins: kafka-trunk-jdk10 #326

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (15/15), done.
Receiving objects:   0% (1/6116) ... 53% [remainder of fetch progress output truncated]

Build failed in Jenkins: kafka-trunk-jdk10 #325

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H23 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
remote: Compressing objects: 100% (16/16), done.
Receiving objects:   0% (1/6116) ... 51% [remainder of fetch progress output truncated]

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread Jason Gustafson
Hey Boyang,

Thanks for the KIP. I think my main question is in the same vein as James'.
The problem is that the coordinator needs to be able to identify which
instance of a particular memberId is the active one. For EOS, each
transactionalId gets an epoch. When a new producer is started, it bumps the
epoch which allows the transaction coordinator to fence off any zombie
instances which may try to continue doing work with the old epoch. It seems
like we need a similar protection for consumer members.

Suppose, for example, that we distinguish between a registration id, which is
provided by the user, and a member id, which is assigned uniquely by the
coordinator. In the JoinGroup request, both the registration id and the
member id are provided. When a consumer is first started, it doesn't know
the memberId, so it provides only the registration id. The coordinator
can then assign a new memberId and invalidate the previous one that was
associated with the registration id. This would then fence off the previous
instance which was still trying to use the old member id.

Taking a bit of a step back, I think the main observation in this
KIP is that applications with heavy local state need to have a strong bias
toward being able to reuse that state. It is a bit like Kafka itself in the
sense that a replica is not moved just because its broker is shut down, as
the cost of moving the log is extremely high. I'm wondering if we need to
think about streams applications in a similar way. Should there be a static
notion of the members of the group so that streams can make rebalancing
decisions more easily without depending so heavily on transient membership?
I feel the hacks we've put in place in some cases to avoid rebalances are a
bit brittle; delaying group joining is one example of this. If
you knew ahead of time who the stable members of the group were, then this
would not be needed. Anyway, just a thought.

Thanks,
Jason
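A minimal sketch of the registration-id/member-id fencing idea described above; the bookkeeping, names, and id format below are assumptions for illustration only, not the KIP-345 design or Kafka's GroupCoordinator implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative coordinator-side bookkeeping only; all names and structure are invented.
class RegistrationFencingSketch {
    // registration id (user supplied, stable) -> currently valid member id (coordinator assigned)
    private final Map<String, String> activeMemberByRegistration = new ConcurrentHashMap<>();

    // Handle a join: if the caller's memberId is absent or no longer the active one for this
    // registration id, assign a fresh memberId and implicitly fence the previous instance.
    synchronized String handleJoin(String registrationId, Optional<String> memberId) {
        String active = activeMemberByRegistration.get(registrationId);
        if (memberId.isPresent() && memberId.get().equals(active)) {
            return active; // the same live instance is rejoining
        }
        String newMemberId = registrationId + "-" + UUID.randomUUID();
        activeMemberByRegistration.put(registrationId, newMemberId); // old member id is now invalid
        return newMemberId;
    }

    // Any request carrying a stale member id would be rejected, fencing zombie instances.
    synchronized boolean isFenced(String registrationId, String memberId) {
        return !memberId.equals(activeMemberByRegistration.get(registrationId));
    }
}
```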



On Fri, Jul 27, 2018 at 1:58 PM, James Cheng  wrote:

> When you say that it will "break", what does this breakage look like? Will
> the consumer-group be non-functional? Will just those instances be
> non-functional? Or will the group be functional, but the rebalancing be
> non-optimal and require more round-trips/data-transfer? (similar to the
> current algorithm)
>
> I'm trying to assess the potential for user-error and the impact of
> user-error.
>
> -James
>
> > On Jul 27, 2018, at 11:25 AM, Boyang Chen  wrote:
> >
> > Hey James,
> >
> >
> > The algorithm relies on the client side to provide a unique consumer
> member id. It will break unless we enforce some sort of validation (host +
> port) on the server side. To simplify the first version, we do not plan to
> enforce validation. A good comparison would be the EOS producer, which is in
> charge of generating a unique transaction id sequence. IMO, for broker logic,
> the tolerance of client-side error is not unlimited.
> >
> >
> > Thank you!
> >
> >
> > 
> > From: James Cheng 
> > Sent: Saturday, July 28, 2018 1:26 AM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by
> specifying member id
> >
> >
> >> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
> >>
> >> Hi Boyang,
> >>
> >> Thanks for the proposed KIP. I made a pass over the wiki and here are
> some
> >> comments / questions:
> >>
> >> 1. In order to preserve broker compatibility, we need to make sure the
> >> broker version discovery logic can be integrated with this new logic.
> I.e.
> >> if a newer versioned consumer is talking to an older versioned broker
> who
> >> does not recognize V4, the client needs to downgrade its
> JoinGroupRequest
> >> version to V3 and not setting the member-id specifically. You can take a
> >> look at the ApiVersionsRequest and see how to work with it.
> >>
> >> 2. There may exist some manners to validate that two different clients
> do
> >> not send with the same member id, for example if we pass along the
> >> host:port information from KafkaApis to the GroupCoordinator interface.
> But
> >> I think this is overly complicated the logic and may not worthwhile than
> >> relying on users to specify unique member ids.
> >
> > Boyang,
> >
> > Thanks for the KIP! How will the algorithm behave if multiple consumers
> provide the same member id?
> >
> > -James
> >
> >> 3. Minor: you would need to bumping up the version of JoinGroupResponse
> to
> >> V4 as well.
> >>
> >> 4. Minor: in the wiki page, you need to specify the actual string value
> for
> >> `MEMBER_ID`, for example "member.id".
> >>
> >> 5. When this additional config it specified by users, we should consider
> >> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
> >> since otherwise its effectiveness would be less.
> >>
> >>
> >> Guozhang
> >>
> >>
> >>
> >>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen 
> wrote:
> >>>
> >>> Hey friends,
> >>>
> >>>
> >>> I would like to open a 

Build failed in Jenkins: kafka-trunk-jdk10 #324

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H30 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 24d0d3351a43f52f6f688143672d7c57c5f19595
error: Could not read 1e98b2aa5027eb60ee1a112a7f8ec45b9c47
error: Could not read 689bac3c6fcae927ccee89c8d4fd54c2fae83be1
remote: Counting objects: 2177, done.
[git fetch object compression/transfer progress output trimmed]

Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-27 Thread Stanislav Kozlovski
Hey, Ray

Thanks for pointing that out, it's fixed now

Best,
Stanislav

On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:

> Thanks.  Can you fix the link in the "KIPs under discussion" table on
> the main KIP landing page
> <
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#>?
>
> I tried, but the Wiki won't let me.
>
> -Ray
>
> On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
> > Hey guys,
> >
> > @Colin - good point. I added some sentences mentioning recent
> improvements
> > in the introductory section.
> >
> > *Disk Failure* - I tend to agree with what Colin said - once a disk
> fails,
> > you don't want to work with it again. As such, I've changed my mind and
> > believe that we should mark the LogDir (assume its a disk) as offline on
> > the first `IOException` encountered. This is the LogCleaner's current
> > behavior. We shouldn't change that.
> >
> > *Respawning Threads* - I believe we should never re-spawn a thread. The
> > correct approach in my mind is to either have it stay dead or never let
> it
> > die in the first place.
> >
> > *Uncleanable-partition-names metric* - Colin is right, this metric is
> > unneeded. Users can monitor the `uncleanable-partitions-count` metric and
> > inspect logs.
> >
> >
> > Hey Ray,
> >
> >> 2) I'm 100% with James in agreement with setting up the LogCleaner to
> >> skip over problematic partitions instead of dying.
> > I think we can do this for every exception that isn't `IOException`. This
> > will future-proof us against bugs in the system and potential other
> errors.
> > Protecting yourself against unexpected failures is always a good thing in
> > my mind, but I also think that protecting yourself against bugs in the
> > software is sort of clunky. What does everybody think about this?
> >
> >> 4) The only improvement I can think of is that if such an
> >> error occurs, then have the option (configuration setting?) to create a
> >> .skip file (or something similar).
> > This is a good suggestion. Have others also seen corruption be generally
> > tied to the same segment?
> >
> > On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 
> wrote:
> >
> >> For the cleaner thread specifically, I do not think respawning will
> help at
> >> all because we are more than likely to run into the same issue again
> which
> >> would end up crashing the cleaner. Retrying makes sense for transient
> >> errors or when you believe some part of the system could have healed
> >> itself, both of which I think are not true for the log cleaner.
> >>
> >> On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino 
> wrote:
> >>
> >>> << >> an
> >>> infinite loop which consumes resources and fires off continuous log
> >>> messages.
> >>> Hi Colin.  In case it could be relevant, one way to mitigate this
> effect
> >> is
> >>> to implement a backoff mechanism (if a second respawn is to occur then
> >> wait
> >>> for 1 minute before doing it; then if a third respawn is to occur wait
> >> for
> >>> 2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to some
> max
> >>> wait time).
> >>>
> >>> I have no opinion on whether respawn is appropriate or not in this
> >> context,
> >>> but a mitigation like the increasing backoff described above may be
> >>> relevant in weighing the pros and cons.
> >>>
> >>> Ron
> >>>
> >>> On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe 
> wrote:
> >>>
>  On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:
> > Hi Stanislav! Thanks for this KIP!
> >
> > I agree that it would be good if the LogCleaner were more tolerant of
> > errors. Currently, as you said, once it dies, it stays dead.
> >
> > Things are better now than they used to be. We have the metric
> >kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
> > which we can use to tell us if the threads are dead. And as of 1.1.0,
> >>> we
> > have KIP-226, which allows you to restart the log cleaner thread,
> > without requiring a broker restart.
> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration
> > <
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration
> 
> > I've only read about this, I haven't personally tried it.
>  Thanks for pointing this out, James!  Stanislav, we should probably
> >> add a
>  sentence or two mentioning the KIP-226 changes somewhere in the KIP.
> >>> Maybe
>  in the intro section?
> 
>  I think it's clear that requiring the users to manually restart the
> log
>  cleaner is not a very good solution.  But it's good to know that it's
> a
>  possibility on some older releases.
> 
> > Some comments:
> > * I like the idea of having the log cleaner continue to clean as many
> > partitions as it can, skipping over the problematic ones if possible.
> >
> > * If the log cleaner thread dies, I think it should automatically be
> > revived. Your KIP 

Build failed in Jenkins: kafka-trunk-jdk8 #2843

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H32 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6116, done.
[git fetch object compression/transfer progress output trimmed]

Build failed in Jenkins: kafka-trunk-jdk10 #323

2018-07-27 Thread Apache Jenkins Server
See 

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H20 (ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
ERROR: Error fetching remote repo 'origin'
hudson.plugins.git.GitException: Failed to fetch from 
https://github.com/apache/kafka.git
at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
at hudson.scm.SCM.checkout(SCM.java:504)
at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
at jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
at 
hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
at hudson.model.Run.execute(Run.java:1794)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Caused by: hudson.plugins.git.GitException: Command "git fetch --tags 
--progress https://github.com/apache/kafka.git 
+refs/heads/*:refs/remotes/origin/*" returned status code 128:
stdout: 
stderr: error: Could not read 0f3affc0f40751dc8fd064b36b6e859728f63e37
error: Could not read 95fbb2e03f4fe79737c71632e0ef2dfdcfb85a69
error: Could not read 08c465028d057ac23cdfe6d57641fe40240359dd
remote: Counting objects: 6476, done.
[git fetch object compression/transfer progress output trimmed]

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread James Cheng
When you say that it will "break", what does this breakage look like? Will the 
consumer-group be non-functional? Will just those instances be non-functional? 
Or will the group be functional, but the rebalancing be non-optimal and require 
more round-trips/data-transfer? (similar to the current algorithm)

I'm trying to assess the potential for user-error and the impact of user-error.

-James

> On Jul 27, 2018, at 11:25 AM, Boyang Chen  wrote:
> 
> Hey James,
> 
> 
> The algorithm relies on the client side to provide a unique consumer member id.
> It will break unless we enforce some sort of validation (host + port) on the
> server side. To simplify the first version, we do not plan to enforce
> validation. A good comparison would be the EOS producer, which is in charge of
> generating a unique transaction id sequence. IMO the broker logic cannot be
> expected to tolerate unlimited client-side error.
> 
> 
> Thank you!
> 
> 
> 
> From: James Cheng 
> Sent: Saturday, July 28, 2018 1:26 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
> specifying member id
> 
> 
>> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
>> 
>> Hi Boyang,
>> 
>> Thanks for the proposed KIP. I made a pass over the wiki and here are some
>> comments / questions:
>> 
>> 1. In order to preserve broker compatibility, we need to make sure the
>> broker version discovery logic can be integrated with this new logic. I.e.
>> if a newer versioned consumer is talking to an older versioned broker who
>> does not recognize V4, the client needs to downgrade its JoinGroupRequest
>> version to V3 and not setting the member-id specifically. You can take a
>> look at the ApiVersionsRequest and see how to work with it.
>> 
>> 2. There may exist some manners to validate that two different clients do
>> not send with the same member id, for example if we pass along the
>> host:port information from KafkaApis to the GroupCoordinator interface. But
>> I think this overly complicates the logic and may not be worthwhile compared to
>> relying on users to specify unique member ids.
> 
> Boyang,
> 
> Thanks for the KIP! How will the algorithm behave if multiple consumers 
> provide the same member id?
> 
> -James
> 
>> 3. Minor: you would need to bump up the version of JoinGroupResponse to
>> V4 as well.
>> 
>> 4. Minor: in the wiki page, you need to specify the actual string value for
>> `MEMBER_ID`, for example "member.id".
>> 
>> 5. When this additional config is specified by users, we should consider
>> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
>> since otherwise its effectiveness would be reduced.
>> 
>> 
>> Guozhang
>> 
>> 
>> 
>>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:
>>> 
>>> Hey friends,
>>> 
>>> 
>>> I would like to open a discussion thread on KIP-345:
>>> 
>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
>>> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>>> 
>>> 
>>> This KIP is trying to resolve multiple rebalances by maintaining the
>>> consumer member id across rebalance generations. I have verified the theory
>>> on our internal Stream application, and it could reduce rebalance time to a
>>> few seconds when service is rolling restart.
>>> 
>>> 
>>> Let me know your thoughts, thank you!
>>> 
>>> 
>>> Best,
>>> 
>>> Boyang
>>> 
>> 
>> 
>> 
>> --
>> -- Guozhang



Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-27 Thread Ray Chiang
Thanks.  Can you fix the link in the "KIPs under discussion" table on
the main KIP landing page
<https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals>?
I tried, but the Wiki won't let me.


-Ray

On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:

Hey guys,

@Colin - good point. I added some sentences mentioning recent improvements
in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk fails,
you don't want to work with it again. As such, I've changed my mind and
believe that we should mark the LogDir (assume it's a disk) as offline on
the first `IOException` encountered. This is the LogCleaner's current
behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread. The
correct approach in my mind is to either have it stay dead or never let it
die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric is
unneeded. Users can monitor the `uncleanable-partitions-count` metric and
inspect logs.
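
As a rough illustration of how an operator could watch that metric, here is a small JMX
sketch; the MBean name is an assumption patterned on the existing LogCleanerManager
metrics, and in practice it would usually be read over a remote JMX connection rather
than in-process:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public class UncleanablePartitionsCheck {
        public static void main(String[] args) throws Exception {
            // Assumed MBean/attribute names; runs inside the broker JVM (adapt for remote JMX).
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(
                "kafka.log:type=LogCleanerManager,name=uncleanable-partitions-count");
            System.out.println("uncleanable-partitions-count = "
                + server.getAttribute(name, "Value"));
        }
    }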


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner to
skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`. This
will future-proof us against bugs in the system and potential other errors.
Protecting yourself against unexpected failures is always a good thing in
my mind, but I also think that protecting yourself against bugs in the
software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to create a
.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be generally
tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah  wrote:


For the cleaner thread specifically, I do not think respawning will help at
all because we are more than likely to run into the same issue again which
would end up crashing the cleaner. Retrying makes sense for transient
errors or when you believe some part of the system could have healed
itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino  wrote:


<<
an

infinite loop which consumes resources and fires off continuous log
messages.
Hi Colin.  In case it could be relevant, one way to mitigate this effect

is

to implement a backoff mechanism (if a second respawn is to occur then

wait

for 1 minute before doing it; then if a third respawn is to occur wait

for

2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to some max
wait time).
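
A minimal sketch of that capped exponential backoff, assuming some supervisor hook
exists to respawn the thread (which is exactly the part under debate); the class and
method names below are invented for illustration:

    public class RespawnBackoff {
        private static final long MAX_BACKOFF_MS = 30 * 60_000L;  // cap, e.g. 30 minutes
        private long backoffMs = 60_000L;                         // 1 minute before the 2nd respawn

        // Invoked by a hypothetical supervisor whenever it notices the cleaner thread died.
        public void respawnWithBackoff(Runnable respawn) throws InterruptedException {
            Thread.sleep(backoffMs);                               // wait before respawning
            backoffMs = Math.min(backoffMs * 2, MAX_BACKOFF_MS);   // 1, 2, 4, 8 ... minutes
            respawn.run();
        }
    }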

I have no opinion on whether respawn is appropriate or not in this

context,

but a mitigation like the increasing backoff described above may be
relevant in weighing the pros and cons.

Ron

On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe  wrote:


On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:

Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant of
errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric
   kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of 1.1.0,

we

have KIP-226, which allows you to restart the log cleaner thread,
without requiring a broker restart.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration



I've only read about this, I haven't personally tried it.
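
For anyone who wants to try it, a hedged sketch of the KIP-226 route using the
AdminClient; that bumping log.cleaner.threads is the knob which recreates a dead cleaner
thread is my reading of the KIP, so please verify before relying on it:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.Config;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class BumpCleanerThreads {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Dynamically alter log.cleaner.threads on broker 0; per KIP-226 this takes
                // effect without a broker restart.
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
                Config update = new Config(
                    Collections.singleton(new ConfigEntry("log.cleaner.threads", "2")));
                admin.alterConfigs(Collections.singletonMap(broker, update)).all().get();
            }
        }
    }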

Thanks for pointing this out, James!  Stanislav, we should probably

add a

sentence or two mentioning the KIP-226 changes somewhere in the KIP.

Maybe

in the intro section?

I think it's clear that requiring the users to manually restart the log
cleaner is not a very good solution.  But it's good to know that it's a
possibility on some older releases.


Some comments:
* I like the idea of having the log cleaner continue to clean as many
partitions as it can, skipping over the problematic ones if possible.

* If the log cleaner thread dies, I think it should automatically be
revived. Your KIP attempts to do that by catching exceptions during
execution, but I think we should go all the way and make sure that a

new

one gets created, if the thread ever dies.

This is inconsistent with the way the rest of Kafka works.  We don't
automatically re-create other threads in the broker if they terminate.

In

general, if there is a serious bug in the code, respawning threads is
likely to make things worse, by putting you in an infinite loop which
consumes resources and fires off continuous log messages.


* It might be worth trying to re-clean the uncleanable partitions.

I've

seen cases where an uncleanable partition later became cleanable. I
unfortunately don't remember how 

Re: [DISCUSS] KIP-291: Have separate queues for control requests and data requests

2018-07-27 Thread Lucas Wang
A kind reminder for review of this KIP.

Thank you very much!
Lucas

On Wed, Jul 25, 2018 at 10:23 PM, Lucas Wang  wrote:

> Hi All,
>
> I've updated the KIP by adding the dedicated endpoints for controller
> connections,
> and pinning threads for controller requests.
> Also I've updated the title of this KIP. Please take a look and let me
> know your feedback.
>
> Thanks a lot for your time!
> Lucas
>
> On Tue, Jul 24, 2018 at 10:19 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> Hi Lucas,
>> I agree, if we want to go forward with a separate controller plane and
>> data
>> plane and completely isolate them, having a separate port for controller
>> with a separate Acceptor and a Processor sounds ideal to me.
>>
>> Thanks,
>>
>> Mayuresh
>>
>>
>> On Mon, Jul 23, 2018 at 11:04 PM Becket Qin  wrote:
>>
>> > Hi Lucas,
>> >
>> > Yes, I agree that a dedicated end to end control flow would be ideal.
>> >
>> > Thanks,
>> >
>> > Jiangjie (Becket) Qin
>> >
>> > On Tue, Jul 24, 2018 at 1:05 PM, Lucas Wang 
>> wrote:
>> >
>> > > Thanks for the comment, Becket.
>> > > So far, we've been trying to avoid making any request handler thread
>> > > special.
>> > > But if we were to follow that path in order to make the two planes
>> more
>> > > isolated,
>> > > what do you think about also having a dedicated processor thread,
>> > > and dedicated port for the controller?
>> > >
>> > > Today one processor thread can handle multiple connections, let's say
>> 100
>> > > connections
>> > >
>> > > represented by connection0, ... connection99, among which
>> connection0-98
>> > > are from clients, while connection99 is from
>> > >
>> > > the controller. Further let's say after one selector polling, there
>> are
>> > > incoming requests on all connections.
>> > >
>> > > When the request queue is full, (either the data request being full in
>> > the
>> > > two queue design, or
>> > >
>> > > the one single queue being full in the deque design), the processor
>> > thread
>> > > will be blocked first
>> > >
>> > > when trying to enqueue the data request from connection0, then
>> possibly
>> > > blocked for the data request
>> > >
>> > > from connection1, ... etc even though the controller request is ready
>> to
>> > be
>> > > enqueued.
>> > >
>> > > To solve this problem, it seems we would need to have a separate port
>> > > dedicated to
>> > >
>> > > the controller, a dedicated processor thread, a dedicated controller
>> > > request queue,
>> > >
>> > > and pinning of one request handler thread for controller requests.
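
To make the blocking scenario concrete, a toy sketch (queue size and request names taken
from the example above, otherwise invented purely for illustration) of why a full shared
queue stalls the processor thread before it ever reaches the controller connection:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class SharedQueueStall {
        public static void main(String[] args) throws InterruptedException {
            // Toy model of a single bounded request queue shared by data and controller requests.
            BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(2);
            requestQueue.put("data-request-from-connection0");
            requestQueue.put("data-request-from-connection1");
            // The queue is now full: the processor thread would block on the next put() for a
            // data request, so a controller request already readable on connection99 cannot be
            // enqueued until a handler thread drains the queue.
            System.out.println("queue full (" + requestQueue.size() + "); further put() calls block");
        }
    }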
>> > >
>> > > Thanks,
>> > > Lucas
>> > >
>> > >
>> > > On Mon, Jul 23, 2018 at 6:00 PM, Becket Qin 
>> > wrote:
>> > >
>> > > > Personally I am not fond of the dequeue approach simply because it
>> is
>> > > > against the basic idea of isolating the controller plane and data
>> > plane.
>> > > > With a single dequeue, theoretically speaking the controller
>> requests
>> > can
>> > > > starve the clients requests. I would prefer the approach with a
>> > separate
>> > > > controller request queue and a dedicated controller request handler
>> > > thread.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jiangjie (Becket) Qin
>> > > >
>> > > > On Tue, Jul 24, 2018 at 8:16 AM, Lucas Wang 
>> > > wrote:
>> > > >
>> > > > > Sure, I can summarize the usage of correlation id. But before I do
>> > > that,
>> > > > it
>> > > > > seems
>> > > > > the same out-of-order processing can also happen to Produce
>> requests
>> > > sent
>> > > > > by producers,
>> > > > > following the same example you described earlier.
>> > > > > If that's the case, I think this probably deserves a separate doc
>> and
>> > > > > design independent of this KIP.
>> > > > >
>> > > > > Lucas
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Mon, Jul 23, 2018 at 12:39 PM, Dong Lin 
>> > > wrote:
>> > > > >
>> > > > > > Hey Lucas,
>> > > > > >
>> > > > > > Could you update the KIP if you are confident with the approach
>> > which
>> > > > > uses
>> > > > > > correlation id? The idea around correlation id is kind of
>> scattered
>> > > > > across
>> > > > > > multiple emails. It will be useful if other reviews can read the
>> > KIP
>> > > to
>> > > > > > understand the latest proposal.
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Dong
>> > > > > >
>> > > > > > On Mon, Jul 23, 2018 at 12:32 PM, Mayuresh Gharat <
>> > > > > > gharatmayures...@gmail.com> wrote:
>> > > > > >
>> > > > > > > I like the idea of the dequeue implementation by Lucas. This
>> will
>> > > > help
>> > > > > us
>> > > > > > > avoid additional queue for controller and additional configs
>> in
>> > > > Kafka.
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Mayuresh
>> > > > > > >
>> > > > > > > On Sun, Jul 22, 2018 at 2:58 AM Becket Qin <
>> becket@gmail.com
>> > >
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Hi Jun,
>> > > > > > > >
>> > > > > > > > The usage of correlation ID might still be useful to address
>> > the
>> > > > > cases
>> > > > > > > > 

Re: [VOTE] 2.0.0 RC3

2018-07-27 Thread Jason Gustafson
+1

I did the quickstart and verified the documentation (upgrade notes and
notable changes). Thanks Rajini!

On Fri, Jul 27, 2018 at 8:10 AM, Gwen Shapira  wrote:

> +1
>
> Quickstart on binaries, built from sources, verified signatures -- it
> really is Rajini. Thank you!
>
> Gwen
>
> On Thu, Jul 26, 2018 at 10:27 PM, Guozhang Wang 
> wrote:
>
> > +1.
> >
> > Validated the following:
> >
> > 1. quick start on binary (2.12)
> > 2. unit test on source
> > 3. javadoc
> > 4. web doc
> > 5. included jars (2.12).
> >
> >
> > Thanks Rajini!
> >
> >
> > Guozhang
> >
> >
> > On Wed, Jul 25, 2018 at 8:10 AM, Ron Dagostino 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > Built from source and exercised the new SASL/OAUTHBEARER functionality
> > with
> > > unsecured tokens.
> > >
> > > Thanks, Rajini -- apologies for KAFKA-7182.
> > >
> > > Ron
> > >
> > > On Tue, Jul 24, 2018 at 5:10 PM Vahid S Hashemian <
> > > vahidhashem...@us.ibm.com>
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Built from source and ran quickstart successfully with both Java 8
> and
> > > > Java 9 on Ubuntu.
> > > > Thanks Rajini!
> > > >
> > > > --Vahid
> > > >
> > > >
> > > >
> > > >
> > > > From:   Rajini Sivaram 
> > > > To: dev , Users ,
> > > > kafka-clients 
> > > > Date:   07/24/2018 08:33 AM
> > > > Subject:[VOTE] 2.0.0 RC3
> > > >
> > > >
> > > >
> > > > Hello Kafka users, developers and client-developers,
> > > >
> > > >
> > > > This is the fourth candidate for release of Apache Kafka 2.0.0.
> > > >
> > > >
> > > > This is a major version release of Apache Kafka. It includes 40 new
> > KIPs
> > > > and
> > > >
> > > > several critical bug fixes. Please see the 2.0.0 release plan for
> more
> > > > details:
> > > >
> > > > https://cwiki.apache.org/confluence/pages/viewpage.
> > > action?pageId=80448820
> > > >
> > > >
> > > >
> > > > A few notable highlights:
> > > >
> > > >- Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for
> > CreateTopics
> > > >(KIP-277)
> > > >- SASL/OAUTHBEARER implementation (KIP-255)
> > > >- Improved quota communication and customization of quotas
> (KIP-219,
> > > >KIP-257)
> > > >- Efficient memory usage for down conversion (KIP-283)
> > > >- Fix log divergence between leader and follower during fast
> leader
> > > >failover (KIP-279)
> > > >- Drop support for Java 7 and remove deprecated code including old
> > > > scala
> > > >clients
> > > >- Connect REST extension plugin, support for externalizing secrets
> > and
> > > >improved error handling (KIP-285, KIP-297, KIP-298 etc.)
> > > >- Scala API for Kafka Streams and other Streams API improvements
> > > >(KIP-270, KIP-150, KIP-245, KIP-251 etc.)
> > > >
> > > >
> > > > Release notes for the 2.0.0 release:
> > > >
> > > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/RELEASE_NOTES.html
> > > >
> > > >
> > > >
> > > > *** Please download, test and vote by Friday July 27, 4pm PT.
> > > >
> > > >
> > > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > > >
> > > > http://kafka.apache.org/KEYS
> > > >
> > > >
> > > >
> > > > * Release artifacts to be voted upon (source and binary):
> > > >
> > > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/
> > > >
> > > >
> > > >
> > > > * Maven artifacts to be voted upon:
> > > >
> > > > https://repository.apache.org/content/groups/staging/
> > > >
> > > >
> > > >
> > > > * Javadoc:
> > > >
> > > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/javadoc/
> > > >
> > > >
> > > >
> > > > * Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:
> > > >
> > > > https://github.com/apache/kafka/releases/tag/2.0.0-rc3
> > > >
> > > >
> > > > * Documentation:
> > > >
> > > > http://kafka.apache.org/20/documentation.html
> > > >
> > > >
> > > >
> > > > * Protocol:
> > > >
> > > > http://kafka.apache.org/20/protocol.html
> > > >
> > > >
> > > >
> > > > * Successful Jenkins builds for the 2.0 branch:
> > > >
> > > > Unit/integration tests:
> > > > https://builds.apache.org/job/kafka-2.0-jdk8/90/
> > > >
> > > >
> > > > System tests:
> > > > https://jenkins.confluent.io/job/system-test-kafka/job/2.0/41/
> > > >
> > > >
> > > >
> > > > /**
> > > >
> > > >
> > > > Thanks,
> > > >
> > > >
> > > >
> > > > Rajini
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>
>
>
> --
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 
>


Re: [DISCUSS] KIP-258: Allow to Store Record Timestamps in RocksDB

2018-07-27 Thread Bill Bejeck
Hi Matthias,

Thanks for the update and the working prototype, it helps with
understanding the KIP.

I took an initial pass over this PR, and overall I find the interfaces and
approach to be reasonable.

Regarding step 3C of the in-place upgrade (users needing to watch the
restore process), I'm wondering if we want to provide a type of
StateRestoreListener that could signal when the new stores have reached
parity with the existing old stores, and that could be the signal to start
the second rolling rebalance?
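
A minimal sketch of that idea, assuming the prepare stores are restored through the
normal restore path so a listener can observe them; the class below is invented purely
for illustration:

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.streams.processor.StateRestoreListener;

    public class PrepareStoreRestoreListener implements StateRestoreListener {
        private final Set<String> restoring = ConcurrentHashMap.newKeySet();

        @Override
        public void onRestoreStart(TopicPartition tp, String store, long start, long end) {
            restoring.add(store + "-" + tp);
        }

        @Override
        public void onBatchRestored(TopicPartition tp, String store, long endOffset, long numRestored) {
            // restore progress could be logged/reported here
        }

        @Override
        public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
            restoring.remove(store + "-" + tp);
            if (restoring.isEmpty()) {
                // All prepare stores have caught up; this could be the signal that the
                // second rolling bounce is safe to start.
                System.out.println("All prepare stores restored");
            }
        }
    }

It would be registered via KafkaStreams#setGlobalStateRestoreListener; whether restore of
the prepare tasks is actually surfaced through this listener is of course the open question.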

Although you solicited feedback on the interfaces involved, I wanted to put
down some thoughts that have come to mind reviewing this KIP again

1. Out of N instances, one fails midway through the process, would we allow
the other instances to complete or just fail the entire upgrade?
2. During the second rolling bounce, maybe we could rename the current
active directories vs. deleting them right away,  and when all the prepare
task directories are successfully migrated then delete the previous active
ones.
3. For the first rolling bounce we pause processing of new records and
just allow the prepare tasks to restore, then once all prepare tasks have
restored, it's a signal for the second round of rolling bounces and then as
each task successfully renames its prepare directories and deletes the old
active task directories, normal processing of records resumes.

Thanks,
Bill



On Wed, Jul 25, 2018 at 9:42 PM Matthias J. Sax 
wrote:

> Hi,
>
> KIP-268 (rebalance meatadata) is finished and included in AK 2.0
> release. Thus, I want to pick up this KIP again to get the RocksDB
> upgrade done for 2.1.
>
> I updated the KIP accordingly and also have a "prove of concept" PR
> ready (for "in place" upgrade only):
> https://github.com/apache/kafka/pull/5422/
>
> There a still open questions, but I want to collect early feedback on
> the proposed interfaces we need for the store upgrade. Also note, that
> the KIP now also aim to define a generic upgrade path from any store
> format A to any other store format B. Adding timestamps is just a
> special case.
>
> I will continue to work on the PR and refine the KIP in the meantime, too.
>
> Looking forward to your feedback.
>
> -Matthias
>
>
> On 3/14/18 11:14 PM, Matthias J. Sax wrote:
> > After some more thoughts, I want to follow John's suggestion and split
> > upgrading the rebalance metadata from the store upgrade.
> >
> > I extracted the metadata upgrade into it's own KIP:
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade
> >
> > I'll update this KIP accordingly shortly. I also want to consider to
> > make the store format upgrade more flexible/generic. Atm, the KIP is too
> > much tailored to the DSL IMHO and does not encounter PAPI users that we
> > should not force to upgrade the stores. I need to figure out the details
> > and follow up later.
> >
> > Please give feedback for the new KIP-268 on the corresponding discussion
> > thread.
> >
> > @James: unfortunately, for upgrading to 1.2 I couldn't figure out a way
> > for a single rolling bounce upgrade. But KIP-268 proposes a fix for
> > future upgrades. Please share your thoughts.
> >
> > Thanks for all your feedback!
> >
> > -Matthias
> >
> > On 3/12/18 11:56 PM, Matthias J. Sax wrote:
> >> @John: yes, we would throw if configs are missing (it's an
> >> implementation details IMHO and thus I did not include it in the KIP)
> >>
> >> @Guozhang:
> >>
> >> 1) I understand now what you mean. We can certainly allow all values
> >> "0.10.0.x", "0.10.1.x", "0.10.2.x", ... "1.1.x" for `upgrade.from`
> >> parameter. I had a similar thought once but decided to collapse them into
> >> one -- will update the KIP accordingly.
> >>
> >> 2) The idea to avoid any config would be, to always send both request.
> >> If we add a config to eventually disable the old request, we don't gain
> >> anything with this approach. The question is really, if we are willing
> >> to pay this overhead from 1.2 on -- note, it would be limited to 2
> >> versions and not grow further in future releases. More details in (3)
> >>
> >> 3) Yes, this approach subsumes (2) for later releases and allows us to
> >> stay with 2 "assignment strategies" we need to register, as the new
> >> assignment strategy will allow to "upgrade itself" via "version
> >> probing". Thus, (2) would only be a workaround to avoid a config if
> >> people upgrade from pre-1.2 releases.
> >>
> >> Thus, I don't think we need to register new "assignment strategies" and
> >> send empty subscriptions for older version.
> >>
> >> 4) I agree that this is a tricky thing to get right with a single
> >> rebalance. I share the concern that an application might never catch up
> >> and thus the hot standby will never be ready.
> >>
> >> Maybe it's better to go with 2 rebalances for store upgrades. If we do
> >> this, we also don't need to go with (2) and can get (3) in place for
> >> future upgrades. I also think 

Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

2018-07-27 Thread Jason Gustafson
Thanks Anna, you are right on both points. I updated the KIP.

-Jason
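
For readers following along, a rough sketch of the consumer-side handling discussed in
this thread; LogTruncationException and its divergence-offset accessor are taken from the
KIP draft and may still change, and process() is just a stand-in for application logic:

    try {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        process(records);  // stand-in for application logic
    } catch (LogTruncationException e) {
        // Per the KIP draft, the exception carries the truncated partitions and the
        // offset of divergence; the accessor name below is assumed, not final API.
        for (Map.Entry<TopicPartition, Long> entry : e.divergenceOffsets().entrySet()) {
            consumer.seek(entry.getKey(), entry.getValue());
        }
    }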

On Thu, Jul 26, 2018 at 2:08 PM, Anna Povzner  wrote:

> Hi Jason,
>
> Thanks for the update. I agree with the current proposal.
>
> Two minor comments:
> 1) In “API Changes” section, first paragraph says that “users can catch the
> more specific exception type and use the new `seekToNearest()` API defined
> below.”. Since LogTruncationException “will include the partitions that
> were truncated and the offset of divergence”., shouldn’t the client use
> seek(offset) to seek to the offset of divergence in response to the
> exception?
> 2) In “Protocol Changes” section, OffsetsForLeaderEpoch subsection says
> “Note
> that consumers will send a sentinel value (-1) for the current epoch and
> the broker will simply disregard that validation.”. Is that still true with
> MetadataResponse containing leader epoch?
>
> Thanks,
> Anna
>
> On Wed, Jul 25, 2018 at 1:44 PM Jason Gustafson 
> wrote:
>
> > Hi All,
> >
> > I have made some updates to the KIP. As many of you know, a side project
> of
> > mine has been specifying the Kafka replication protocol in TLA. You can
> > check out the code here if you are interested:
> > https://github.com/hachikuji/kafka-specification. In addition to
> > uncovering
> > a couple unknown bugs in the replication protocol (e.g.
> > https://issues.apache.org/jira/browse/KAFKA-7128), this has helped me
> > validate the behavior in this KIP. In fact, the original version I
> proposed
> > had a weakness. I initially suggested letting the leader validate the
> > expected epoch at the fetch offset. This made sense for the consumer in
> the
> > handling of unclean leader election, but it was not strong enough to
> > protect the follower in all cases. In order to make advancement of the
> high
> > watermark safe, for example, the leader actually needs to be sure that
> > every follower in the ISR matches its own epoch.
> >
> > I attempted to fix this problem by treating the epoch in the fetch
> request
> > slightly differently for consumers and followers. For consumers, it would
> > be the expected epoch of the record at the fetch offset, and the leader
> > would raise a LOG_TRUNCATION error if the expectation failed. For
> > followers, it would be the current epoch and the leader would require
> that
> > it match its own epoch. This was unsatisfying both because of the
> > inconsistency in behavior and because the consumer was left with the
> weaker
> > fencing that we already knew was insufficient for the replicas.
> Ultimately
> > I decided that we should make the behavior consistent and that meant that
> > the consumer needed to act more like a following replica. Instead of
> > checking for truncation while fetching, the consumer should check for
> > truncation after leader changes. After checking for truncation, the
> > consumer can then use the current epoch when fetching and get the
> stronger
> > protection that it provides. What this means is that the Metadata API
> must
> > include the current leader epoch. Given the problems we have had around
> > stale metadata and how challenging they have been to debug, I'm convinced
> > that this is a good idea in any case and it resolves the inconsistent
> > behavior in the Fetch API. The downside is that there will be some
> > additional overhead upon leader changes, but I don't think it is a major
> > concern since leader changes are rare and the OffsetForLeaderEpoch
> request
> > is cheap.
> >
> > This approach leaves the door open for some interesting follow up
> > improvements. For example, now that we have the leader epoch in the
> > Metadata request, we can implement similar fencing for the Produce API.
> And
> > now that the consumer can reason about truncation, we could consider
> having
> > a configuration to expose records beyond the high watermark. This would
> let
> > users trade lower end-to-end latency for weaker durability semantics. It
> is
> > sort of like having an acks=0 option for the consumer. Neither of these
> > options are included in this KIP, I am just mentioning them as potential
> > work for the future.
> >
> > Finally, based on the discussion in this thread, I have added the
> > seekToCommitted API for the consumer. Please take a look and let me know
> > what you think.
> >
> > Thanks,
> > Jason
> >
> > On Fri, Jul 20, 2018 at 2:34 PM, Guozhang Wang 
> wrote:
> >
> > > Hi Jason,
> > >
> > > The proposed API seems reasonable to me too. Could you please also
> update
> > > the wiki page (
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 320%3A+Allow+fetchers+to+detect+and+handle+log+truncation)
> > > with a section say "workflow" on how the proposed API will be co-used
> > with
> > > others to:
> > >
> > > 1. consumer callers handling a LogTruncationException.
> > > 2. consumer internals for handling a retriable
> > UnknownLeaderEpochException.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Tue, Jul 17, 2018 at 10:23 AM, Anna 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread Boyang Chen
Hey James,


The algorithm relies on the client side to provide a unique consumer member id.
It will break unless we enforce some sort of validation (host + port) on the
server side. To simplify the first version, we do not plan to enforce
validation. A good comparison would be the EOS producer, which is in charge of
generating a unique transaction id sequence. IMO the broker logic cannot be
expected to tolerate unlimited client-side error.
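
For illustration, a minimal sketch of how a client might supply its own member id under
this proposal; the "member.id" config name comes from the KIP draft, and keeping it unique
per instance (e.g. derived from the host name or a deployment-assigned ordinal) is the
user's responsibility:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class StaticMemberIdExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-streams-app");
            // Proposed config name from the KIP draft; the value must be unique per instance.
            props.put("member.id", "my-streams-app-instance-0");
            try (KafkaConsumer<String, String> consumer =
                     new KafkaConsumer<>(props, new StringDeserializer(), new StringDeserializer())) {
                // subscribe/poll as usual; the member id would be carried in JoinGroupRequest v4
            }
        }
    }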


Thank you!



From: James Cheng 
Sent: Saturday, July 28, 2018 1:26 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by 
specifying member id


> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
>
> Hi Boyang,
>
> Thanks for the proposed KIP. I made a pass over the wiki and here are some
> comments / questions:
>
> 1. In order to preserve broker compatibility, we need to make sure the
> broker version discovery logic can be integrated with this new logic. I.e.
> if a newer versioned consumer is talking to an older versioned broker who
> does not recognize V4, the client needs to downgrade its JoinGroupRequest
> version to V3 and not setting the member-id specifically. You can take a
> look at the ApiVersionsRequest and see how to work with it.
>
> 2. There may exist some manners to validate that two different clients do
> not send with the same member id, for example if we pass along the
> host:port information from KafkaApis to the GroupCoordinator interface. But
> I think this overly complicates the logic and may not be worthwhile compared to
> relying on users to specify unique member ids.

Boyang,

Thanks for the KIP! How will the algorithm behave if multiple consumers provide 
the same member id?

-James

> 3. Minor: you would need to bump up the version of JoinGroupResponse to
> V4 as well.
>
> 4. Minor: in the wiki page, you need to specify the actual string value for
> `MEMBER_ID`, for example "member.id".
>
> 5. When this additional config is specified by users, we should consider
> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
> since otherwise its effectiveness would be reduced.
>
>
> Guozhang
>
>
>
>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:
>>
>> Hey friends,
>>
>>
>> I would like to open a discussion thread on KIP-345:
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
>> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>>
>>
>> This KIP is trying to resolve multiple rebalances by maintaining the
>> consumer member id across rebalance generations. I have verified the theory
>> on our internal Stream application, and it could reduce rebalance time to a
>> few seconds when service is rolling restart.
>>
>>
>> Let me know your thoughts, thank you!
>>
>>
>> Best,
>>
>> Boyang
>>
>
>
>
> --
> -- Guozhang


Re: [DISCUSS] KIP-289: Improve the default group id behavior in KafkaConsumer

2018-07-27 Thread Jason Gustafson
Hey Colin,

The problem is both that the empty group id is the default value and that
it is actually accepted by the broker for offset commits. Combine that with
the fact that auto commit is enabled by default and you users get
surprising behavior. If you look at a random Kafka cluster, you'll probably
find a bunch of inadvertent offset commits for the empty group id. I was
hoping we could distinguish between users who are using the empty group id
as an accident of the default configuration and those who use it
intentionally. By default, there will be no group id and the consumer will
not commit offsets. If a user has actually intentionally used the empty
group id, however, it will continue to work. I actually think there are
probably very few people doing this (maybe even no one), but I thought we
might err on the side of compatibility.

The big incompatible change here is having brokers reject using assign(...)
> with empty / null group.id.


This is not correct. In the proposal, the broker will only reject the empty
group id for the new version of OffsetCommit. Older clients, which cannot
be changed, will continue to work because the old versions of the
OffsetCommit API still accept the empty group id. The null group id is
different from the empty group id: it is not allowed in any version of the
API. It is basically a way to indicate that the consumer has no dependence
on the coordinator at all, which we actually have a surprising number of
use cases for. Furthermore, if a user has an actual need for the empty
group id, it will still be allowed. We are just deprecating it.
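
To illustrate the intended default, a hedged sketch of a standalone consumer under the
proposal (group.id simply omitted); the comments describe proposed behavior, not what
current releases do:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class GrouplessAssignExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            // No GROUP_ID_CONFIG at all: under the proposal this means "no group", auto-commit
            // is effectively off, and a commit attempt would fail fast instead of silently
            // committing under the "" group.
            try (KafkaConsumer<byte[], byte[]> consumer =
                     new KafkaConsumer<>(props, new ByteArrayDeserializer(), new ByteArrayDeserializer())) {
                consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                System.out.println("fetched " + records.count() + " records without any group.id");
            }
        }
    }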

-Jason

On Fri, Jul 27, 2018 at 9:56 AM, Colin McCabe  wrote:

> Sorry if this is a silly question, but what's the rationale for switching
> to using null for the default group id, rather than the empty string?
> Continuing to use the empty string seems like less churn.  And after all,
> we're not using the empty string group name for anything else.
>
> The big incompatible change here is having brokers reject using
> assign(...) with empty / null group.id.  If I understand correctly, the
> KIP proposes that this change be made on the brokers on the next
> incompatible Kafka release.  But that has nothing to do with client
> versions.  Why not just have a broker config which controls this?  Maybe "
> allow.assign.empty.group.id", or something like that.  At first, the
> default will be true, and then eventually we can flip it over to false.
>
> It seems like the main rationale for tying this behavior to the Kafka
> client version is to force people to stop using the empty group id so that
> they can upgrade their clients.  But it's also possible that people will
> stop upgrading their Kafka clients instead.  That would be pretty negative
> since  they'd miss out on any efficiency and feature improvements in the
> new clients and eventually have to do more protocol downgrading, etc.
>
> best,
> Colin
>
>
> On Thu, Jul 26, 2018, at 11:50, Vahid S Hashemian wrote:
> > Hi Jason,
> >
> > That makes sense.
> > I have updated the KIP based on the recent feedback.
> >
> > Thanks!
> > --Vahid
> >
> >
> >
> >
> > From:   Jason Gustafson 
> > To: dev 
> > Date:   07/25/2018 02:23 PM
> > Subject:Re: [DISCUSS] KIP-289: Improve the default group id
> > behavior in KafkaConsumer
> >
> >
> >
> > Hi Vahid,
> >
> > I was thinking we'd only use the old API version if we had to. That is,
> > only if the user has explicitly configured "" as the group.id.
> Otherwise,
> > we'd just use the new one. Another option is to just drop support in the
> > client for the empty group id, but usually we allow a deprecation period
> > for changes like this.
> >
> > -Jason
> >
> > On Wed, Jul 25, 2018 at 12:49 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi Jason,
> > >
> > > Thanks for additional clarification.
> > >
> > > So the next version of the OffsetCommit API will return an
> > > INVALID_GROUP_ID error for empty group ids; but on the client side we
> > call
> > > the older version of the client until the next major release.
> > > The table below should summarize this.
> > >
> > >                   |               Client (group.id="")               |
> > >                   +---------+------------------------+------------------+
> > >                   | pre-2.1 |          2.1           |       3.0        |
> > > +-----+-----------+---------+------------------------+------------------+
> > > |     | V5 (cur.) | works   | works                  | works            |
> > > | API +-----------+---------+------------------------+------------------+
> > > |     | V6        | N/A     | N/A (calls V5/warning) | INVALID_GROUP_ID |
> > > +-----+-----------+---------+------------------------+------------------+
> > >
> > > Assumptions:
> > > * 2.1: The target release version for this KIP
> > > * 3.0: The next major release
> > >
> > > Please advise if you see an issue; otherwise, I'll update the 

Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread James Cheng


> On Jul 26, 2018, at 11:09 PM, Guozhang Wang  wrote:
> 
> Hi Boyang,
> 
> Thanks for the proposed KIP. I made a pass over the wiki and here are some
> comments / questions:
> 
> 1. In order to preserve broker compatibility, we need to make sure the
> broker version discovery logic can be integrated with this new logic. I.e.
> if a newer versioned consumer is talking to an older versioned broker who
> does not recognize V4, the client needs to downgrade its JoinGroupRequest
> version to V3 and not setting the member-id specifically. You can take a
> look at the ApiVersionsRequest and see how to work with it.
> 
> 2. There may exist some manners to validate that two different clients do
> not send with the same member id, for example if we pass along the
> host:port information from KafkaApis to the GroupCoordinator interface. But
> I think this overly complicates the logic and may not be worthwhile compared to
> relying on users to specify unique member ids.

Boyang,

Thanks for the KIP! How will the algorithm behave if multiple consumers provide 
the same member id?

-James

> 3. Minor: you would need to bump up the version of JoinGroupResponse to
> V4 as well.
> 
> 4. Minor: in the wiki page, you need to specify the actual string value for
> `MEMBER_ID`, for example "member.id".
> 
> 5. When this additional config is specified by users, we should consider
> setting the default of internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
> since otherwise its effectiveness would be reduced.
> 
> 
> Guozhang
> 
> 
> 
>> On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:
>> 
>> Hey friends,
>> 
>> 
>> I would like to open a discussion thread on KIP-345:
>> 
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
>> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>> 
>> 
>> This KIP is trying to resolve multiple rebalances by maintaining the
>> consumer member id across rebalance generations. I have verified the theory
>> on our internal Stream application, and it could reduce rebalance time to a
>> few seconds when service is rolling restart.
>> 
>> 
>> Let me know your thoughts, thank you!
>> 
>> 
>> Best,
>> 
>> Boyang
>> 
> 
> 
> 
> -- 
> -- Guozhang


Re: CVE-2018-1288: Authenticated Kafka clients may interfere with data replication

2018-07-27 Thread Mickael Maison
Thanks Rajini and the rest of the security team for handling this issue.

For people interested in more details about the issue and its
discovery we've published a blog post:
https://developer.ibm.com/dwblog/2018/anatomy-kafka-cve/

On Thu, Jul 26, 2018 at 10:25 AM, Rajini Sivaram
 wrote:
>
> CVE-2018-1288: Authenticated Kafka clients may interfere with data
> replication
>
>
>
> Severity: Moderate
>
>
>
> Vendor: The Apache Software Foundation
>
>
>
> Versions Affected:
>
> Apache Kafka 0.9.0.0 to 0.9.0.1, 0.10.0.0 to 0.10.2.1, 0.11.0.0 to 0.11.0.2,
> 1.0.0
>
>
>
> Description:
>
> Authenticated Kafka users may perform action reserved for the Broker via a
> manually created fetch request interfering with data replication, resulting
> in data loss.
>
>
>
> Mitigation:
>
> Apache Kafka users should upgrade to one of the following versions where
> this vulnerability has been fixed.
>
> 0.10.2.2 or higher
> 0.11.0.3 or higher
> 1.0.1 or higher
> 1.1.0 or higher
>
>
>
> Acknowledgements:
>
> We would like to thank Edoardo Comar and Mickael Maison for reporting this
> issue and providing a resolution.
>
>
>
> Regards,
>
>
> Rajini


Re: [DISCUSS] KIP-289: Improve the default group id behavior in KafkaConsumer

2018-07-27 Thread Colin McCabe
Sorry if this is a silly question, but what's the rationale for switching to 
using null for the default group id, rather than the empty string?  Continuing 
to use the empty string seems like less churn.  And after all, we're not using 
the empty string group name for anything else.

The big incompatible change here is having brokers reject using assign(...) 
with empty / null group.id.  If I understand correctly, the KIP proposes that 
this change be made on the brokers on the next incompatible Kafka release.  But 
that has nothing to do with client versions.  Why not just have a broker config 
which controls this?  Maybe "allow.assign.empty.group.id", or something like 
that.  At first, the default will be true, and then eventually we can flip it 
over to false.

It seems like the main rationale for tying this behavior to the Kafka client 
version is to force people to stop using the empty group id so that they can 
upgrade their clients.  But it's also possible that people will stop upgrading 
their Kafka clients instead.  That would be pretty negative since  they'd miss 
out on any efficiency and feature improvements in the new clients and 
eventually have to do more protocol downgrading, etc.

best,
Colin
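
To make the suggestion above concrete, here is a hedged sketch of the kind of
broker-side gate such a config could drive. The config name is the one Colin
floats ("allow.assign.empty.group.id"); the class and method are hypothetical,
and this is not actual broker code (the real broker is written in Scala).

// Hypothetical sketch of a broker-side gate driven by a config such as
// "allow.assign.empty.group.id" (suggested above). Not actual broker code.
final class EmptyGroupIdGate {

    private final boolean allowEmptyGroupId;   // read from the broker config at startup

    EmptyGroupIdGate(boolean allowEmptyGroupId) {
        this.allowEmptyGroupId = allowEmptyGroupId;
    }

    /** Returns true if an offset commit for this group id should be accepted. */
    boolean permits(String groupId) {
        if (groupId == null || groupId.isEmpty()) {
            // Default true at first; flipped to false in a later release.
            return allowEmptyGroupId;
        }
        return true;
    }
}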


On Thu, Jul 26, 2018, at 11:50, Vahid S Hashemian wrote:
> Hi Jason,
> 
> That makes sense.
> I have updated the KIP based on the recent feedback.
> 
> Thanks!
> --Vahid
> 
> 
> 
> 
> From:   Jason Gustafson 
> To: dev 
> Date:   07/25/2018 02:23 PM
> Subject:Re: [DISCUSS] KIP-289: Improve the default group id 
> behavior in KafkaConsumer
> 
> 
> 
> Hi Vahid,
> 
> I was thinking we'd only use the old API version if we had to. That is,
> only if the user has explicitly configured "" as the group.id. Otherwise,
> we'd just use the new one. Another option is to just drop support in the
> client for the empty group id, but usually we allow a deprecation period
> for changes like this.
> 
> -Jason
> 
> On Wed, Jul 25, 2018 at 12:49 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
> > Hi Jason,
> >
> > Thanks for additional clarification.
> >
> > So the next version of the OffsetCommit API will return an
> > INVALID_GROUP_ID error for empty group ids; but on the client side we 
> call
> > the older version of the client until the next major release.
> > The table below should summarize this.
> >
> > +-----+-----------+-----------------------------------------------------+
> > |     |           |                Client (group.id="")                |
> > |     |           +---------+------------------------+------------------+
> > |     |           | pre-2.1 |           2.1          |        3.0       |
> > +-----+-----------+---------+------------------------+------------------+
> > |     | V5 (cur.) | works   | works                  | works            |
> > + API +-----------+---------+------------------------+------------------+
> > |     | V6        | N/A     | N/A (calls V5/warning) | INVALID_GROUP_ID |
> > +-----+-----------+---------+------------------------+------------------+
> >
> > Assumptions:
> > * 2.1: The target release version for this KIP
> > * 3.0: The next major release
> >
> > Please advise if you see an issue; otherwise, I'll update the KIP
> > accordingly.
> >
> > Thanks!
> > --Vahid
> >
> >
> >
> >
> > From:   Jason Gustafson 
> > To: dev 
> > Date:   07/25/2018 12:08 AM
> > Subject:***UNCHECKED*** Re: [DISCUSS] KIP-289: Improve the 
> default
> > group idbehavior in KafkaConsumer
> >
> >
> >
> > Hey Vahid,
> >
> > Sorry for the confusion. I think we all agree that going forward, we
> > shouldn't support the empty group id, so the question is just around
> > compatibility. I think we have to bump the OffsetCommit API version so
> > that
> > old clients which are unknowingly depending on the default empty group 
> id
> > will continue to work with new brokers. For new versions of the client, 
> we
> > can either drop support for the empty group id immediately or we can 
> give
> > users a grace period. I was thinking we would do the latter. We can 
> change
> > the default group.id, but in the case that a user has explicitly
> > configured
> > the empty group, then we can just use an old version of the OffsetCommit
> > API which still supports it. In a future release, we can drop this 
> support
> > and only use the latest OffsetCommit version. Does that make sense?
> >
> > Thanks,
> > Jason
> >
> >
> > On Tue, Jul 24, 2018 at 12:36 PM, Vahid S Hashemian <
> > vahidhashem...@us.ibm.com> wrote:
> >
> > > Hi Jason,
> > >
> > > Thanks for clarifying.
> > >
> > > So if we are going to continue supporting the empty group id as before
> > > (with only an addition of a deprecation warning), and disable
> > > enable.auto.commit for the new default (null) group id on the client
> > side,
> > > do we really need to bump up the OffsetCommit version?
> > >
> > > You mentioned "If an explicit empty string is configured for the group
> > id,
> > > then maybe we keep the current behavior for compatibility" 

Re: [VOTE] 2.0.0 RC3

2018-07-27 Thread Gwen Shapira
+1

Quickstart on binaries, built from sources, verified signatures -- it
really is Rajini. Thank you!

Gwen

On Thu, Jul 26, 2018 at 10:27 PM, Guozhang Wang  wrote:

> +1.
>
> Validated the following:
>
> 1. quick start on binary (2.12)
> 2. unit test on source
> 3. javadoc
> 4. web doc
> 5. included jars (2.12).
>
>
> Thanks Rajini!
>
>
> Guozhang
>
>
> On Wed, Jul 25, 2018 at 8:10 AM, Ron Dagostino  wrote:
>
> > +1 (non-binding)
> >
> > Built from source and exercised the new SASL/OAUTHBEARER functionality
> with
> > unsecured tokens.
> >
> > Thanks, Rajini -- apologies for KAFKA-7182.
> >
> > Ron
> >
> > On Tue, Jul 24, 2018 at 5:10 PM Vahid S Hashemian <
> > vahidhashem...@us.ibm.com>
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Built from source and ran quickstart successfully with both Java 8 and
> > > Java 9 on Ubuntu.
> > > Thanks Rajini!
> > >
> > > --Vahid
> > >
> > >
> > >
> > >
> > > From:   Rajini Sivaram 
> > > To: dev , Users ,
> > > kafka-clients 
> > > Date:   07/24/2018 08:33 AM
> > > Subject:[VOTE] 2.0.0 RC3
> > >
> > >
> > >
> > > Hello Kafka users, developers and client-developers,
> > >
> > >
> > > This is the fourth candidate for release of Apache Kafka 2.0.0.
> > >
> > >
> > > This is a major version release of Apache Kafka. It includes 40 new
> KIPs
> > > and
> > >
> > > several critical bug fixes. Please see the 2.0.0 release plan for more
> > > details:
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=80448820
> > >
> > >
> > >
> > > A few notable highlights:
> > >
> > >- Prefixed wildcard ACLs (KIP-290), Fine grained ACLs for
> CreateTopics
> > >(KIP-277)
> > >- SASL/OAUTHBEARER implementation (KIP-255)
> > >- Improved quota communication and customization of quotas (KIP-219,
> > >KIP-257)
> > >- Efficient memory usage for down conversion (KIP-283)
> > >- Fix log divergence between leader and follower during fast leader
> > >failover (KIP-279)
> > >- Drop support for Java 7 and remove deprecated code including old
> > > scala
> > >clients
> > >- Connect REST extension plugin, support for externalizing secrets
> and
> > >improved error handling (KIP-285, KIP-297, KIP-298 etc.)
> > >- Scala API for Kafka Streams and other Streams API improvements
> > >(KIP-270, KIP-150, KIP-245, KIP-251 etc.)
> > >
> > >
> > > Release notes for the 2.0.0 release:
> > >
> > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/RELEASE_NOTES.html
> > >
> > >
> > >
> > > *** Please download, test and vote by Friday July 27, 4pm PT.
> > >
> > >
> > > Kafka's KEYS file containing PGP keys we use to sign the release:
> > >
> > > http://kafka.apache.org/KEYS
> > >
> > >
> > >
> > > * Release artifacts to be voted upon (source and binary):
> > >
> > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/
> > >
> > >
> > >
> > > * Maven artifacts to be voted upon:
> > >
> > > https://repository.apache.org/content/groups/staging/
> > >
> > >
> > >
> > > * Javadoc:
> > >
> > > http://home.apache.org/~rsivaram/kafka-2.0.0-rc3/javadoc/
> > >
> > >
> > >
> > > * Tag to be voted upon (off 2.0 branch) is the 2.0.0 tag:
> > >
> > > https://github.com/apache/kafka/releases/tag/2.0.0-rc3
> > >
> > >
> > > * Documentation:
> > >
> > > http://kafka.apache.org/20/documentation.html
> > >
> > >
> > >
> > > * Protocol:
> > >
> > > http://kafka.apache.org/20/protocol.html
> > >
> > >
> > >
> > > * Successful Jenkins builds for the 2.0 branch:
> > >
> > > Unit/integration tests:
> > > https://builds.apache.org/job/kafka-2.0-jdk8/90/
> > >
> > >
> > > System tests:
> > > https://jenkins.confluent.io/job/system-test-kafka/job/2.0/41/
> > >
> > >
> > >
> > > /**
> > >
> > >
> > > Thanks,
> > >
> > >
> > >
> > > Rajini
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> -- Guozhang
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: [Vote] KIP-321: Update TopologyDescription to better represent Source and Sink Nodes

2018-07-27 Thread Bill Bejeck
Thanks for the KIP!

+1

-Bill

On Thu, Jul 26, 2018 at 2:39 AM Guozhang Wang  wrote:

> +1
>
> On Wed, Jul 25, 2018 at 11:13 PM, Matthias J. Sax 
> wrote:
>
> > +1 (binding)
> >
> > -Matthias
> >
> > On 7/25/18 7:47 PM, Ted Yu wrote:
> > > +1
> > >
> > > On Wed, Jul 25, 2018 at 7:24 PM Nishanth Pradeep <
> nishanth...@gmail.com>
> > > wrote:
> > >
> > >> Hello,
> > >>
> > >> I'm calling a vote for KIP-321:
> > >>
> > >>
> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-321%3A+Update+
> > TopologyDescription+to+better+represent+Source+and+Sink+Nodes
> > >>
> > >> Best,
> > >> Nishanth Pradeep
> > >>
> > >
> >
> >
>
>
> --
> -- Guozhang
>


[jira] [Created] (KAFKA-7213) NullPointerException during state restoration in kafka streams

2018-07-27 Thread Abhishek Agarwal (JIRA)
Abhishek Agarwal created KAFKA-7213:
---

 Summary: NullPointerException during state restoration in kafka 
streams
 Key: KAFKA-7213
 URL: https://issues.apache.org/jira/browse/KAFKA-7213
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.0.0
Reporter: Abhishek Agarwal
Assignee: Abhishek Agarwal


I had written a custom state store which has a batch restoration callback
registered. What I have observed is that when multiple consumer instances are
restarted, the application keeps failing with a NullPointerException. The stack
trace is:
{noformat}
java.lang.NullPointerException: null
    at org.apache.kafka.streams.state.internals.RocksDBStore.putAll(RocksDBStore.java:351) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.state.internals.RocksDBSlotKeyValueBytesStore.putAll(RocksDBSlotKeyValueBytesStore.java:100) ~[streams-core-1.0.0.297.jar:?]
    at org.apache.kafka.streams.state.internals.RocksDBSlotKeyValueBytesStore$SlotKeyValueBatchRestoreCallback.restoreAll(RocksDBSlotKeyValueBytesStore.java:303) ~[streams-core-1.0.0.297.jar:?]
    at org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreAll(CompositeRestoreListener.java:89) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:75) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:277) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restorePartition(StoreChangelogReader.java:238) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:83) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:263) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:803) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774) ~[kafka-streams-1.0.0.jar:?]
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744) ~[kafka-streams-1.0.0.jar:?]
{noformat}

The faulty line in question is 
{noformat}
db.write(wOptions, batch);
{noformat}

in RocksDBStore.java, which would mean that the db variable is null. Probably the
store has been closed while restoration is still being done on it. After going
through the code, I think the problem occurs when the state transitions from
PARTITIONS_ASSIGNED to PARTITIONS_REVOKED while restoration is still in progress.
In such a state transition, while the active tasks themselves are closed, the
changelog reader is not reset. It then tries to restore tasks that have already
been closed, db is null, and the result is the NPE.

I will put in a fix to see if that fixes the issue. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
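
For readers unfamiliar with the batch restoration path in the stack trace above,
the following is a hedged sketch of the kind of BatchingStateRestoreCallback the
ticket describes. The Consumer<List<...>> stands in for the custom store's bulk
write (the putAll that hits the closed RocksDB handle in the ticket); it is
illustrative only, not the reporter's actual RocksDBSlotKeyValueBytesStore code.

{noformat}
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.BatchingStateRestoreCallback;

// Hedged sketch of a batch restoration callback like the one described in the
// ticket; the bulkWriter is a stand-in for the custom store's putAll.
public class SlotStoreRestoreCallback implements BatchingStateRestoreCallback {

    private final Consumer<List<KeyValue<byte[], byte[]>>> bulkWriter;

    public SlotStoreRestoreCallback(Consumer<List<KeyValue<byte[], byte[]>>> bulkWriter) {
        this.bulkWriter = bulkWriter;
    }

    @Override
    public void restoreAll(Collection<KeyValue<byte[], byte[]>> records) {
        // The changelog records for one poll are handed over as a single batch;
        // in the ticket this is where RocksDBStore.putAll hit a closed (null) db handle.
        bulkWriter.accept(new ArrayList<>(records));
    }

    @Override
    public void restore(byte[] key, byte[] value) {
        // Single-record path; delegate to the batch path for simplicity.
        restoreAll(Collections.singletonList(KeyValue.pair(key, value)));
    }
}
{noformat}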


[jira] [Created] (KAFKA-7212) Bad exception message on failed serialization

2018-07-27 Thread D T (JIRA)
D T created KAFKA-7212:
--

 Summary: Bad exception message on failed serialization
 Key: KAFKA-7212
 URL: https://issues.apache.org/jira/browse/KAFKA-7212
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 1.1.1, 1.0.1
Reporter: D T


I use Spring-Kafka to connect to a Kafka server. While trying to use Spring's
MessageConverter I encountered strange error messages that did not make any
sense to me.

 

 
{noformat}
org.apache.kafka.common.errors.SerializationException: Can't convert value of class org.springframework.messaging.support.GenericMessage to class org.apache.kafka.common.serialization.ByteArraySerializer specified in value.serializer
Caused by: java.lang.ClassCastException: org.springframework.messaging.support.GenericMessage cannot be cast to [B
    at org.apache.kafka.common.serialization.ByteArraySerializer.serialize(ByteArraySerializer.java:21)
    at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:65)
    at org.apache.kafka.common.serialization.ExtendedSerializer$Wrapper.serialize(ExtendedSerializer.java:55)
    at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:791)
    at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:768)
    at org.springframework.kafka.core.DefaultKafkaProducerFactory$CloseSafeProducer.send(DefaultKafkaProducerFactory.java:285)
    at org.springframework.kafka.core.KafkaTemplate.doSend(KafkaTemplate.java:349)
    at org.springframework.kafka.core.KafkaTemplate.send(KafkaTemplate.java:182)
{noformat}
My question was why would Kafka try to cast/convert Spring's GenericMessage to 
Kafka's ByteArraySerializer?

After quite some time trying various config options I debugged the code and 
found that the exception message was just bad.

The message should be something like
{noformat}
Can't convert value of class org.springframework.messaging.support.GenericMessage to byte[] in class org.apache.kafka.common.serialization.ByteArraySerializer specified in value.serializer
{noformat}
 The issue is caused by line:
[https://github.com/apache/kafka/blob/1.1.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L801]

and
[https://github.com/apache/kafka/blob/1.1.1/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L809]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
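
As a concrete illustration of the wording the reporter asks for, here is a hedged
sketch of a small helper that builds such a message. It is not the actual
KafkaProducer change; the expected-type parameter (byte[].class for the
ByteArraySerializer case) would have to be supplied by the caller, since the
producer itself only knows the configured serializer class.

{noformat}
import org.apache.kafka.common.errors.SerializationException;

// Hypothetical helper illustrating the clearer message suggested in the ticket;
// not the actual KafkaProducer patch.
final class SerializationErrors {

    static SerializationException valueConversionError(Object value,
                                                       Class<?> serializerClass,
                                                       Class<?> expectedType,
                                                       ClassCastException cause) {
        // For byte[].class, getSimpleName() yields "byte[]", matching the wording above.
        String message = "Can't convert value of class " + value.getClass().getName()
                + " to " + expectedType.getSimpleName()
                + " in class " + serializerClass.getName()
                + " specified in value.serializer";
        return new SerializationException(message, cause);
    }
}
{noformat}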


Re: [DISCUSS] KIP-345: Reduce multiple consumer rebalances by specifying member id

2018-07-27 Thread Guozhang Wang
Hi Boyang,

Thanks for the proposed KIP. I made a pass over the wiki and here are some
comments / questions:

1. In order to preserve broker compatibility, we need to make sure the
broker version discovery logic can be integrated with this new logic. I.e.
if a newer versioned consumer is talking to an older versioned broker who
does not recognize V4, the client needs to downgrade its JoinGroupRequest
version to V3 and not setting the member-id specifically. You can take a
look at the ApiVersionsRequest and see how to work with it.

2. There may be ways to validate that two different clients do not send
the same member id, for example by passing along the host:port information
from KafkaApis to the GroupCoordinator interface. But I think this overly
complicates the logic and may not be worthwhile compared with relying on
users to specify unique member ids.

3. Minor: you would need to bump up the version of JoinGroupResponse to
V4 as well.

4. Minor: in the wiki page, you need to specify the actual string value for
`MEMBER_ID`, for example "member.id".

5. When this additional config is specified by users, we should consider
setting the default of the internal `LEAVE_GROUP_ON_CLOSE_CONFIG` to false,
since otherwise its effectiveness would be reduced.


Guozhang



On Thu, Jul 26, 2018 at 9:20 PM, Boyang Chen  wrote:

> Hey friends,
>
>
> I would like to open a discussion thread on KIP-345:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-345%3A
> +Reduce+multiple+consumer+rebalances+by+specifying+member+id
>
>
> This KIP is trying to resolve multiple rebalances by maintaining the
> consumer member id across rebalance generations. I have verified the theory
> on our internal Streams application, and it could reduce rebalance time to a
> few seconds during a rolling restart of the service.
>
>
> Let me know your thoughts, thank you!
>
>
> Best,
>
> Boyang
>



-- 
-- Guozhang
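
A hedged sketch of what a KIP-345-style consumer configuration might look like
once the KIP lands. "member.id" is the string value point 4 above asks the KIP
to pin down; it is a proposed config, not one that exists in released clients.
The "internal.leave.group.on.close" key for the internal
LEAVE_GROUP_ON_CLOSE_CONFIG referenced in point 5 is assumed here.

import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

// Hedged sketch of a KIP-345-style consumer configuration. "member.id" is the
// proposed config from the KIP (not available in released clients), and the
// "internal.leave.group.on.close" key is assumed.
public class Kip345ConsumerConfigSketch {

    public static Properties build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processing");
        props.put("member.id", "order-processing-worker-3"); // proposed: stays stable across rebalances
        props.put("internal.leave.group.on.close", "false");  // point 5: keep membership on close
        return props;
    }
}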