[jira] [Commented] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect ProcessorNode

2016-10-21 Thread Frank Lyaruu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595593#comment-15595593
 ] 

Frank Lyaruu commented on KAFKA-4311:
-

The ClassCastException is gone, but I still see the IllegalStateException:

java.lang.IllegalStateException: Key found in dirty key set, but entry is null
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
    at org.apache.kafka.streams.state.internals.NamedCache.evict(NamedCache.java:199)
    at org.apache.kafka.streams.state.internals.ThreadCache.maybeEvict(ThreadCache.java:198)
    at org.apache.kafka.streams.state.internals.ThreadCache.put(ThreadCache.java:121)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(CachingKeyValueStore.java:147)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.get(CachingKeyValueStore.java:134)
    at org.apache.kafka.streams.kstream.internals.KTableReduce$KTableReduceProcessor.process(KTableReduce.java:74)
    at org.apache.kafka.streams.kstream.internals.KTableReduce$KTableReduceProcessor.process(KTableReduce.java:52)
    at org.apache.kafka.streams.processor.internals.ProcessorNode.process(ProcessorNode.java:82)
    at org.apache.kafka.streams.processor.internals.ProcessorContextImpl.forward(ProcessorContextImpl.java:200)
    at org.apache.kafka.streams.processor.internals.SourceNode.process(SourceNode.java:66)
    at org.apache.kafka.streams.processor.internals.StreamTask.process(StreamTask.java:180)
    at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:439)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)

and: 

org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to flush state store develop2_personsperteam3-personsperteam-person-develop2
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:331)
    at org.apache.kafka.streams.processor.internals.AbstractTask.flushState(AbstractTask.java:180)
    at org.apache.kafka.streams.processor.internals.StreamThread$4.apply(StreamThread.java:372)
    at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:331)
    at org.apache.kafka.streams.processor.internals.StreamThread.flushAllState(StreamThread.java:368)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:304)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:272)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:255)
Caused by: java.lang.IllegalStateException: Key found in dirty key set, but entry is null
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:329)

Again, when I set CACHE_MAX_BYTES_BUFFERING_CONFIG to 0, it runs fine.
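
For anyone wanting to reproduce the workaround, a minimal sketch of disabling the Streams record cache; the application id and bootstrap servers are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// 0 disables the record cache entirely, so the eviction path that
// triggers the IllegalStateException is never exercised.
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);

KStreamBuilder builder = new KStreamBuilder();
// ... build topology ...
KafkaStreams streams = new KafkaStreams(builder, props);
```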


> Multi layer cache eviction causes forwarding to incorrect ProcessorNode 
> 
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> The two exceptions below were reported by Frank on the dev mailing list. 
> After investigation, the root cause is multiple cache evictions happening in 
> the same topology. 
> Given a topology like the one below. If a record arriving in `tableOne` 
> causes a cache eviction, it will trigger the `leftJoin` that will do a `get` 
> from `reducer-store`. If the key is not currently cached in `reducer-store`, 
> but is in the backing store, it will be put into the cache, and it may also 
> trigger an eviction. If it does trigger an eviction and the eldest entry is 
> dirty it will flush the dirty keys. It is at this point that the exception in 
> the comment happens (ClassCastException). This occurs because the 
> ProcessorContext is still set to the context of the `leftJoin` and the next 
> child in the topology is `mapValues`.
> We need to set the correct `ProcessorNode`, on the context, in the 
> `ForwardingCacheFlushListener` prior to calling `context.forward`. We also 
> need to remember to reset the `ProcessorNode` to the previous node once 
> `context.forward` has completed.

Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Michael Pearce
Here is Ignite's - essentially the idea is having different owners/maintainers per 
module/area who keep a check on that area.

https://cwiki.apache.org/confluence/display/IGNITE/How+to+Contribute



On 10/21/16, 3:44 PM, "isma...@gmail.com on behalf of Ismael Juma" 
 wrote:

Hi Michael,

Can you please share which Apache projects have a MMC? I couldn't find
anything after a quick google.

Ismael

On Fri, Oct 21, 2016 at 7:28 AM, Michael Pearce 
wrote:

> So from my reading, essentially the first question that needs to be answered
> and voted on is:
>
> Is the Apache Kafka community only about the Core, or does the Apache
> community also support some subprojects (and we just need some better way to
> manage this)?
>
> If the vote for Core only wins, then the following should be removed:
> Kafka Connect
> Kafka Streams
>
> If the vote for Core only loses (aka we will support subprojects) then:
> We should look to add Kafka REST
>
> And we should look to see how we can better govern and manage submodules.
>
> A good example which I'd propose here is how some other communities in
> Apache do this.
>
> Each module has a Module Management Committee (MMC); this is almost like
> the PMC but on a per-module basis.
>
> This MMC should essentially hold the binding votes for that module.
> The MMC should be made up of a single representative from each
> organisation (so no single organisation can fully veto the community; it
> has to be a genuine consensus).
> The MMC requires at least 3 members (so there can't be a tied vote of 2).
> For a new module to be added, an MMC should be sought.
> A new module is only capable of being added if the above requirements can
> be met (e.g. 3 people wishing to step up, from 3 organisations) so that
> only actively supported modules would be added.
>
> The PMC reviews each module every 6 months or year. If an MMC is inactive,
> a vote/call to find replacements is raised; if none are forthcoming,
> dropping the MMC to fewer than 3 members, the module moves to "the attic"
> (very much like the Apache Attic but a little more aggressive).
>
> This way the PMC does not need to micro-manage every module.
> We only add modules where some amount of active support, maintenance, and
> use is provided by the community.
> We have an automatic way to retire old or inactive projects.
>
> Thoughts?
> Mike
>
>
> 
> From: Harsha Ch 
> Sent: Thursday, October 20, 2016 10:26 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
>
> Jay,
>   REST API is something every user is in need of. If the argument is to
> clone and write your API, this will do a disservice to the users as they
> now have to choose one vs. others instead of keeping one API that is
> supported in the Kafka community.
>
> "Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/Kafka access that is heavily
> used in production seems quite silly."
>
>Exactly our point. Why can't we develop this in the Apache Kafka
> community? Instead of us open sourcing another GitHub project and creating
> a divide in users and another version of the API, let's build this in the
> Kafka community and use the governance model that is proven to provide
> vendor-free, user-driven consensus features. The argument that adding this
> REST server to Kafka will affect the agility of the project doesn't make
> sense.
>
> It looks like your argument is either we develop all these small tools or
> none at all. We as a community need to look at supporting critical
> tools/APIs instead of dividing this project into individual external
> communities. We should build this as part of Kafka, which best serves the
> needs of users.
> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.
>
> Thanks,
> Harsha
>
>
>
>
>
> On Wed, Oct 12, 2016 at 8:03 AM Jay Kreps  wrote:
>
> > I think the questions around governance make sense, I think we should
> > really clarify that to make the process more clear so it can be fully
> > inclusive.
> >
> > The idea that we should not collaborate on what is there now, though,
> > because in the future we might disagree about direction does not 

[GitHub] kafka pull request #2051: KAFKA-4311: Multi layer cache eviction causes forw...

2016-10-21 Thread dguy
GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2051

KAFKA-4311: Multi layer cache eviction causes forwarding to incorrect 
ProcessorNode

Given a topology like the one below. If a record arriving in `tableOne` 
causes a cache eviction, it will trigger the `leftJoin` that will do a `get` 
from `reducer-store`. If the key is not currently cached in `reducer-store`, 
but is in the backing store, it will be put into the cache, and it may also 
trigger an eviction. If it does trigger an eviction and the eldest entry is 
dirty it will flush the dirty keys. It is at this point that a 
ClassCastException is thrown. This occurs because the ProcessorContext is still 
set to the context of the `leftJoin` and the next child in the topology is 
`mapValues`.
We need to set the correct `ProcessorNode`, on the context, in the 
`ForwardingCacheFlushListener` prior to calling `context.forward`. We also need 
to remember to reset the `ProcessorNode` to the previous node once 
`context.forward` has completed.

```
final KTable<String, String> one = builder.table(Serdes.String(), Serdes.String(), tableOne, tableOne);
final KTable<Long, String> two = builder.table(Serdes.Long(), Serdes.String(), tableTwo, tableTwo);
final KTable<String, Long> reduce = two.groupBy(new KeyValueMapper<Long, String, KeyValue<String, Long>>() {
    @Override
    public KeyValue<String, Long> apply(final Long key, final String value) {
        return new KeyValue<>(value, key);
    }
}, Serdes.String(), Serdes.Long())
    .reduce(new Reducer<Long>() {..}, new Reducer<Long>() {..}, "reducer-store");

one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {..})
    .mapValues(new ValueMapper<String, String>() {..});
```
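
For readers following along, a minimal sketch of the save/set/restore pattern the fix describes. The accessor names `currentNode()`/`setCurrentNode()` are assumptions for illustration; the actual internal API may differ:

```java
// Inside ForwardingCacheFlushListener.apply(...) -- accessor names are hypothetical.
final ProcessorNode prev = context.currentNode();
context.setCurrentNode(cacheStoreNode);  // the node that owns the caching store
try {
    // Forward the evicted/flushed entry to the store's children, not the
    // children of whatever processor happened to trigger the eviction.
    context.forward(key, new Change<>(newValue, oldValue));
} finally {
    context.setCurrentNode(prev);        // restore so the interrupted processor resumes correctly
}
```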

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4311

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2051


commit 5c8896a9d2aaca6bf5b9fb8de4de8140919fb280
Author: Damian Guy 
Date:   2016-10-21T11:27:17Z

Save and set the current processor node in FlushingCacheListener.






[jira] [Commented] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect ProcessorNode

2016-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15595401#comment-15595401
 ] 

ASF GitHub Bot commented on KAFKA-4311:
---

GitHub user dguy opened a pull request:

https://github.com/apache/kafka/pull/2051

KAFKA-4311: Multi layer cache eviction causes forwarding to incorrect 
ProcessorNode

Given a topology like the one below. If a record arriving in `tableOne` 
causes a cache eviction, it will trigger the `leftJoin` that will do a `get` 
from `reducer-store`. If the key is not currently cached in `reducer-store`, 
but is in the backing store, it will be put into the cache, and it may also 
trigger an eviction. If it does trigger an eviction and the eldest entry is 
dirty it will flush the dirty keys. It is at this point that a 
ClassCastException is thrown. This occurs because the ProcessorContext is still 
set to the context of the `leftJoin` and the next child in the topology is 
`mapValues`.
We need to set the correct `ProcessorNode`, on the context, in the 
`ForwardingCacheFlushListener` prior to calling `context.forward`. We also need 
to remember to reset the `ProcessorNode` to the previous node once 
`context.forward` has completed.

```
final KTable<String, String> one = builder.table(Serdes.String(), Serdes.String(), tableOne, tableOne);
final KTable<Long, String> two = builder.table(Serdes.Long(), Serdes.String(), tableTwo, tableTwo);
final KTable<String, Long> reduce = two.groupBy(new KeyValueMapper<Long, String, KeyValue<String, Long>>() {
    @Override
    public KeyValue<String, Long> apply(final Long key, final String value) {
        return new KeyValue<>(value, key);
    }
}, Serdes.String(), Serdes.Long())
    .reduce(new Reducer<Long>() {..}, new Reducer<Long>() {..}, "reducer-store");

one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {..})
    .mapValues(new ValueMapper<String, String>() {..});
```

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dguy/kafka kafka-4311

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2051


commit 5c8896a9d2aaca6bf5b9fb8de4de8140919fb280
Author: Damian Guy 
Date:   2016-10-21T11:27:17Z

Save and set the current processor node in FlushingCacheListener.




> Multi layer cache eviction causes forwarding to incorrect ProcessorNode 
> 
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> The two exceptions below were reported by Frank on the dev mailing list. 
> After investigation, the root cause is multiple cache evictions happening in 
> the same topology. 
> Given a topology like the one below. If a record arriving in `tableOne` 
> causes a cache eviction, it will trigger the `leftJoin` that will do a `get` 
> from `reducer-store`. If the key is not currently cached in `reducer-store`, 
> but is in the backing store, it will be put into the cache, and it may also 
> trigger an eviction. If it does trigger an eviction and the eldest entry is 
> dirty it will flush the dirty keys. It is at this point that the exception in 
> the comment happens (ClassCastException). This occurs because the 
> ProcessorContext is still set to the context of the `leftJoin` and the next 
> child in the topology is `mapValues`.
> We need to set the correct `ProcessorNode`, on the context,  in the 
> `ForwardingCacheFlushListener` prior to calling `context.forward`. We also 
> need to remember to reset the `ProcessorNode` to the previous node once 
> `context.forward` has completed.
> {code}
> final KTable<String, String> one = builder.table(Serdes.String(), 
> Serdes.String(), tableOne, tableOne);
> final KTable<Long, String> two = builder.table(Serdes.Long(), 
> Serdes.String(), tableTwo, tableTwo);
> final KTable<String, Long> reduce = two.groupBy(new 
> KeyValueMapper<Long, String, KeyValue<String, Long>>() {
> @Override
> public KeyValue<String, Long> apply(final Long key, final String 
> value) {
> return new KeyValue<>(value, key);
> }
> }, Serdes.String(), Serdes.Long())
> .reduce(new Reducer<Long>() {
>   

Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Jay Kreps
Harsha,

You seem to be saying that your only two options are to fork or duplicate
the existing REST project or add it to Kafka to be able to contribute. I
don't think those are the only two options. The other option is to
contribute to the existing successful project--which is Apache licensed and
getting active contribution today. You always have the power to
fork/duplicate later if you don't like the governance/community/direction.
Saying that you have to do this proactively doesn't really make sense.

-Jay

On Thu, Oct 20, 2016 at 2:27 PM, Harsha Chintalapani 
wrote:

> Jay,
>   REST API is something every user is in need of. If the argument is to
> clone and write your API, this will do a disservice to the users as they
> now have to choose one vs. others instead of keeping one API that is
> supported in the Kafka community.
>
> "Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/Kafka access that is heavily
> used in production seems quite silly."
>Exactly our point. Why can't we develop this in the Apache Kafka
> community? Instead of us open sourcing another GitHub project and creating
> a divide in users and another version of the API, let's build this in the
> Kafka community and use the governance model that is proven to provide
> vendor-free, user-driven consensus features. The argument that adding this
> REST server to Kafka will affect the agility of the project doesn't make
> sense.
>
> It looks like your argument is either we develop all these small tools or
> none at all. We as a community need to look at supporting critical
> tools/APIs instead of dividing this project into individual external
> communities. We should build this as part of Kafka, which best serves the
> needs of users.
> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.
>
> Thanks,
> Harsha
>
> On Sun, Oct 16, 2016 at 5:19 PM Jungtaek Lim  wrote:
>
> > I guess no one doubts its power as a REST server or even a UI. I understand
> > the difficulty of adding a module to the project, but that is greatest when
> > little support is expected and maintenance issues are likely to rise, and
> > IMHO this seems not to be the case.
> >
> > There are also pain points when a project doesn't maintain features and
> > delegates to the ecosystem. Based on some data points (last commit date,
> > pull requests opened and closed, and the contributor graph), kafka-manager
> > seems to have similar activity to kafka-rest, but it doesn't show any
> > response to the pull request supporting Kafka 0.10.0 even though numerous
> > users leave comments wishing for support. What can the Kafka community do
> > for that project to follow up? Nothing but persuading by leaving comments
> > and hoping they will be merged (or finally coming up with another
> > implementation). The Kafka project keeps agile, but from the point of view
> > of the whole ecosystem it can be less agile.
> >
> > Yes, decisions and the roadmap of the project are driven by PMCs, and I
> > think that's a valid right. But we also imagine ASF projects as driven by
> > the community aspect, though that is close to an ideal world. The KIP
> > process makes innovation on adopting new features transparent, which
> > inspires many developers and leads them to adopt it in their own projects.
> > I hope the Kafka community continuously drives this transparency model
> > among the ASF projects, and beyond.
> >
> > - Jungtaek Lim (HeartSaVioR)
> >
> > On Mon, Oct 17, 2016 at 7:56 AM, Jay Kreps wrote:
> >
> > Hey Nacho,
> >
> > Yeah, I think it is definitely a call we have to make case by case. We
> have
> > some experience with this: originally we attempted to maintain things
> like
> > non-Java clients, a Hadoop connector, etc. all in the main project. The
> > difficulty of that led us to the current federated approach. In terms of
> > what is included now, yes, I agree you could potentially have even less
> > included.
> >
> > -Jay
> >
> > On Wed, Oct 12, 2016 at 11:37 AM, Nacho Solis
>  > >
> > wrote:
> >
> > > What is the criteria for keeping things in and out of Kafka, what code
> > goes
> > > in or out and what is part of the architecture or not?
> > >
> > > The discussion of what goes into a project and what stays out is an
> > always
> > > evolving question. Different projects treat this in different ways.
> > >
> > > Let me paint 2 extremes.  On one side, you have a single monolithic
> > project
> > > that brings everything in one tent.  On the other side you have the
> many
> > > modules approach.  From what I've learned, Kafka falls in the middle.
> > > Because of this, the question is bound to come up with respect to the
> > > criteria used to bring 

[jira] [Updated] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect Processor Node

2016-10-21 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4311:
--
Description: 
The two exceptions below were reported by Frank on the dev mailing list. After 
investigation, the root cause is multiple cache evictions happening in the same 
topology. 

Given a topology like the one below. If a record arriving in `tableOne` causes 
a cache eviction, it will trigger the `leftJoin` that will do a `get` from 
`reducer-store`. If the key is not currently cached in `reducer-store`, but is 
in the backing store, it will be put into the cache, and it may also trigger an 
eviction. If it does trigger an eviction and the eldest entry is dirty it will 
flush the dirty keys. It is at this point that the exception in the comment 
happens (ClassCastException). This occurs because the ProcessorContext is still 
set to the context of the `leftJoin` and the next child in the topology is 
`mapValues`.

We need to set the correct `ProcessorNode`, on the context,  in the 
`ForwardingCacheFlushListener` prior to calling `context.forward`. We also need 
to remember to reset the `ProcessorNode` to the previous node once 
`context.forward` has completed.

{code}
final KTable<String, String> one = builder.table(Serdes.String(), Serdes.String(), tableOne, tableOne);
final KTable<Long, String> two = builder.table(Serdes.Long(), Serdes.String(), tableTwo, tableTwo);
final KTable<String, Long> reduce = two.groupBy(new KeyValueMapper<Long, String, KeyValue<String, Long>>() {
    @Override
    public KeyValue<String, Long> apply(final Long key, final String value) {
        return new KeyValue<>(value, key);
    }
}, Serdes.String(), Serdes.Long())
    .reduce(new Reducer<Long>() {
        @Override
        public Long apply(final Long value1, final Long value2) {
            return value1 + value2;
        }
    }, new Reducer<Long>() {
        @Override
        public Long apply(final Long value1, final Long value2) {
            return value1 - value2;
        }
    }, "reducer-store");

one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {
    @Override
    public String apply(final String value1, final Long value2) {
        return value1 + ":" + value2;
    }
})
    .mapValues(new ValueMapper<String, String>() {
        @Override
        public String apply(final String value) {
            return value;
        }
    });
{code}


This exception is actually a symptom of the exception reported below in the 
comment. After the first exception is thrown, the StreamThread triggers a 
shutdown that then throws this exception.

[StreamThread-1] ERROR org.apache.kafka.streams.processor.internals.StreamThread - stream-thread [StreamThread-1] Failed to close state manager for StreamTask 0_0:
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed to close state store addr-organization
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
    at org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
    at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
    at org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
    at org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
    at org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
    at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: java.lang.IllegalStateException: Key found in dirty key set, but entry is null
    at org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
    at org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
    at org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
    at org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
    ... 7 more


  was:
The two exceptions below were reported by Frank on the dev mailing list. After 
investigation, the root cause is multiple cache evictions happening in the same 
topology. 

{code}
final KTable<String, String> one = builder.table(Serdes.String(), 
Serdes.String(), tableOne, tableOne);
final KTable<Long, String> two = 

Re: Kafka KIP meeting Oct 19 at 11:00am PST

2016-10-21 Thread Gwen Shapira
I think this will be a good idea. It will separate an issue with concrete
use-case and a major pain-point from a wider discussion about architectures
and who is responsible for data formats.
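
For reference on the convention under discussion: today a compacted topic treats a record with a null payload as the delete marker (tombstone). A minimal sketch, with placeholder topic and key names:

```java
// "producer" is an existing KafkaProducer<String, String>.
// Sending a null value is currently the only way to mark a key for removal
// by log compaction; the quoted proposal below adds an explicit attribute bit.
producer.send(new ProducerRecord<String, String>("my-compacted-topic", "some-key", null));
```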


On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce 
wrote:

> I had noted that, whatever the solution, it was agreed that having
> compaction based on a null payload isn't elegant.
>
> Shall we raise another KIP to, as discussed, propose using an attribute bit
> for the delete/compaction flag as well as/or instead of the null value, and
> updating the compaction logic to look at that delete/compaction attribute?
>
> I believe this is less contentious, so that at least we get that done,
> alleviating some concerns whilst the below gets discussed further?
>
> 
> From: Jun Rao 
> Sent: Wednesday, October 19, 2016 8:56:52 PM
> To: dev@kafka.apache.org
> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>
> The following are the notes from today's KIP discussion.
>
>
>- KIP-82 - add record header: We agreed that there are use cases for
>third-party vendors building tools around Kafka. We haven't reached a
>conclusion on whether the added complexity justifies the use cases. We will
>follow up on the mailing list with use cases, container formats people have
>been using, and details on the proposal.
>
>
> The video will be uploaded soon in
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .
>
> Thanks,
>
> Jun
>
> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao  wrote:
>
> > Hi, Everyone,
> >
> > We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
> > If you plan to attend but haven't received an invite, please let me know.
> > The following is the tentative agenda.
> >
> > Agenda:
> > KIP-82: add record header
> >
> > Thanks,
> >
> > Jun
> >
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog



Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Nacho Solis
Are you saying Kafka REST is subjective but Kafka Streams and Kafka Connect
are not subjective?

> "there are likely places that can live without a rest proxy"

There are also places that can live without Kafka Streams and Kafka Connect.

Nacho

On Fri, Oct 21, 2016 at 11:17 AM, Jun Rao  wrote:

> At the high level, I think ideally it makes sense to add a component to
> Apache Kafka if (1) it's widely needed and (2) it needs tight integration
> with Kafka core. For Kafka Stream, we do expect stream processing will be
> used widely in the future. Implementation wise, Kafka Stream only supports
> getting data from Kafka and leverages quite a few of the core
> functionalities in Kafka core. For example, it uses customized rebalance
> callback in the consumer and uses the compacted topic heavily. So, having
> Kafka Stream in the same repo makes it easier for testing when those core
> functionalities evolve over time. Kafka Connect is in the same situation.
>
> For rest proxy, whether it's widely used or not is going to be a bit
> subjective. However, there are likely places that can live without a rest
> proxy. The rest proxy is just a proxy for the regular clients and doesn't
> need to be tightly integrated with Kafka core. So, the case for including
> rest proxy in Apache Kafka is probably not as strong as Kafka Stream and
> Kafka Connect.
>
> Thanks,
>
> Jun
>
> On Thu, Oct 20, 2016 at 11:28 PM, Michael Pearce 
> wrote:
>
> > So from my reading, essentially the first question that needs to be
> > answered and voted on is:
> >
> > Is the Apache Kafka community only about the Core, or does the Apache
> > community also support some subprojects (and we just need some better way
> > to manage this)?
> >
> > If the vote for Core only wins, then the following should be removed:
> > Kafka Connect
> > Kafka Streams
> >
> > If the vote for Core only loses (aka we will support subprojects) then:
> > We should look to add Kafka REST
> >
> > And we should look to see how we can better govern and manage submodules.
> >
> > A good example which I'd propose here is how some other communities in
> > Apache do this.
> >
> > Each module has a Module Management Committee (MMC); this is almost like
> > the PMC but on a per-module basis.
> >
> > This MMC should essentially hold the binding votes for that module.
> > The MMC should be made up of a single representative from each
> > organisation (so no single organisation can fully veto the community; it
> > has to be a genuine consensus).
> > The MMC requires at least 3 members (so there can't be a tied vote of 2).
> > For a new module to be added, an MMC should be sought.
> > A new module is only capable of being added if the above requirements can
> > be met (e.g. 3 people wishing to step up, from 3 organisations) so that
> > only actively supported modules would be added.
> >
> > The PMC reviews each module every 6 months or year. If an MMC is inactive,
> > a vote/call to find replacements is raised; if none are forthcoming,
> > dropping the MMC to fewer than 3 members, the module moves to "the attic"
> > (very much like the Apache Attic but a little more aggressive).
> >
> > This way the PMC does not need to micro-manage every module.
> > We only add modules where some amount of active support, maintenance, and
> > use is provided by the community.
> > We have an automatic way to retire old or inactive projects.
> >
> > Thoughts?
> > Mike
> >
> >
> > 
> > From: Harsha Ch 
> > Sent: Thursday, October 20, 2016 10:26 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
> >
> > Jay,
> >   REST API is something every user is in need of. If the argument is to
> > clone and write your API, this will do a disservice to the users as they
> > now have to choose one vs. others instead of keeping one API that is
> > supported in the Kafka community.
> >
> > "Pre-emptively re-creating another
> > REST layer when it seems like we all quite agree on what needs to be done
> > and we have an existing code base for HTTP/Kafka access that is heavily
> > used in production seems quite silly."
> >
> >Exactly our point. Why can't we develop this in the Apache Kafka
> > community? Instead of us open sourcing another GitHub project and creating
> > a divide in users and another version of the API, let's build this in the
> > Kafka community and use the governance model that is proven to provide
> > vendor-free, user-driven consensus features. The argument that adding this
> > REST server to Kafka will affect the agility of the project doesn't make
> > sense.
> >
> > It looks like your argument is either we develop all these small tools or
> > none at all. We as a community need to look at supporting critical
> > tools/APIs instead of dividing this project into individual external
> > communities. We should build this as part of Kafka, which best serves the
> > needs of users.
> > 

Re: [jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

2016-10-21 Thread Greg Fodor
I managed to track down one case where we were seeing issues with missing
data when transitioning to a new node to being caused by a retention policy on
the topic. There is an additional case, but I have not been able to repro it at
this time. We recently fixed a problem where we were failing to gracefully
shut down our jobs in certain cases, so there's a chance that might be
related. Anyhow, now that I have a better understanding of things, I will be
able to investigate if we experience missing keys in the future. Thanks!

On Oct 20, 2016 2:08 PM, "Greg Fodor (JIRA)"  wrote:


[ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593018#comment-15593018 ]

Greg Fodor commented on KAFKA-4113:
---

Oh, so it should be doing exactly what makes sense to me -- I am on 0.10.0.
Let me verify that there isn't something else going on! Thanks for the info.

> Allow KTable bootstrap
> --
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating
> phase should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the
> data. Only after this topic got read completely and the KTable is ready
> should the application start processing. This would indicate that, on
> startup, the current partition sizes must be fetched and stored, and after
> the KTable got populated up to those offsets, stream processing can start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, ie, if no update to a KTable for a certain time is
> done, we consider it as populated
> 3) a timestamp cut off point, ie, all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this
> JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> KTable table = builder.table("topic", 1000); // populate the table
> without reading any other topics until seeing one record with timestamp 1000.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4099) Change the time based log rolling to only based on the message timestamp.

2016-10-21 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596037#comment-15596037
 ] 

Jun Rao commented on KAFKA-4099:


I had two use cases of time-based rolling in mind. The first one is for users 
who don't want to retain a message (say sensitive data) in the log for too 
long. In this case, we want to be able to roll the log periodically based on 
time such that it will freeze the largest timestamp in the rolled segment and 
cause it to be deleted when the time limit has been reached. The second one is 
for the log cleaner to happen quicker, since the cleaner never cleans the active 
segment. In both cases, we really just want to be able to roll the log at some 
predictable time interval. There are different implementations that can achieve 
this. 

The issue with the current implementation is that if data with oscillating 
timestamps are published at the same time, it causes the log to roll too quickly, 
which will surprise people. We can ask people to turn off log rolling in most 
cases. However, the default log rolling is 7 days and people could hit this 
issue before realizing it. In some of the rare cases, people may indeed want to 
configure time-based log rolling and may still send data with oscillating 
timestamps. It would be good if the underlying system can support this without 
any performance impact.

As for a better implementation, the original approach of just rolling based on 
create time addresses both use cases in the common cases, without the risk of 
rolling too frequently. The only thing is that create time will be reset when 
segments get moved. However, that happens rarely. So, if there are no other 
better solutions that we can think of, this could be a safer implementation.
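
For concreteness, a sketch of the topic-level knobs involved in time-based rolling and retention (the config names are the standard topic configs; the values are placeholders):

```java
import java.util.Properties;

Properties topicConfig = new Properties();
// Roll a new segment at least every 7 days, so the active segment's
// largest timestamp is frozen and retention/cleaning can take effect.
topicConfig.put("segment.ms", "604800000");
// Rolled segments become eligible for deletion after 30 days.
topicConfig.put("retention.ms", "2592000000");
```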

> Change the time based log rolling to only based on the message timestamp.
> -
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps and cause the 
> log segment rolling for each message. The fix is to change the log rolling 
> behavior to be based only on the message timestamp when the messages are in 
> message format 0.10.0 or above. If the first message in the segment does not 
> have a timestamp, we will fall back to using the wall clock time for log rolling.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Jun Rao
At the high level, I think ideally it makes sense to add a component to
Apache Kafka if (1) it's widely needed and (2) it needs tight integration
with Kafka core. For Kafka Stream, we do expect stream processing will be
used widely in the future. Implementation wise, Kafka Stream only supports
getting data from Kafka and leverages quite a few of the core
functionalities in Kafka core. For example, it uses customized rebalance
callback in the consumer and uses the compacted topic heavily. So, having
Kafka Stream in the same repo makes it easier for testing when those core
functionalities evolve over time. Kafka Connect is in the same situation.

For rest proxy, whether it's widely used or not is going to be a bit
subjective. However, there are likely places that can live without a rest
proxy. The rest proxy is just a proxy for the regular clients and doesn't
need to be tightly integrated with Kafka core. So, the case for including
rest proxy in Apache Kafka is probably not as strong as Kafka Stream and
Kafka Connect.

Thanks,

Jun

On Thu, Oct 20, 2016 at 11:28 PM, Michael Pearce 
wrote:

> So from my reading, essentially the first question that needs to be answered
> and voted on is:
>
> Is the Apache Kafka community only about the Core, or does the Apache
> community also support some subprojects (and we just need some better way to
> manage this)?
>
> If the vote for Core only wins, then the following should be removed:
> Kafka Connect
> Kafka Streams
>
> If the vote for Core only loses (aka we will support subprojects) then:
> We should look to add Kafka REST
>
> And we should look to see how we can better govern and manage submodules.
>
> A good example which I'd propose here is how some other communities in
> Apache do this.
>
> Each module has a Module Management Committee (MMC); this is almost like
> the PMC but on a per-module basis.
>
> This MMC should essentially hold the binding votes for that module.
> The MMC should be made up of a single representative from each
> organisation (so no single organisation can fully veto the community; it
> has to be a genuine consensus).
> The MMC requires at least 3 members (so there can't be a tied vote of 2).
> For a new module to be added, an MMC should be sought.
> A new module is only capable of being added if the above requirements can
> be met (e.g. 3 people wishing to step up, from 3 organisations) so that
> only actively supported modules would be added.
>
> The PMC reviews each module every 6 months or year. If an MMC is inactive,
> a vote/call to find replacements is raised; if none are forthcoming,
> dropping the MMC to fewer than 3 members, the module moves to "the attic"
> (very much like the Apache Attic but a little more aggressive).
>
> This way the PMC does not need to micro-manage every module.
> We only add modules where some amount of active support, maintenance, and
> use is provided by the community.
> We have an automatic way to retire old or inactive projects.
>
> Thoughts?
> Mike
>
>
> 
> From: Harsha Ch 
> Sent: Thursday, October 20, 2016 10:26 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
>
> Jay,
>   REST API is something every user is in need of. If the argument is to
> clone and write your API, this will do a disservice to the users as they
> now have to choose one vs. others instead of keeping one API that is
> supported in the Kafka community.
>
> "Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/Kafka access that is heavily
> used in production seems quite silly."
>
>Exactly our point. Why can't we develop this in the Apache Kafka
> community? Instead of us open sourcing another GitHub project and creating
> a divide in users and another version of the API, let's build this in the
> Kafka community and use the governance model that is proven to provide
> vendor-free, user-driven consensus features. The argument that adding this
> REST server to Kafka will affect the agility of the project doesn't make
> sense.
>
> It looks like your argument is either we develop all these small tools or
> none at all. We as a community need to look at supporting critical
> tools/APIs instead of dividing this project into individual external
> communities. We should build this as part of Kafka, which best serves the
> needs of users.
> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.
>
> Thanks,
> Harsha
>
>
>
>
>
> On Wed, Oct 12, 2016 at 8:03 AM Jay Kreps  wrote:
>
> > I think the questions around governance make sense, I think we should
> > really clarify that to make the process more 

Re: Kafka KIP meeting Oct 19 at 11:00am PST

2016-10-21 Thread Nacho Solis
I think a separate KIP is a good idea as well.  Note however that potential
decisions in this KIP could affect the other KIP.

Nacho

On Fri, Oct 21, 2016 at 10:23 AM, Jun Rao  wrote:

> Michael,
>
> Yes, doing a separate KIP to address the null payload issue for compacted
> topics is a good idea.
>
> Thanks,
>
> Jun
>
> On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce 
> wrote:
>
> > I had noted that what ever the solution having compaction based on null
> > payload was agreed isn't elegant.
> >
> > Shall we raise another kip to : as discussed propose using a attribute
> bit
> > for delete/compaction flag as well/or instead of null value and updating
> > compaction logic to look at that delelete/compaction attribute
> >
> > I believe this is less contentious, so that at least we get that done
> > alleviating some concerns whilst the below gets discussed further?
> >
> > 
> > From: Jun Rao 
> > Sent: Wednesday, October 19, 2016 8:56:52 PM
> > To: dev@kafka.apache.org
> > Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
> >
> > The following are the notes from today's KIP discussion.
> >
> >
> >- KIP-82 - add record header: We agreed that there are use cases for
> >third-party vendors building tools around Kafka. We haven't reached a
> >conclusion on whether the added complexity justifies the use cases. We
> >will follow up on the mailing list with use cases, container formats
> >people have been using, and details on the proposal.
> >
> >
> > The video will be uploaded soon in
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao  wrote:
> >
> > > Hi, Everyone,
> > >
> > > We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am
> PST.
> > > If you plan to attend but haven't received an invite, please let me
> know.
> > > The following is the tentative agenda.
> > >
> > > Agenda:
> > > KIP-82: add record header
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> >
>



-- 
Nacho (Ignacio) Solis
Kafka
nso...@linkedin.com


Re: Kafka KIP meeting Oct 19 at 11:00am PST

2016-10-21 Thread Jun Rao
Michael,

Yes, doing a separate KIP to address the null payload issue for compacted
topics is a good idea.

Thanks,

Jun

On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce 
wrote:

> I had noted that, whatever the solution, it was agreed that having
> compaction based on a null payload isn't elegant.
>
> Shall we raise another KIP to, as discussed, propose using an attribute bit
> for the delete/compaction flag as well as/or instead of the null value, and
> updating the compaction logic to look at that delete/compaction attribute?
>
> I believe this is less contentious, so that at least we get that done,
> alleviating some concerns whilst the below gets discussed further?
>
> 
> From: Jun Rao 
> Sent: Wednesday, October 19, 2016 8:56:52 PM
> To: dev@kafka.apache.org
> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>
> The following are the notes from today's KIP discussion.
>
>
>- KIP-82 - add record header: We agreed that there are use cases for
>third-party vendors building tools around Kafka. We haven't reached a
>conclusion on whether the added complexity justifies the use cases. We will
>follow up on the mailing list with use cases, container formats people have
>been using, and details on the proposal.
>
>
> The video will be uploaded soon in
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .
>
> Thanks,
>
> Jun
>
> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao  wrote:
>
> > Hi, Everyone,
> >
> > We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
> > If you plan to attend but haven't received an invite, please let me know.
> > The following is the tentative agenda.
> >
> > Agenda:
> > KIP-82: add record header
> >
> > Thanks,
> >
> > Jun
> >
>


[jira] [Created] (KAFKA-4331) Kafka Streams resetter is slow because it joins the same group for each topic

2016-10-21 Thread Roger Hoover (JIRA)
Roger Hoover created KAFKA-4331:
---

 Summary: Kafka Streams resetter is slow because it joins the same 
group for each topic
 Key: KAFKA-4331
 URL: https://issues.apache.org/jira/browse/KAFKA-4331
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Affects Versions: 0.10.0.1, 0.10.0.0
Reporter: Roger Hoover
Assignee: Matthias J. Sax


The resetter is joining the same group for each topic, which takes ~10 secs in my 
testing. This makes the reset very slow when you have a lot of topics.
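
A sketch of the likely shape of the fix, assuming the resetter can subscribe to all input topics at once instead of joining the group per topic (config values and topic names are placeholders, and committing the reset positions is a simplification of what the tool does):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties config = new Properties();
config.put("bootstrap.servers", "localhost:9092");   // placeholder
config.put("group.id", "my-streams-app");            // the Streams application id
config.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
config.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

List<String> inputTopics = Arrays.asList("input-a", "input-b");  // placeholders

try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(config)) {
    consumer.subscribe(inputTopics);
    consumer.poll(0);                                 // one group join for all topics
    consumer.seekToBeginning(consumer.assignment());  // reset to the earliest offsets
    consumer.commitSync();                            // persist the reset positions
}
```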



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-10-21 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596540#comment-15596540
 ] 

Vahid Hashemian commented on KAFKA-3144:


[~hachikuji] suggested that we report the coordinator id when {{--list}} option 
is used. I'll open a JIRA to address this request.

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-73 - Replication Quotas

2016-10-21 Thread Joel Koshy
Thanks for catching that and the fix as well. Makes sense to me.

We should consider adding an "amendments" section to KIPs - perhaps just a
link to KAFKA-4313 would suffice in this case.

Thanks,

Joel
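
For readers outside the original thread, the throttle in question is configured via the KIP-73 settings, roughly as in this sketch (config names per the KIP; the rates and replica lists are placeholders):

```java
import java.util.Properties;

// Broker-level dynamic configs: the throttled replication rate in bytes/sec.
Properties brokerConfig = new Properties();
brokerConfig.put("leader.replication.throttled.rate", "10000000");
brokerConfig.put("follower.replication.throttled.rate", "10000000");

// Topic-level configs: which partition:brokerId pairs the throttle applies to.
Properties topicConfig = new Properties();
topicConfig.put("leader.replication.throttled.replicas", "0:101,1:101");
topicConfig.put("follower.replication.throttled.replicas", "0:102,1:102");
```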

On Wed, Oct 19, 2016 at 7:12 PM, Jun Rao  wrote:

> Hi,
>
> While testing KIP-73, we found an issue described in
> https://issues.apache.org/jira/browse/KAFKA-4313. Basically, when there are
> mixed high-volume and low-volume partitions and replication throttling is
> specified, ISRs for those low-volume partitions could thrash. KAFKA-4313
> fixes this issue by not throttling those replicas in the throttled
> replica list that are already in sync. Those in-sync replicas' traffic will
> still be counted against the throttled traffic, though. Just want to bring
> this up since it slightly changes the behavior described in the KIP. If
> anyone has concerns on this, please comment on the jira.
>
> Thanks,
>
> Jun
>
> On Tue, Aug 23, 2016 at 3:25 PM, Ismael Juma  wrote:
>
> > For the record, there were 4 binding +1s.
> >
> > Ismael
> >
> > On Tue, Aug 23, 2016 at 11:16 PM, Ben Stopford  wrote:
> >
> > > Thanks everyone. It looks like this KIP has now been accepted.
> > >
> > > There is a corresponding PR for the implementation also.
> > >
> > > All the best
> > >
> > > B
> > >
> > >
> > > > On 23 Aug 2016, at 22:39, Joel Koshy  wrote:
> > > >
> > > > +1
> > > > (sent some very minor edits to you off-thread)
> > > >
> > > > On Fri, Aug 19, 2016 at 1:21 AM, Ben Stopford 
> > wrote:
> > > >
> > > >> I’d like to initiate the voting process for KIP-73:
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas
> > > >>
> > > >> Ben
> > >
> > >
> >
>


[jira] [Comment Edited] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-10-21 Thread Vahid Hashemian (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596540#comment-15596540
 ] 

Vahid Hashemian edited comment on KAFKA-3144 at 10/21/16 10:31 PM:
---

[~hachikuji] suggested that we report the coordinator id when {{--list}} option 
is used. [KAFKA-4333|https://issues.apache.org/jira/browse/KAFKA-4333] was 
created to track this improvement.


was (Author: vahid):
[~hachikuji] suggested that we report the coordinator id when {{--list}} option 
is used. I'll open a JIRA to address this request.

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4332) kafka.api.UserQuotaTest.testThrottledProducerConsumer transient unit test failure

2016-10-21 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-4332:
--

 Summary: kafka.api.UserQuotaTest.testThrottledProducerConsumer 
transient unit test failure
 Key: KAFKA-4332
 URL: https://issues.apache.org/jira/browse/KAFKA-4332
 Project: Kafka
  Issue Type: Sub-task
Affects Versions: 0.10.1.0
Reporter: Jun Rao


kafka.api.UserQuotaTest > testThrottledProducerConsumer FAILED
java.lang.AssertionError: Should have been throttled



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-4296) LogCleaner CleanerStats swap logic seems incorrect

2016-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596460#comment-15596460
 ] 

ASF GitHub Bot commented on KAFKA-4296:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2016


> LogCleaner CleanerStats swap logic seems incorrect
> --
>
> Key: KAFKA-4296
> URL: https://issues.apache.org/jira/browse/KAFKA-4296
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.2.0
>
>
> In LogCleaner, we keep track of two instances of the {{CleanerStats}} object 
> in a tuple object. One instance is intended to keep track of the stats for the 
> last cycle while the other is for the current cycle. The idea is to swap them 
> after each cleaning cycle, but the current logic does not actually mutate the 
> existing tuple, which means that we always clear the same instance of 
> {{CleanerStats}} after each cleaning.
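
In Java terms, a sketch of the reported pattern (CleanerStats and the surrounding names are stand-ins for the Scala originals):

```java
// Stand-in for the real CleanerStats.
final class CleanerStats {
    void clear() { /* reset counters */ }
}

// Stand-in for the tuple holding the last/current cycle's stats.
final class StatsPair {
    CleanerStats last = new CleanerStats();
    CleanerStats current = new CleanerStats();
}

// Buggy shape: builds a swapped copy but never stores it back, so
// pair.current is the same instance that gets cleared every cycle.
static StatsPair swapBroken(final StatsPair pair) {
    final StatsPair swapped = new StatsPair();
    swapped.last = pair.current;
    swapped.current = pair.last;
    return swapped; // discarded by the caller
}

// Fixed shape: mutate the pair in place so the two instances alternate.
static void swapFixed(final StatsPair pair) {
    final CleanerStats tmp = pair.last;
    pair.last = pair.current;
    pair.current = tmp;
    pair.current.clear();
}
```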



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2052: MINOR: add list_topics command to help debug tests

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2052


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-4333) Report consumer group coordinator id when '--list' option is used

2016-10-21 Thread Vahid Hashemian (JIRA)
Vahid Hashemian created KAFKA-4333:
--

 Summary: Report consumer group coordinator id when '--list' option 
is used
 Key: KAFKA-4333
 URL: https://issues.apache.org/jira/browse/KAFKA-4333
 Project: Kafka
  Issue Type: Improvement
  Components: consumer
Reporter: Vahid Hashemian
Assignee: Vahid Hashemian
Priority: Minor


One piece of information missing when extracting information about consumer 
groups (Java API based) is the coordinator id (broker id of the coordinator). 
It would be useful to enhance the {{--list}} option of the consumer group 
command to report the corresponding coordinator id of each consumer group.
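
The tooling change itself is tracked in this JIRA, but for reference, later Kafka releases (2.0 and above) expose the same information programmatically through the Java AdminClient; a minimal sketch:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.clients.admin.ConsumerGroupListing;

public class ListGroupCoordinators {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            for (ConsumerGroupListing group : admin.listConsumerGroups().all().get()) {
                ConsumerGroupDescription description = admin
                        .describeConsumerGroups(Collections.singleton(group.groupId()))
                        .all().get().get(group.groupId());
                // coordinator() returns the broker Node acting as group coordinator
                System.out.printf("%s -> coordinator broker %d%n",
                        group.groupId(), description.coordinator().id());
            }
        }
    }
}
{code}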





[GitHub] kafka pull request #2053: KAFKA-4326: Refactor LogCleaner for better reuse o...

2016-10-21 Thread hachikuji
GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2053

KAFKA-4326: Refactor LogCleaner for better reuse of common copy/compress 
logic



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4326

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2053


commit 9954ed9a23716ca88ec6ebb5db82e66505df13e2
Author: Jason Gustafson 
Date:   2016-10-12T21:55:10Z

KAFKA-4326: Refactor LogCleaner for better reuse of common copy/compress 
logic






[jira] [Commented] (KAFKA-4326) Refactor LogCleaner to remove duplicate log copying logic

2016-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596601#comment-15596601
 ] 

ASF GitHub Bot commented on KAFKA-4326:
---

GitHub user hachikuji opened a pull request:

https://github.com/apache/kafka/pull/2053

KAFKA-4326: Refactor LogCleaner for better reuse of common copy/compress 
logic



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hachikuji/kafka KAFKA-4326

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2053.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2053


commit 9954ed9a23716ca88ec6ebb5db82e66505df13e2
Author: Jason Gustafson 
Date:   2016-10-12T21:55:10Z

KAFKA-4326: Refactor LogCleaner for better reuse of common copy/compress 
logic




> Refactor LogCleaner to remove duplicate log copying logic
> -
>
> Key: KAFKA-4326
> URL: https://issues.apache.org/jira/browse/KAFKA-4326
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>
> I think there's some code duplication in the log cleaner with respect to the 
> copying of the log and re-compression after cleaning. We have similar logic 
> already in {{ByteBufferMessageSet}} which we can potentially reuse. This 
> improves encapsulation so that message format changes in the future become a 
> little easier (since they touch fewer components).





[jira] [Reopened] (KAFKA-2089) MetadataTest transient failure

2016-10-21 Thread Jun Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao reopened KAFKA-2089:


Reopening this issue. Saw the following transient failure in trunk.

org.apache.kafka.clients.MetadataTest > testMetadata FAILED
java.lang.AssertionError: Exception in background thread : 
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata 
after 100 ms. expected null, but 
was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotNull(Assert.java:755)
at org.junit.Assert.assertNull(Assert.java:737)
at org.apache.kafka.clients.MetadataTest.tearDown(MetadataTest.java:49)


> MetadataTest transient failure
> --
>
> Key: KAFKA-2089
> URL: https://issues.apache.org/jira/browse/KAFKA-2089
> Project: Kafka
>  Issue Type: Sub-task
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Rajini Sivaram
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2089.patch, KAFKA-2089.patch, 
> KAFKA-2089_2015-04-13_18:59:33.patch
>
>
> org.apache.kafka.clients.MetadataTest > testMetadata FAILED
> java.lang.AssertionError:
> at org.junit.Assert.fail(Assert.java:91)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at org.junit.Assert.assertFalse(Assert.java:68)
> at org.junit.Assert.assertFalse(Assert.java:79)
> at 
> org.apache.kafka.clients.MetadataTest.tearDown(MetadataTest.java:34)





Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Sriram Subramanian
FWIW, Apache Kafka has evolved a lot from where it started. It did start as
a messaging system. Over time we realized that the vision for Kafka is
to build a streaming platform and not just a messaging system. You can take
a look at the site for more description of what comprises the streaming
platform: http://kafka.apache.org/ and http://kafka.apache.org/intro.

Can the streaming platform exist without Connect? - No. Data integration is
fundamental to building an end-to-end platform.

Can the streaming platform exist without stream processing? - No.
Processing stream data is again a core part of the streaming platform.

Can the streaming platform exist without clients? - We at least need one
client library to complete the platform. Our Java clients help us to
complete the platform story. The rest of the clients are built and
maintained outside the project.

Can the platform exist without the rest proxy? - Yes. The proxy does not
complete the platform vision in any way. It is just a good-to-have tool that
might be required by quite a few users, and there is an active project that
works on this - https://github.com/confluentinc/kafka-rest




On Fri, Oct 21, 2016 at 11:49 AM, Nacho Solis 
wrote:

> Are you saying Kafka REST is subjective but Kafka Streams and Kafka Connect
> are not subjective?
>
> > "there are likely places that can live without a rest proxy"
>
> There are also places that can live without Kafka Streams and Kafka
> Connect.
>
> Nacho
>
> On Fri, Oct 21, 2016 at 11:17 AM, Jun Rao  wrote:
>
> > At the high level, I think ideally it makes sense to add a component to
> > Apache Kafka if (1) it's widely needed and (2) it needs tight integration
> > with Kafka core. For Kafka Stream, we do expect stream processing will be
> > used widely in the future. Implementation wise, Kafka Stream only
> supports
> > getting data from Kafka and leverages quite a few of the core
> > functionalities in Kafka core. For example, it uses customized rebalance
> > callback in the consumer and uses the compacted topic heavily. So, having
> > Kafka Stream in the same repo makes it easier for testing when those core
> > functionalities evolve over time. Kafka Connect is in the same situation.
> >
> > For rest proxy, whether it's widely used or not is going to be a bit
> > subjective. However, there are likely places that can live without a rest
> > proxy. The rest proxy is just a proxy for the regular clients and doesn't
> > need to be tightly integrated with Kafka core. So, the case for including
> > rest proxy in Apache Kafka is probably not as strong as Kafka Stream and
> > Kafka Connect.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Oct 20, 2016 at 11:28 PM, Michael Pearce 
> > wrote:
> >
> > > So from my reading, essentially the first question that needs to be answered
> > > and voted on is:
> > >
> > > Is the Apache Kafka community only about the core, or does the Apache
> > > community also support some subprojects (and we just need some better way
> > > to manage this)?
> > >
> > > If the vote for core-only wins, then the following should be removed:
> > > Kafka Connect
> > > Kafka Stream
> > >
> > > If the vote for core-only loses (aka we will support subprojects), then:
> > > We should look to add Kafka Rest
> > >
> > > And we should look to see how we can better govern and manage
> > > submodules.
> > >
> > > A good example which I'd propose here is how some other communities in
> > > Apache do this.
> > >
> > > Each module has a Module Management Committee (MMC); this is almost like
> > > the PMC, but on a per-module basis.
> > >
> > > This MMC should essentially hold the binding votes for that module.
> > > The MMC should be made up of a single representative from each
> > > organisation (so no single organisation can fully veto the community;
> > > it has to be a genuine consensus).
> > > The MMC requires at least 3 members (so there can't be a tied vote between 2).
> > > For a new module to be added, an MMC should be sought.
> > > A new module is only capable of being added if the above requirements can
> > > be met (e.g. 3 people wishing to step up, from 3 organisations), so that
> > > only actively supported modules would be added.
> > >
> > > The PMC reviews each module every 6 months or year. If an MMC is inactive,
> > > a vote/call to find replacements is raised; if none are forthcoming,
> > > dropping the MMC to fewer than 3 members, then the module moves to "the
> > > attic" (very much like the Apache Attic, but a little more aggressively).
> > >
> > > This way the PMC does not need to micro-manage every module.
> > > We only add modules where some amount of active support, maintenance,
> > > and use is provided by the community.
> > > We have an automatic way to retire old or inactive projects.
> > >
> > > Thoughts?
> > > Mike
> > >
> > >
> > > 
> > > From: Harsha Ch 
> > > Sent: Thursday, 

Re: [jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

2016-10-21 Thread Guozhang Wang
Great to know! Thanks Greg.

Please keep us posted with any new finding you have.


Guozhang

On Fri, Oct 21, 2016 at 12:35 PM, Greg Fodor  wrote:

> I managed to track down one case where we were seeing issues with missing
> data when transitioning to a new node: it came down to a retention policy on
> the topic. There is an additional case, but I have not been able to repro it
> at this time. We recently fixed a problem where we were failing to properly
> gracefully shut down our jobs in certain cases, so there's a chance that
> might be related. Anyhow, now that I have a better understanding of things
> I will be able to investigate if we experience missing keys in the future,
> thanks!
>
> On Oct 20, 2016 2:08 PM, "Greg Fodor (JIRA)"  wrote:
>
>
> [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593018#comment-15593018 ]
>
> Greg Fodor commented on KAFKA-4113:
> ---
>
> Oh, so it should be doing exactly what makes sense to me -- I am on 0.10.0.
> Let me verify that there isn't something else going on! Thanks for the
> info.
>
> > Allow KTable bootstrap
> > --
> >
> > Key: KAFKA-4113
> > URL: https://issues.apache.org/jira/browse/KAFKA-4113
> > Project: Kafka
> >  Issue Type: Sub-task
> >  Components: streams
> >Reporter: Matthias J. Sax
> >Assignee: Guozhang Wang
> >
> > On the mailing list, there are multiple requests about the possibility to
> "fully populate" a KTable before actual stream processing starts.
> > Even if it is somewhat difficult to define when the initial populating
> phase should end, there are multiple possibilities:
> > The main idea is that there is a rarely updated topic that contains the
> data. Only after this topic has been read completely and the KTable is ready
> should the application start processing. This would indicate that, on
> startup, the current partition sizes must be fetched and stored, and after
> the KTable is populated up to those offsets, stream processing can start.
> > Other discussed ideas are:
> > 1) an initial fixed time period for populating
> > (it might be hard for a user to estimate the correct value)
> > 2) an "idle" period, ie, if no update to a KTable for a certain time is
> > done, we consider it as populated
> > 3) a timestamp cut off point, ie, all records with an older timestamp
> > belong to the initial populating phase
> > The API change is not decided yet, and the API design is part of this
> JIRA.
> > One suggestion (for option (4)) was:
> > {noformat}
> > KTable table = builder.table("topic", 1000); // populate the table
> without reading any other topics until see one record with timestamp 1000.
> > {noformat}
>
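
One way to approximate the "populate up to the startup offsets" idea outside of the Streams API is the consumer's endOffsets() call (added in 0.10.1 by KIP-79). The sketch below only illustrates the concept; it is not the KTable API change discussed in the JIRA:

{code}
import java.util.Collection;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class TableBootstrap {
    // Read the given partitions up to the end offsets observed at startup,
    // applying each record to a local table, before other processing starts.
    static void readToCurrentEnd(KafkaConsumer<String, String> consumer,
                                 Collection<TopicPartition> partitions) {
        final Map<TopicPartition, Long> target = consumer.endOffsets(partitions);
        boolean caughtUp = false;
        while (!caughtUp) {
            for (ConsumerRecord<String, String> record : consumer.poll(100)) {
                // apply record to the local table here
            }
            caughtUp = true;
            for (TopicPartition tp : partitions) {
                caughtUp &= consumer.position(tp) >= target.get(tp);
            }
        }
    }
}
{code}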
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>



-- 
-- Guozhang


[jira] [Resolved] (KAFKA-4296) LogCleaner CleanerStats swap logic seems incorrect

2016-10-21 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-4296.

   Resolution: Fixed
Fix Version/s: 0.10.2.0  (was: 0.10.1.1)

Issue resolved by pull request 2016
[https://github.com/apache/kafka/pull/2016]

> LogCleaner CleanerStats swap logic seems incorrect
> --
>
> Key: KAFKA-4296
> URL: https://issues.apache.org/jira/browse/KAFKA-4296
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
> Fix For: 0.10.2.0
>
>
> In LogCleaner, we keep track of two instances of the {{CleanerStats}} object 
> in a tuple object. One instance is intended to keep track of the stats for the 
> last cycle while the other is for the current cycle. The idea is to swap them 
> after each cleaning cycle, but the current logic does not actually mutate the 
> existing tuple, which means that we always clear the same instance of 
> {{CleanerStats}} after each cleaning.





[GitHub] kafka pull request #2016: KAFKA-4296: Fix LogCleaner statistics rolling

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2016




[GitHub] kafka pull request #2052: MINOR: add list_topics command to help debug tests

2016-10-21 Thread xvrl
GitHub user xvrl opened a pull request:

https://github.com/apache/kafka/pull/2052

MINOR: add list_topics command to help debug tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xvrl/kafka test-add-list-topics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2052.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2052


commit d954a039eb0108fb65d5c5068586c8a6e7b74ae9
Author: Xavier Léauté 
Date:   2016-10-21T21:37:48Z

add list_topics command to help debug tests






Jenkins build is back to normal : kafka-trunk-jdk8 #997

2016-10-21 Thread Apache Jenkins Server
See 



[jira] [Updated] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-10-21 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson updated KAFKA-3144:
---
   Resolution: Fixed
Fix Version/s: 0.10.2.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1336
[https://github.com/apache/kafka/pull/1336]

> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.





[jira] [Commented] (KAFKA-3144) report members with no assigned partitions in ConsumerGroupCommand

2016-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596984#comment-15596984
 ] 

ASF GitHub Bot commented on KAFKA-3144:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1336


> report members with no assigned partitions in ConsumerGroupCommand
> --
>
> Key: KAFKA-3144
> URL: https://issues.apache.org/jira/browse/KAFKA-3144
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 0.9.0.0
>Reporter: Jun Rao
>Assignee: Vahid Hashemian
>  Labels: newbie
> Fix For: 0.10.2.0
>
>
> A couple of suggestions on improving ConsumerGroupCommand. 
> 1. It would be useful to list members with no assigned partitions when doing 
> describe in ConsumerGroupCommand.
> 2. Currently, we show the client.id of each member when doing describe in 
> ConsumerGroupCommand. Since client.id is supposed to be the logical 
> application id, all members in the same group are supposed to set the same 
> client.id. So, it would be clearer if we show the client id as well as the 
> member id.





[GitHub] kafka pull request #1336: KAFKA-3144: Report members with no assigned partit...

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1336




[GitHub] kafka pull request #2045: Fix for kafka-4295: kafka-console-consumer.sh does...

2016-10-21 Thread amethystic
Github user amethystic closed the pull request at:

https://github.com/apache/kafka/pull/2045




[jira] [Commented] (KAFKA-4099) Change the time based log rolling to only based on the message timestamp.

2016-10-21 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15596965#comment-15596965
 ] 

Jiangjie Qin commented on KAFKA-4099:
-

[~junrao] Thanks for the explanation. I agree that it is reasonable to roll the 
log segment based on create time. However, I have a few concerns about the 
original proposal:
1. It seems the rareness of replica movement is related to scale, e.g. today we 
have over 1800 brokers at LI and 1-2 brokers die every day, so partition 
reassignment happens almost every day. So I think there is a difference between 
"rare at small scale" and "rare regardless of scale". 
2. The incorrect create time does not only happen when partition movement 
occurs. Most Linux filesystems do not keep a creation time for files, so the 
create time of a segment would be lost when the brokers are rebooted.

Actually, after thinking about the case of oscillating timestamps again, I am not 
sure if that would actually cause frequent log rolling. Let's say we have two 
producers, one producing messages with the current timestamp, the other producing 
with timestamps 7 days old. Assume the current active segment is segment 0 and 
the current time is T. Because log rolling is based on the timestamp of the first 
message in a log segment, it is possible that the first timestamp in segment 0 is 
7 days ago (T - 7 days), so once we append a message with current timestamp T, 
segment 1 is rolled out and its first timestamp will be T. Segment 1 won't roll 
immediately like the previous one, i.e. segment 2 will only be rolled out when it 
sees a timestamp greater than (T + log.roll.ms), and so on.

In the above example, it is possible that segment 2 is rolled out because of 
the segment size. In that case, segment 2 may have a first timestamp of (T - 
7 days) and segment 3 may get rolled out immediately, but segment 3 will again 
wait until either the segment is full or it sees a bigger timestamp that 
triggers the log rolling. So in the worst case, we may roll out two new 
segments in a row. Not sure how bad that would be in terms of performance.

Admittedly, with certain timestamp patterns, frequent log rolling may 
still happen. I am curious whether you have seen any real timestamp pattern that 
caused frequent log rolling?
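
To make the rolling rule concrete, here is a small, self-contained simulation of the behavior described above: a segment rolls when an incoming timestamp exceeds the segment's first timestamp by log.roll.ms, or when the segment is full. This is a simplified model, not broker code, and the two-producer timestamp sequence is made up:

{code}
import java.util.ArrayList;
import java.util.List;

// Simplified model: a segment rolls when an incoming timestamp exceeds the
// segment's first timestamp by rollMs, or when the segment is full
// (maxMessages stands in for segment.bytes).
public class RollSimulation {
    public static void main(String[] args) {
        final long rollMs = 7L * 24 * 60 * 60 * 1000;  // log.roll.ms = 7 days
        final int maxMessages = 3;
        final long now = System.currentTimeMillis();
        final long old = now - rollMs - 1;             // the "7 days behind" producer
        final long[] timestamps = {old, now, old, now + 1, old, now + 2};

        List<Long> segmentFirstTimestamps = new ArrayList<>();
        int count = 0;
        for (long ts : timestamps) {
            boolean rollOnTime = !segmentFirstTimestamps.isEmpty()
                    && ts - segmentFirstTimestamps.get(segmentFirstTimestamps.size() - 1) > rollMs;
            boolean rollOnSize = count >= maxMessages;
            if (segmentFirstTimestamps.isEmpty() || rollOnTime || rollOnSize) {
                segmentFirstTimestamps.add(ts);        // new segment; first timestamp = ts
                count = 0;
            }
            count++;
        }
        // Prints 4: the size-triggered roll is followed immediately by a
        // time-triggered one, the "two segments in a row" worst case.
        System.out.println("segments created: " + segmentFirstTimestamps.size());
    }
}
{code}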

> Change the time based log rolling to only based on the message timestamp.
> -
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps and cause a 
> log segment roll for each message. The fix is to change the log rolling 
> behavior to be based only on the message timestamp when the messages are in 
> message format 0.10.0 or above. If the first message in the segment does not 
> have a timestamp, we will fall back to using the wall clock time for log rolling.





[jira] [Updated] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect ProcessorNode

2016-10-21 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4311:
--
Summary: Multi layer cache eviction causes forwarding to incorrect 
ProcessorNode   (was: Multi layer cache eviction causes forwarding to incorrect 
Processor Node )

> Multi layer cache eviction causes forwarding to incorrect ProcessorNode 
> 
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> The two exceptions below were reported by Frank on the dev mailing list. 
> After investigation, the root cause is multiple cache evictions happening in 
> the same topology. 
> Given a topology like the one below: if a record arriving in `tableOne` 
> causes a cache eviction, it will trigger the `leftJoin`, which will do a `get` 
> from `reducer-store`. If the key is not currently cached in `reducer-store`, 
> but is in the backing store, it will be put into the cache, and it may also 
> trigger an eviction. If it does trigger an eviction and the eldest entry is 
> dirty, it will flush the dirty keys. It is at this point that the exception in 
> the comment happens (ClassCastException). This occurs because the 
> ProcessorContext is still set to the context of the `leftJoin` and the next 
> child in the topology is `mapValues`.
> We need to set the correct `ProcessorNode`, on the context, in the 
> `ForwardingCacheFlushListener` prior to calling `context.forward`. We also 
> need to remember to reset the `ProcessorNode` to the previous node once 
> `context.forward` has completed.
> {code}
> final KTable<String, String> one = builder.table(Serdes.String(), 
> Serdes.String(), tableOne, tableOne);
> final KTable<Long, String> two = builder.table(Serdes.Long(), 
> Serdes.String(), tableTwo, tableTwo);
> final KTable<String, Long> reduce = two.groupBy(new 
> KeyValueMapper<Long, String, KeyValue<String, Long>>() {
> @Override
> public KeyValue<String, Long> apply(final Long key, final String 
> value) {
> return new KeyValue<>(value, key);
> }
> }, Serdes.String(), Serdes.Long())
> .reduce(new Reducer<Long>() {
> @Override
> public Long apply(final Long value1, final Long value2) {
> return value1 + value2;
> }
> }, new Reducer<Long>() {
> @Override
> public Long apply(final Long value1, final Long value2) {
> return value1 - value2;
> }
> }, "reducer-store");
> one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {
> @Override
> public String apply(final String value1, final Long value2) {
> return value1 + ":" + value2;
> }
> })
> .mapValues(new ValueMapper<String, String>() {
> @Override
> public String apply(final String value) {
> return value;
> }
> });
> {code}
> This exception is actually a symptom of the exception reported below in the 
> comment. After the first exception is thrown, the StreamThread triggers a 
> shutdown that then throws this exception.
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
> at
> 
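
The fix described above amounts to a save/set/forward/restore pattern around the flush-triggered forward. A rough, self-contained sketch of that pattern follows; none of these names are the real Streams internals (which live in ForwardingCacheFlushListener and ProcessorContextImpl):

{code}
// Hypothetical sketch of the save/set/forward/restore pattern described
// above; the types and method names here are illustrative stand-ins.
public class ForwardWithCorrectNode {
    static class ProcessorNode {
        final String name;
        ProcessorNode(String name) { this.name = name; }
    }

    static class Context {
        private ProcessorNode current;
        ProcessorNode currentNode() { return current; }
        void setCurrentNode(ProcessorNode node) { current = node; }
        void forward(String key, Object value) {
            System.out.println(current.name + " forwards " + key + "=" + value);
        }
    }

    // On cache flush, forward from the node that owns the cache, then restore.
    static void onFlush(Context context, ProcessorNode cacheOwner,
                        String key, Object value) {
        final ProcessorNode previous = context.currentNode();  // save
        try {
            context.setCurrentNode(cacheOwner);                // set the owning node
            context.forward(key, value);
        } finally {
            context.setCurrentNode(previous);                  // restore, even on exceptions
        }
    }
}
{code}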

Re: Kafka KIP meeting Oct 19 at 11:00am PST

2016-10-21 Thread Jim Jagielski
++1

> On Oct 21, 2016, at 11:47 AM, Gwen Shapira  wrote:
> 
> I think this will be a good idea. It will separate an issue with a concrete
> use-case and a major pain-point from a wider discussion about architectures
> and who is responsible for data formats.
> 
> 
> On Fri, Oct 21, 2016 at 12:57 AM, Michael Pearce 
> wrote:
> 
>> I had noted that, whatever the solution, it was agreed that having compaction
>> based on a null payload isn't elegant.
>> 
>> Shall we raise another KIP to, as discussed, propose using an attribute bit
>> as a delete/compaction flag as well as/or instead of the null value, and
>> updating the compaction logic to look at that delete/compaction attribute?
>> 
>> I believe this is less contentious, so that at least we get that done,
>> alleviating some concerns whilst the below gets discussed further.
>> 
>> 
>> From: Jun Rao 
>> Sent: Wednesday, October 19, 2016 8:56:52 PM
>> To: dev@kafka.apache.org
>> Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST
>> 
>> The following are the notes from today's KIP discussion.
>> 
>> 
>>   - KIP-82 - add record header: We agreed that there are use cases for
>>   third-party vendors building tools around Kafka. We haven't reached a
>>   conclusion on whether the added complexity justifies the use cases. We will
>>   follow up on the mailing list with use cases, the container formats people
>>   have been using, and details on the proposal.
>> 
>> 
>> The video will be uploaded soon to 
>> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .
>> 
>> Thanks,
>> 
>> Jun
>> 
>> On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao  wrote:
>> 
>>> Hi, Everyone,
>>> 
>>> We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
>>> If you plan to attend but haven't received an invite, please let me know.
>>> The following is the tentative agenda.
>>> 
>>> Agenda:
>>> KIP-82: add record header
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>> 
> 
> 
> 
> -- 
> *Gwen Shapira*
> Product Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter  | blog
> 



Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Harsha Chintalapani
Sriram,
   "Can the streaming platform exist without stream processing? - No.
Processing stream data again is a core part of streaming platform."

Yes, it can. There are no.of Stream processing frameworks out there, and
they all have integration into Kafka.
It doesn't need to be developed within Kafka.


"Can the platform exist without the rest proxy? - Yes. The proxy does not
complete the platform vision in anyway. It is just a good to have tool that
might be required by quite a few users and there is an active project that
works on this - https://github.com/confluentinc/kafka-rest;

The rest proxy is as important as any API. The vision shown here,
http://kafka.apache.org/intro,
requires users to write producers and consumers to get their data into
and out of Kafka, without which having Streams or Connect won't help
anyone.
The rest proxy makes it easier for users to get their data into Kafka.
Adding the rest proxy to the project doesn't invalidate the current vision;
it only strengthens it.

Thanks,
Harsha




On Fri, Oct 21, 2016 at 2:31 PM Sriram Subramanian  wrote:

FWIW, Apache Kafka has evolved a lot from where it started. It did start as
a messaging system. Over time we realized that the vision for Kafka is
to build a streaming platform and not just a messaging system. You can take
a look at the site for more description of what comprises the streaming
platform: http://kafka.apache.org/ and http://kafka.apache.org/intro.

Can the streaming platform exist without Connect? - No. Data integration is
fundamental to building an end-to-end platform.

Can the streaming platform exist without stream processing? - No.
Processing stream data is again a core part of the streaming platform.

Can the streaming platform exist without clients? - We at least need one
client library to complete the platform. Our Java clients help us to
complete the platform story. The rest of the clients are built and
maintained outside the project.

Can the platform exist without the rest proxy? - Yes. The proxy does not
complete the platform vision in any way. It is just a good-to-have tool that
might be required by quite a few users, and there is an active project that
works on this - https://github.com/confluentinc/kafka-rest




On Fri, Oct 21, 2016 at 11:49 AM, Nacho Solis 
wrote:

> Are you saying Kafka REST is subjective but Kafka Streams and Kafka
Connect
> are not subjective?
>
> > "there are likely places that can live without a rest proxy"
>
> There are also places that can live without Kafka Streams and Kafka
> Connect.
>
> Nacho
>
> On Fri, Oct 21, 2016 at 11:17 AM, Jun Rao  wrote:
>
> > At the high level, I think ideally it makes sense to add a component to
> > Apache Kafka if (1) it's widely needed and (2) it needs tight
integration
> > with Kafka core. For Kafka Stream, we do expect stream processing will
be
> > used widely in the future. Implementation wise, Kafka Stream only
> supports
> > getting data from Kafka and leverages quite a few of the core
> > functionalities in Kafka core. For example, it uses customized rebalance
> > callback in the consumer and uses the compacted topic heavily. So,
having
> > Kafka Stream in the same repo makes it easier for testing when those
core
> > functionalities evolve over time. Kafka Connect is in the same
situation.
> >
> > For rest proxy, whether it's widely used or not is going to be a bit
> > subjective. However, there are likely places that can live without a
rest
> > proxy. The rest proxy is just a proxy for the regular clients and
doesn't
> > need to be tightly integrated with Kafka core. So, the case for
including
> > rest proxy in Apache Kafka is probably not as strong as Kafka Stream and
> > Kafka Connect.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Oct 20, 2016 at 11:28 PM, Michael Pearce 
> > wrote:
> >
> > > So from my reading, essentially the first question that needs to be
> > > answered and voted on is:
> > >
> > > Is the Apache Kafka community only about the core, or does the Apache
> > > community also support some subprojects (and we just need some better
> > > way to manage this)?
> > >
> > > If the vote for core-only wins, then the following should be removed:
> > > Kafka Connect
> > > Kafka Stream
> > >
> > > If the vote for core-only loses (aka we will support subprojects), then:
> > > We should look to add Kafka Rest
> > >
> > > And we should look to see how we can better govern and manage
> > > submodules.
> > >
> > > A good example which I'd propose here is how some other communities in
> > > Apache do this.
> > >
> > > Each module has a Module Management Committee (MMC); this is almost
> > > like the PMC, but on a per-module basis.
> > >
> > > This MMC should essentially hold the binding votes for that module.
> > > The MMC should be made up of a single representative from each
> > > organisation (so no single organisation can fully veto the community;
> > > it has to be a 

Build failed in Jenkins: kafka-trunk-jdk7 #1649

2016-10-21 Thread Apache Jenkins Server
See 

Changes:

[jason] KAFKA-3144; Report members with no assigned partitions in

--
[...truncated 14356 lines...]
org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testSubscription PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldSetClusterMetadataOnAssignment STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
shouldSetClusterMetadataOnAssignment PASSED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopics STARTED

org.apache.kafka.streams.processor.internals.StreamPartitionAssignorTest > 
testAssignWithInternalTopics PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testSpecificPartition PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
shouldThrowStreamsExceptionAfterMaxAttempts STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
shouldThrowStreamsExceptionAfterMaxAttempts PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
shouldRetryWhenTimeoutExceptionOccursOnSend STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
shouldRetryWhenTimeoutExceptionOccursOnSend PASSED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner STARTED

org.apache.kafka.streams.processor.internals.RecordCollectorTest > 
testStreamPartitioner PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldHaveCompactionPropSetIfSupplied STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldHaveCompactionPropSetIfSupplied PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldThrowIfNameIsNull STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldThrowIfNameIsNull PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldConfigureRetentionMsWithAdditionalRetentionWhenCompactAndDelete STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldConfigureRetentionMsWithAdditionalRetentionWhenCompactAndDelete PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldBeCompactedIfCleanupPolicyCompactOrCompactAndDelete STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldBeCompactedIfCleanupPolicyCompactOrCompactAndDelete PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotBeCompactedWhenCleanupPolicyIsDelete STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotBeCompactedWhenCleanupPolicyIsDelete PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldHavePropertiesSuppliedByUser STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldHavePropertiesSuppliedByUser PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldUseCleanupPolicyFromConfigIfSupplied STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldUseCleanupPolicyFromConfigIfSupplied PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotConfigureRetentionMsWhenCompact STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotConfigureRetentionMsWhenCompact PASSED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotConfigureRetentionMsWhenDelete STARTED

org.apache.kafka.streams.processor.internals.InternalTopicConfigTest > 
shouldNotConfigureRetentionMsWhenDelete PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullProcessorSupplier STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullProcessorSupplier PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotSetApplicationIdToNull STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotSetApplicationIdToNull PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > testSourceTopics 
STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > testSourceTopics PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullNameWhenAddingSink STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
shouldNotAllowNullNameWhenAddingSink PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
testNamedTopicMatchesAlreadyProvidedPattern STARTED

org.apache.kafka.streams.processor.TopologyBuilderTest > 
testNamedTopicMatchesAlreadyProvidedPattern PASSED

org.apache.kafka.streams.processor.TopologyBuilderTest > 

[jira] [Commented] (KAFKA-4099) Change the time based log rolling to only based on the message timestamp.

2016-10-21 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594273#comment-15594273
 ] 

Jiangjie Qin commented on KAFKA-4099:
-

[~junrao] Is the purpose of log.roll.ms to avoid the case where low volume 
topics never get rolled? This is usually because we want to enforce the log 
retention. In the described use case, would it be reasonable to turn off 
log.roll.ms completely until the bootstrap finishes? It seems users actually 
do not care about log rolling or even log retention in that case.
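
For context, time-based rolling is controlled by the broker-level log.roll.ms / log.roll.hours settings (with a per-topic segment.ms override). A sketch of what "turning it off until the bootstrap finishes" could look like; the values are placeholders, not recommendations:

{code}
import java.util.Properties;

// Illustrative placeholder values only; not a recommendation. The real knobs
// are the broker configs log.roll.ms / log.roll.hours and the per-topic
// override segment.ms.
public class BootstrapRollConfig {
    public static void main(String[] args) {
        Properties brokerProps = new Properties();
        // Effectively disable time-based rolling while bootstrapping:
        brokerProps.put("log.roll.ms", String.valueOf(Long.MAX_VALUE));
        // Size-based rolling still applies:
        brokerProps.put("log.segment.bytes", String.valueOf(1 << 30));
        brokerProps.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
{code}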

> Change the time based log rolling to only based on the message timestamp.
> -
>
> Key: KAFKA-4099
> URL: https://issues.apache.org/jira/browse/KAFKA-4099
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.0
>
>
> This is an issue introduced in KAFKA-3163. When partition relocation occurs, 
> the newly created replica may have messages with old timestamps and cause a 
> log segment roll for each message. The fix is to change the log rolling 
> behavior to be based only on the message timestamp when the messages are in 
> message format 0.10.0 or above. If the first message in the segment does not 
> have a timestamp, we will fall back to using the wall clock time for log rolling.





Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Michael Pearce
So from my reading, essentially the first question that needs to be answered and 
voted on is:

Is the Apache Kafka community only about the core, or does the Apache community also 
support some subprojects (and we just need some better way to manage this)?

If the vote for core-only wins, then the following should be removed:
Kafka Connect
Kafka Stream

If the vote for core-only loses (aka we will support subprojects), then:
We should look to add Kafka Rest

And we should look to see how we can better govern and manage submodules.

A good example which I'd propose here is how some other communities in Apache do 
this.

Each module has a Module Management Committee (MMC); this is almost like the PMC, 
but on a per-module basis.

This MMC should essentially hold the binding votes for that module.
The MMC should be made up of a single representative from each organisation 
(so no single organisation can fully veto the community; it has to be a genuine 
consensus).
The MMC requires at least 3 members (so there can't be a tied vote between 2).
For a new module to be added, an MMC should be sought.
A new module is only capable of being added if the above requirements can be 
met (e.g. 3 people wishing to step up, from 3 organisations), so that only 
actively supported modules would be added.

The PMC reviews each module every 6 months or year. If an MMC is inactive, a 
vote/call to find replacements is raised; if none are forthcoming, dropping the 
MMC to fewer than 3 members, then the module moves to "the attic" (very much like 
the Apache Attic, but a little more aggressively).

This way the PMC does not need to micro-manage every module.
We only add modules where some amount of active support, maintenance, and use 
is provided by the community.
We have an automatic way to retire old or inactive projects.

Thoughts?
Mike



From: Harsha Ch 
Sent: Thursday, October 20, 2016 10:26 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-80: Kafka REST Server

Jay,
  A REST API is something every user is in need of. If the argument is to
clone and write your own API, this will do a disservice to the users, as they
now have to choose one vs. the others instead of having one API that is
supported in the Kafka community.

"Pre-emptively re-creating another
REST layer when it seems like we all quite agree on what needs to be done
and we have an existing code base for HTTP/Kafka access that is heavily
used in production seems quite silly."

   Exactly our point. Why can't we develop this in the Apache Kafka
community, instead of open sourcing another GitHub project and creating
a divide in users and another version of the API? Let's build this in the Kafka
community and use the governance model that has proven to provide vendor-free,
user-driven consensus on features. The argument that adding this REST
server to Kafka will affect the agility of the project doesn't make sense.

It looks like your argument is either we develop all these small tools or
none at all. We as a community need to look at supporting critical
tools/APIs instead of dividing this project into individual external
communities. We should build this as part of Kafka, which best serves the
needs of users.
The Streams and Connect projects that were pushed into Kafka could
have been left in their own GitHub projects based on your arguments. What
about the REST API is so different such that it should stay out of the
Kafka project? From my experience, more users are asking for the REST API.

Thanks,
Harsha





On Wed, Oct 12, 2016 at 8:03 AM Jay Kreps  wrote:

> I think the questions around governance make sense. I think we should
> really clarify that to make the process clearer so it can be fully
> inclusive.
>
> The idea that we should not collaborate on what is there now, though,
> because in the future we might disagree about direction does not really
> make sense to me. If in the future we disagree, that is the beauty of open
> source, you can always fork off a copy of the code and start an independent
> project either in Apache or elsewhere. Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/kafka access that is heavily
> used in production seems quite silly.
>
> Let me give some background on how I at least think about these things.
> I've participated in open source projects out of LinkedIn via github as
> well as via the ASF. I don't think there is a "right" answer to how to do
> these but rather some tradeoffs. We thought about this quite a lot in the
> context of Kafka based on the experience with the Hadoop ecosystem as well
> as from other open source communities.
>
> There is a rich ecosystem around Kafka. Many of the projects are quite
> small--single clients or tools that do a single thing well--and almost none
> of them are top level apache projects. I don't think trying to force each
> of these to turn into 

[GitHub] kafka pull request #2045: Fix for kafka-4295: kafka-console-consumer.sh does...

2016-10-21 Thread amethystic
GitHub user amethystic reopened a pull request:

https://github.com/apache/kafka/pull/2045

Fix for kafka-4295: kafka-console-consumer.sh does not delete the temporary 
group in zookeeper

Since the consumer stop logic and the zk node removal code are in separate threads, 
when the two threads execute in an interleaving manner, the persistent node 
'/consumers/' might not be removed for those console consumer 
groups which do not specify "group.id". This will pollute Zookeeper with lots 
of inactive console consumer offset information.
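
As an illustration of the ordering issue, a hedged sketch in which the znode removal strictly waits for the consumer thread to finish instead of interleaving with it; the group path and subtree handling are simplified and hypothetical:

{code}
import java.util.concurrent.CountDownLatch;
import org.apache.zookeeper.ZooKeeper;

// Illustrative only: make the znode removal wait for the consumer thread to
// finish instead of interleaving with it. The group path is hypothetical, and
// a real group subtree is deeper and would need recursive deletion.
public class OrderedShutdown {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        CountDownLatch consumerStopped = new CountDownLatch(1);

        Thread consumerThread = new Thread(() -> {
            // ... consume until shutdown is requested ...
            consumerStopped.countDown();          // signal: consumer fully stopped
        });
        consumerThread.start();

        consumerStopped.await();                  // enforce ordering before cleanup
        String groupPath = "/consumers/console-consumer-12345"; // hypothetical
        zk.delete(groupPath, -1);                 // -1 matches any node version
        zk.close();
    }
}
{code}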

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amethystic/kafka 
kafka-4295-consoleconsumer_failed_remove_zknodes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2045.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2045


commit 95779d7d509c012e9e3d1931fc3b6fd465075513
Author: huxi 
Date:   2016-10-20T09:03:31Z

Fix for kafka-4295: kafka-console-consumer.sh does not delete the temporary 
group in zookeeper
Since the consumer stop logic and the zk node removal code are in separate threads, 
when the two threads execute in an interleaving manner, the persistent node 
'/consumers/' might not be removed for those console consumer 
groups which do not specify "group.id". This will pollute Zookeeper with lots 
of inactive console consumer offset information.






Build failed in Jenkins: kafka-trunk-jdk8 #996

2016-10-21 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-4309; Allow "pluggable" properties in KafkaService in System 
Tests

--
[...truncated 1066 lines...]
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:79)
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:30)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy50.output(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.results.StateTrackingTestResultProcessor.output(StateTrackingTestResultProcessor.java:87)
at 
org.gradle.api.internal.tasks.testing.results.AttachParentTestResultProcessor.output(AttachParentTestResultProcessor.java:48)
at sun.reflect.GeneratedMethodAccessor263.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.FailureHandlingDispatch.dispatch(FailureHandlingDispatch.java:29)
at 
org.gradle.internal.dispatch.AsyncDispatch.dispatchMessages(AsyncDispatch.java:132)
at 
org.gradle.internal.dispatch.AsyncDispatch.access$000(AsyncDispatch.java:33)
at 
org.gradle.internal.dispatch.AsyncDispatch$1.run(AsyncDispatch.java:72)
at 
org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
at 
org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:154)
... 42 more
java.io.IOException: No space left on device
com.esotericsoftware.kryo.KryoException: java.io.IOException: No space left on 
device
at com.esotericsoftware.kryo.io.Output.flush(Output.java:156)
at com.esotericsoftware.kryo.io.Output.require(Output.java:134)
at com.esotericsoftware.kryo.io.Output.writeBoolean(Output.java:578)
at 
org.gradle.internal.serialize.kryo.KryoBackedEncoder.writeBoolean(KryoBackedEncoder.java:63)
at 
org.gradle.api.internal.tasks.testing.junit.result.TestOutputStore$Writer.onOutput(TestOutputStore.java:99)
at 
org.gradle.api.internal.tasks.testing.junit.result.TestReportDataCollector.onOutput(TestReportDataCollector.java:141)
at sun.reflect.GeneratedMethodAccessor261.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.event.AbstractBroadcastDispatch.dispatch(AbstractBroadcastDispatch.java:44)
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:79)
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:30)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy52.onOutput(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.results.TestListenerAdapter.output(TestListenerAdapter.java:56)
at sun.reflect.GeneratedMethodAccessor264.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.event.AbstractBroadcastDispatch.dispatch(AbstractBroadcastDispatch.java:44)
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:79)
at 
org.gradle.internal.event.BroadcastDispatch.dispatch(BroadcastDispatch.java:30)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at 

[jira] [Created] (KAFKA-4330) Leader/Follower Quota Rates in JMX have ambiguous names

2016-10-21 Thread Ben Stopford (JIRA)
Ben Stopford created KAFKA-4330:
---

 Summary: Leader/Follower Quota Rates in JMX have ambiguous names
 Key: KAFKA-4330
 URL: https://issues.apache.org/jira/browse/KAFKA-4330
 Project: Kafka
  Issue Type: Bug
Reporter: Ben Stopford


The names of the JMX attributes for the replication throttled rates don't make 
clear that they are throttled rates. This is a bit confusing. 

MBean:kafka.server:type=LeaderReplication,name=byte-rate
MBean:kafka.server:type=FollowerReplication,name=byte-rate

Suggest ThrottledLeaderReplication etc. 
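
For anyone trying to locate these beans, a small JMX query sketch that lists the replication rate MBeans by pattern (run inside the broker JVM, or adapt it to a remote JMX connection); the attribute layout is assumed from the names quoted above and may differ by version:

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Lists MBeans whose type ends in "Replication"; assumes the naming shown
// above, which this JIRA proposes to change.
public class FindReplicationRateBeans {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : server.queryNames(
                new ObjectName("kafka.server:type=*Replication,*"), null)) {
            System.out.println(name);
        }
    }
}
{code}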





Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention

2016-10-21 Thread David
Hi Becket,
It seems a good idea to have the topic apply the consumed log retention only to 
some specified groups, with the default being all consumed groups; it is more 
flexible.
I have some comments on this:
1. What scenario is this configuration intended for?


2. I think we can only support this configuration on the topic level, so that 
all the brokers will do the same consumed log retention for the specified 
groups. Otherwise, some brokers
will have different behavior for the same topic. And if this configuration is 
broker-level, we have to add the group-to-topic mapping relationship too, since 
the specified consumer group may not be consuming all the topics on those brokers.


Thanks,
David






-- Original Message --
From: "Becket Qin"
Date: Monday, Oct 17, 2016, 3:27
To: "dev"

Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention



Hey David,

Thanks for the replies to the questions.

I think one major thing still not clear at this point is whether the
brokers will only apply the consumed log retention to a specific set of
interested consumer groups, or whether there is no such set of consumer
groups at all.

For example, for topic T, assume we know that there will be two downstream
consumer groups CG1 and CG2 consuming data from topic T. Will we add a
topic configuration such as
"log.retention.commitoffset.interested.groups=CG1,CG2" to topic T, so that
the brokers only care about CG1 and CG2? The committed offsets of other
groups are not of interest and won't have any impact on the committed offset
based log retention.

It seems the current proposal does not have an "interested consumer group
set" configuration, which means any random consumer group may affect the
committed offset based log retention.

I think the committed offset based log retention seems more useful in cases
where we already know which consumer groups will be consuming from this
topic, so we will only wait for those consumer groups but ignore the
others. If a topic will be consumed by many unknown or unpredictable
consumer groups, it seems the existing time based log retention is much
simpler and clearer. So I would argue we don't need to address the case
that some groups may come later in the committed offset based retention.

That said, there may still be value to keep the data for some time even
after all the interested consumer groups have consumed the messages. For
example, in a pipelined stream processing DAG, we may want to keep the data
of an intermediate topic for some time in case the job fails. So we can
resume from a previously succeeded stage instead of restarting the entire
pipeline. Or we can use the intermediate topic for some debugging work.

Thanks,

Jiangjie (Becket) Qin



On Sun, Oct 16, 2016 at 2:15 AM,  <254479...@qq.com> wrote:

> Hi Dong,
> The KIP is intended to solve both of these two cases: we specify a small
> consumed log retention time to delete the consumed data while avoiding losing
> un-consumed data.
> And we specify a large forced log retention time used as an upper bound for
> the data. I will update the KIP with this info.
> Another solution that I think may be OK is to support an API to delete an
> inactive group. If the group is inactive, but its committed offset is
> still in the __consumer_offsets topic and
> stays in the offset cache, we can delete it via this API.
>
>
> Thanks,
> David
>
>
> -- Original Message --
> From: "Dong Lin";
> Date: Friday, Oct 14, 2016, 5:01
> To: "dev";
>
> Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention
>
>
>
> Hi David,
>
> As explained in the motivation section of the KIP, the problem is that if
> log retention is too small, we may lose data; and if log retention is too
> large, then we waste disk space. Therefore, we need to solve one of the two
> problems -- allow data to be persisted longer for consumption if log
> retention is set too small, or allow data to be expired earlier if log
> retention is too large. I think the KIP probably needs to make this clear
> and explain which one is rejected and why. Note that the choice of the two
> affects the solution -- if we want to address the first problem then
> log.retention.ms should be used as a lower bound on the actual retention
> time, and if we want to address the second problem then the
> log.retention.ms should be used as an upper bound on the actual retention time.
>
> In both cases, we probably need to figure out a way to determine "active
> consumer group". Maybe we can compare the time-since-last-commit against a
> threshold to determine this. In addition, the threshold can be overridden
> either per-topic or per-groupId. If we go along this route, the rejected
> solution (per-topic vs. per-groupId) should probably be explained in the
> KIP.
>
>
> Thanks,
> Dong
>
>
>
> On Thu, Oct 13, 2016 

Jenkins build is back to normal : kafka-trunk-jdk7 #1647

2016-10-21 Thread Apache Jenkins Server
See 



Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention

2016-10-21 Thread David
Hi Mayuresh,
Thanks for the reply:
1. In the scheduled log retention check, the broker first finds all the 
consumer groups which are consuming this topic, and queries the committed 
offset of each of these groups for the topic
using the OffsetFetch API. The min commit offset is then the minimum among 
these committed offsets. (A hedged sketch of this check follows below.)


2. If the console consumer reads and commits, its committed offset will be used 
to calculate the min commit offset for this topic.
We can avoid such random consumers using the method Becket suggested.


3. It will not delete the log immediately; the log will stay for some time 
(retention.commitoffset.ms), and after that we only delete 
the log segments whose offsets are less than the min commit offset. So the 
user can rewind its offset within log.retention.ms.
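
To make points (1) and (3) concrete, here is a minimal hedged sketch of the 
broker-side check being described; the OffsetFetch plumbing is abstracted 
behind a function argument, and none of the names below come from an actual 
patch:

{code}
import org.apache.kafka.common.TopicPartition

// Hypothetical helper mirroring the description above. `fetchCommittedOffset`
// stands in for an OffsetFetch request to the group coordinator.
object ConsumedRetentionSketch {

  // Minimum committed offset across the consuming groups; if any group has no
  // committed offset yet, nothing can be safely deleted.
  def minCommitOffset(groups: Seq[String],
                      tp: TopicPartition,
                      fetchCommittedOffset: (String, TopicPartition) => Option[Long]): Option[Long] = {
    val offsets = groups.map(g => fetchCommittedOffset(g, tp))
    if (offsets.forall(_.isDefined)) Some(offsets.flatten.min) else None
  }

  // A segment is deletable once its last offset is below the min commit offset
  // (and, per point 3, only after retention.commitoffset.ms has elapsed).
  def deletableSegments(segmentLastOffsets: Seq[Long], minCommit: Long): Seq[Long] =
    segmentLastOffsets.filter(_ < minCommit)
}
{code}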


Thanks,
David




-- Original Message --
From: "Mayuresh Gharat";
Date: Wednesday, Oct 19, 2016, 10:25
To: "dev";

Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log retention



Hi David,

Thanks for the KIP.

I had some questions/suggestions:

It would be great if you could explain in the KIP, with an example, how the min
offset across all the consumers will be calculated.
What I meant was, it would be great to understand, with pseudo
code/a workflow if possible, how each broker knows all the consumers for the
given topic-partition and how the min is calculated.

Also it would be good to understand what happens if we start a console
consumer which would actually start reading from the beginning offset and
commit and crash immediately. How will the segments get deleted?

Will it delete all the log segments if all the consumers have read till the
latest? If yes, would we be able to handle a scenario where we say that the user
can rewind its offset to reprocess the data, since log.retention.ms might
not have been reached?

Thanks,

Mayuresh

On Mon, Oct 17, 2016 at 12:27 AM, Becket Qin  wrote:

> Hey David,
>
> Thanks for the replies to the questions.
>
> I think one major thing that is still not clear at this point is whether the
> brokers will apply the consumed log retention only to a specific set of
> interested consumer groups, or whether there is no such set of consumer
> groups.
>
> For example, for topic T, assume we know that there will be two downstream
> consumer groups CG1 and CG2 consuming data from topic T. Will we add a
> topic configuration such as
> "log.retention.commitoffset.interested.groups=CG1,CG2" to topic T so that
> the brokers only care about CG1 and CG2? The committed offsets of other
> groups would then be ignored and won't have any impact on the committed-offset-based
> log retention.
>
> It seems the current proposal does not have an "interested consumer group
> set" configuration, which means any random consumer group may affect the
> committed-offset-based log retention.
>
> I think the committed-offset-based log retention seems more useful in cases
> where we already know which consumer groups will be consuming from this
> topic, so we will only wait for those consumer groups and ignore the
> others. If a topic will be consumed by many unknown or unpredictable
> consumer groups, the existing time-based log retention is much
> simpler and clearer. So I would argue we don't need to address the case
> that some groups may come later in the committed-offset-based retention.
>
> That said, there may still be value to keep the data for some time even
> after all the interested consumer groups have consumed the messages. For
> example, in a pipelined stream processing DAG, we may want to keep the data
> of an intermediate topic for some time in case the job fails. So we can
> resume from a previously succeeded stage instead of restarting the entire
> pipeline. Or we can use the intermediate topic for some debugging work.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Sun, Oct 16, 2016 at 2:15 AM,  <254479...@qq.com> wrote:
>
> > Hi Dong,
> > The KIP is intended to solve both of these two cases: we specify a small
> > consumed log retention time to delete the consumed data while avoiding losing
> > un-consumed data.
> > And we specify a large forced log retention time used as an upper bound for
> > the data. I will update the KIP with this info.
> > Another solution that I think may be OK is to support an API to delete an
> > inactive group. If the group is inactive, but its committed offset is
> > still in the __consumer_offsets topic and
> > stays in the offset cache, we can delete it via this API.
> >
> >
> > Thanks,
> > David
> >
> >
> > -- Original Message --
> > From: "Dong Lin";
> > Date: Friday, Oct 14, 2016, 5:01
> > To: "dev";
> >
> > Subject: Re: [DISCUSS] KIP-68 Add a consumed log retention before log
> > retention
> >
> >
> >
> > Hi David,
> >
> > As explained in the motivation section of the 

[jira] [Updated] (KAFKA-4311) Mutli layer cache eviction causes forwarding to incorrect Processor Node

2016-10-21 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4311:
--
Summary: Mutli layer cache eviction causes forwarding to incorrect 
Processor Node   (was: Exception in NamedCache.flush -  Key found in dirty key 
set, but entry is null )

> Mutli layer cache eviction causes forwarding to incorrect Processor Node 
> -
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> Reported on the mailing list. Needs looking into how it could get in this 
> state.
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
> at
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
> ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect Processor Node

2016-10-21 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4311:
--
Summary: Multi layer cache eviction causes forwarding to incorrect 
Processor Node   (was: Mutli layer cache eviction causes forwarding to 
incorrect Processor Node )

> Multi layer cache eviction causes forwarding to incorrect Processor Node 
> -
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> Reported on the mailing list. Needs looking into how it could get in this 
> state.
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
> at
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
> ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Ismael Juma
Hi Michael,

Can you please share which Apache projects have a MMC? I couldn't find
anything after a quick google.

Ismael

On Fri, Oct 21, 2016 at 7:28 AM, Michael Pearce 
wrote:

> So from my reading, essentially the first question that needs to be answered
> and voted on is:
>
> Is the Apache Kafka community only about the core, or does the Apache community
> also support some subprojects (and we just need some better way to manage
> this)?
>
> If the vote for core only wins, then the following should be removed:
> Kafka Connect
> Kafka Streams
>
> If the vote for core only loses (aka we will support subprojects), then:
> We should look to add Kafka REST
>
> And we should look to see how we can better govern and manage
> submodules.
>
> A good example, which I'd propose here, is how some other communities in
> Apache do this.
>
> Each module has a Module Management Committee (MMC); this is almost like
> the PMC, but on a per-module basis.
>
> This MMC should essentially hold the binding votes for that module.
> The MMC should be made up of a single representative from each
> organisation (so no single organisation can veto; the community
> has to reach a genuine consensus).
> The MMC requires at least 3 members (so there can't be a tied vote on 2).
> For a new module to be added, an MMC should be sought.
> A new module is only capable of being added if the above requirements can
> be met (e.g. 3 people wishing to step up, from 3 organisations), so that
> only actively supported modules would be added.
>
> The PMC reviews each module every 6 months or year. If an MMC is inactive, a
> vote/call to find replacements is raised; if none are forthcoming, dropping
> the MMC to fewer than 3, then the module moves to "the attic" (very much like
> the Apache Attic but a little more aggressive).
>
> This way the PMC does not need to micro-manage every module.
> We only add modules where some amount of active support, maintenance,
> and use is provided by the community.
> We have an automatic way to retire old or inactive projects.
>
> Thoughts?
> Mike
>
>
> 
> From: Harsha Ch 
> Sent: Thursday, October 20, 2016 10:26 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
>
> Jay,
>   A REST API is something every user needs. If the argument is to
> clone it and write your own API, this will do a disservice to the users, as they
> now have to choose one vs. the others instead of keeping one API that is
> supported by the Kafka community.
>
> "Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/Kafka access that is heavily
> used in production seems quite silly."
>
>Exactly our point. Why can't we develop this in the Apache Kafka
> community, instead of open sourcing another GitHub project and creating
> a divide in users and another version of the API? Let's build this in the Kafka
> community and use the governance model that is proven to provide vendor-free,
> user-driven consensus features. The argument that adding this REST
> server to Kafka will affect the agility of the project doesn't make sense.
>
> It looks like your argument is either we develop all these small tools or
> none at all. We as a community need to look at supporting critical
> tools/APIs, instead of dividing this project into individual external
> communities. We should build this as part of Kafka, which best serves the
> needs of users.
> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.
>
> Thanks,
> Harsha
>
>
>
>
>
> On Wed, Oct 12, 2016 at 8:03 AM Jay Kreps  wrote:
>
> > I think the questions around governance make sense, I think we should
> > really clarify that to make the process more clear so it can be fully
> > inclusive.
> >
> > The idea that we should not collaborate on what is there now, though,
> > because in the future we might disagree about direction does not really
> > make sense to me. If in the future we disagree, that is the beauty of open
> > source, you can always fork off a copy of the code and start an independent
> > project either in Apache or elsewhere. Pre-emptively re-creating another
> > REST layer when it seems like we all quite agree on what needs to be done
> > and we have an existing code base for HTTP/Kafka access that is heavily
> > used in production seems quite silly.
> >
> > Let me give some background on how I at least think about these things.
> > I've participated in open source projects out of LinkedIn via GitHub as
> > well as via the ASF. I don't think there is a "right" answer to how to do
> > these but rather some tradeoffs. We thought about this 

[jira] [Updated] (KAFKA-4311) Multi layer cache eviction causes forwarding to incorrect Processor Node

2016-10-21 Thread Damian Guy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Guy updated KAFKA-4311:
--
Description: 
The two exceptions below were reported by Frank on the dev mailing list. After 
investigation, the root cause is multiple cache evictions happening in the same 
topology. 

{code}
final KTable<String, String> one = builder.table(Serdes.String(),
        Serdes.String(), tableOne, tableOne);
final KTable<Long, String> two = builder.table(Serdes.Long(),
        Serdes.String(), tableTwo, tableTwo);
// groupBy re-keys the table, swapping key and value
final KTable<String, Long> reduce = two.groupBy(new
        KeyValueMapper<Long, String, KeyValue<String, Long>>() {
    @Override
    public KeyValue<String, Long> apply(final Long key, final String value) {
        return new KeyValue<>(value, key);
    }
}, Serdes.String(), Serdes.Long())
        .reduce(new Reducer<Long>() {
            @Override
            public Long apply(final Long value1, final Long value2) {
                return value1 + value2;
            }
        }, new Reducer<Long>() {
            @Override
            public Long apply(final Long value1, final Long value2) {
                return value1 - value2;
            }
        }, "reducer-store");
one.leftJoin(reduce, new ValueJoiner<String, Long, String>() {
    @Override
    public String apply(final String value1, final Long value2) {
        return value1 + ":" + value2;
    }
})
        .mapValues(new ValueMapper<String, String>() {
            @Override
            public String apply(final String value) {
                return value;
            }
        });
{code}



Reported on the mailing list. Needs looking into how it could get in this state.
[StreamThread-1] ERROR
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
[StreamThread-1] Failed to close state manager for StreamTask 0_0:
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
to close state store addr-organization

at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
at
org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
at
org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
at
org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
at
org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
entry is null

at
org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
at
org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
at
org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
at
org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)

... 7 more


  was:
Reported on the mailing list. Needs looking into how it could get in this state.
[StreamThread-1] ERROR
org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
[StreamThread-1] Failed to close state manager for StreamTask 0_0:
org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
to close state store addr-organization

at
org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
at
org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
at
org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
at
org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
at
org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
at
org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)

at
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
entry is null

at
org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
at

[jira] [Commented] (KAFKA-4309) Allow "pluggable" properties in KafkaService in System Tests

2016-10-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594582#comment-15594582
 ] 

ASF GitHub Bot commented on KAFKA-4309:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2034


> Allow "pluggable" properties in KafkaService in System Tests
> 
>
> Key: KAFKA-4309
> URL: https://issues.apache.org/jira/browse/KAFKA-4309
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
> Fix For: 0.10.1.1, 0.10.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-4309) Allow "pluggable" properties in KafkaService in System Tests

2016-10-21 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4309.

   Resolution: Fixed
Fix Version/s: 0.10.1.1
   0.10.2.0

Issue resolved by pull request 2034
[https://github.com/apache/kafka/pull/2034]

> Allow "pluggable" properties in KafkaService in System Tests
> 
>
> Key: KAFKA-4309
> URL: https://issues.apache.org/jira/browse/KAFKA-4309
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
> Fix For: 0.10.2.0, 0.10.1.1
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-4328) The parameters for creating the ZkUtils object is reverse

2016-10-21 Thread Matt Wang (JIRA)
Matt Wang created KAFKA-4328:


 Summary: The parameters for creating the ZkUtils object is reverse
 Key: KAFKA-4328
 URL: https://issues.apache.org/jira/browse/KAFKA-4328
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.1, 0.10.0.0
 Environment: software platform
Reporter: Matt Wang


When creating the ZkUtils object, the parameters zkSessionTimeOutMs and 
zkConnectionTimeoutMs are reversed. Though the default values of these parameters 
are both 6000, it will cause problems, especially when we want to reset 
these values. 
The pull requests address is:
https://github.com/apache/kafka/pull/1646
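
For illustration, here is a minimal sketch (not from the patch) of how the swap 
can hide at a call site when both timeouts are 6000; named arguments make the 
intent explicit:

{code}
import kafka.utils.ZkUtils

// ZkUtils.apply takes (zkUrl, sessionTimeout, connectionTimeout,
// isZkSecurityEnabled). With both defaults at 6000, a call site that swaps the
// two timeouts behaves identically, which is why the bug is easy to miss.
val zkUtils = ZkUtils(
  "localhost:2181",
  sessionTimeout = 6000,
  connectionTimeout = 6000,
  isZkSecurityEnabled = false)
{code}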



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Kafka KIP meeting Oct 19 at 11:00am PST

2016-10-21 Thread Michael Pearce
I had noted that, whatever the solution, it was agreed that basing compaction on 
a null payload isn't elegant.

Shall we raise another KIP to, as discussed, propose using an attribute bit as a 
delete/compaction flag as well as / instead of the null value, and updating the 
compaction logic to look at that delete/compaction attribute?

I believe this is less contentious, so that at least we get that done, 
alleviating some concerns whilst the below gets discussed further.
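
For readers following along: today the convention ties deletion to a null 
payload. A minimal sketch of that status quo (topic name and serializers are 
illustrative only):

{code}
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Hedged sketch of the current convention: a record with a null value is a
// tombstone, and log compaction treats it as "delete this key".
val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

val producer = new KafkaProducer[String, String](props)
producer.send(new ProducerRecord[String, String]("compacted-topic", "some-key", null))
producer.close()
{code}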


From: Jun Rao 
Sent: Wednesday, October 19, 2016 8:56:52 PM
To: dev@kafka.apache.org
Subject: Re: Kafka KIP meeting Oct 19 at 11:00am PST

The following are the notes from today's KIP discussion.


   - KIP-82 - add record header: We agreed that there are use cases for
   third-party vendors building tools around Kafka. We haven't reached the
   conclusion whether the added complexity justifies the use cases. We will
   follow up on the mailing list with use cases, container format people have
   been using, and details on the proposal.


The video will be uploaded soon to
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals .

Thanks,

Jun

On Mon, Oct 17, 2016 at 10:49 AM, Jun Rao  wrote:

> Hi, Everyone,
>
> We plan to have a Kafka KIP meeting this coming Wednesday at 11:00am PST.
> If you plan to attend but haven't received an invite, please let me know.
> The following is the tentative agenda.
>
> Agenda:
> KIP-82: add record header
>
> Thanks,
>
> Jun
>


Re: [DISCUSS] KIP-80: Kafka REST Server

2016-10-21 Thread Edoardo Comar
Harsha Ch  wrote on 20/10/2016 22:26:53:

> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.

Thanks, Harsha,
for continuing to push this issue.
Your experience with users matches ours!

cheers
Edo

--
Edoardo Comar
IBM MessageHub
eco...@uk.ibm.com
IBM UK Ltd, Hursley Park, SO21 2JN


Harsha Ch  wrote on 20/10/2016 22:26:53:

> From: Harsha Ch 
> To: dev@kafka.apache.org
> Date: 20/10/2016 22:32
> Subject: Re: [DISCUSS] KIP-80: Kafka REST Server
> 
> Jay,
>   A REST API is something every user needs. If the argument is to
> clone it and write your own API, this will do a disservice to the users, as they
> now have to choose one vs. the others instead of keeping one API that is
> supported by the Kafka community.
> 
> "Pre-emptively re-creating another
> REST layer when it seems like we all quite agree on what needs to be done
> and we have an existing code base for HTTP/Kafka access that is heavily
> used in production seems quite silly."
> 
>Exactly our point. Why can't we develop this in the Apache Kafka
> community, instead of open sourcing another GitHub project and creating
> a divide in users and another version of the API? Let's build this in the Kafka
> community and use the governance model that is proven to provide vendor-free,
> user-driven consensus features. The argument that adding this REST
> server to Kafka will affect the agility of the project doesn't make sense.
> 
> It looks like your argument is either we develop all these small tools or
> none at all. We as a community need to look at supporting critical
> tools/APIs, instead of dividing this project into individual external
> communities. We should build this as part of Kafka, which best serves the
> needs of users.
> The Streams and Connect projects that were pushed into Kafka could
> have been left in their own GitHub projects based on your arguments. What
> about the REST API is so different that it should stay out of the
> Kafka project? From my experience, more users are asking for the REST API.
> 
> Thanks,
> Harsha
> 
> 
> 
> 
> 
> On Wed, Oct 12, 2016 at 8:03 AM Jay Kreps  wrote:
> 
> > I think the questions around governance make sense, I think we should
> > really clarify that to make the process more clear so it can be fully
> > inclusive.
> >
> > The idea that we should not collaborate on what is there now, though,
> > because in the future we might disagree about direction does not really
> > make sense to me. If in the future we disagree, that is the beauty of open
> > source, you can always fork off a copy of the code and start an independent
> > project either in Apache or elsewhere. Pre-emptively re-creating another
> > REST layer when it seems like we all quite agree on what needs to be done
> > and we have an existing code base for HTTP/Kafka access that is heavily
> > used in production seems quite silly.
> >
> > Let me give some background on how I at least think about these things.
> > I've participated in open source projects out of LinkedIn via GitHub as
> > well as via the ASF. I don't think there is a "right" answer to how to do
> > these but rather some tradeoffs. We thought about this quite a lot in the
> > context of Kafka based on the experience with the Hadoop ecosystem as well
> > as from other open source communities.
> >
> > There is a rich ecosystem around Kafka. Many of the projects are quite
> > small--single clients or tools that do a single thing well--and almost none
> > of them are top-level Apache projects. I don't think trying to force each
> > of these to turn into independent Apache projects is necessarily the best
> > thing for the ecosystem.
> >
> > My observation of how this can go wrong is really what I think has happened
> > to the Hadoop ecosystem. There you see quite a zoo of projects which all
> > drift apart and don't quite work together well. Coordinating even simple
> > changes and standardization across these is exceptionally difficult. The
> > result is a bit of a mess for users--the pieces just don't really come
> > together very well. This makes sense for independent infrastructure systems
> > (Kudu vs HDFS) but I'm not at all convinced that doing this for every
> > little tool or helper library has led to a desirable state. I think the
> > mode of operating where the Hadoop vendors spawn off a few new Apache
> > projects for each new product initiative, especially since often that
> > project is only 

Re: [ANNOUNCE] Apache Kafka 0.10.1.0 Released

2016-10-21 Thread Mickael Maison
Great job ! Thanks Jason and to everyone who contributed.

On Fri, Oct 21, 2016 at 6:09 AM, Manikumar  wrote:
> Thanks Jason. Great job.
>
> On Fri, Oct 21, 2016 at 10:29 AM, Becket Qin  wrote:
>
>> Congratulations! Great job, Jason.
>>
>> On Thu, Oct 20, 2016 at 11:57 AM, Vahid S Hashemian <
>> vahidhashem...@us.ibm.com> wrote:
>>
>> > +1
>> >
>> > Thanks Jason for driving the release. Great job!
>> >
>> > --Vahid
>> >
>> >
>> >
>> >
>> > From:   Sriram Subramanian 
>> > To: dev@kafka.apache.org
>> > Date:   10/20/2016 11:45 AM
>> > Subject:Re: [ANNOUNCE] Apache Kafka 0.10.1.0 Released
>> >
>> >
>> >
>> > Fantastic release and on time! Congratulations everyone for our first
>> > time-based release.
>> >
>> > On Thu, Oct 20, 2016 at 11:34 AM, Ismael Juma  wrote:
>> >
>> > > Thanks everyone for your contributions to this release. I think it's
>> > > impressive how much has been achieved in just under 5 months.
>> > >
>> > > And special thanks to Jason who has seemingly just been appointed RM for
>> > > life? :)
>> > >
>> > > Ismael
>> > >
>> > > On Thu, Oct 20, 2016 at 7:22 PM, Guozhang Wang 
>> > wrote:
>> > >
>> > > > Thanks for driving the release Jason!
>> > > >
>> > > > You should just be the release person for all the future releases :P
>> > > >
>> > > >
>> > > > Guozhang
>> > > >
>> > > >
>> > > > On Thu, Oct 20, 2016 at 11:12 AM, Jason Gustafson 
>> > > wrote:
>> > > >
>> > > > > Had the wrong address for dev and users (haven't sent from this
>> > > > > account before).
>> > > > >
>> > > > > On Thu, Oct 20, 2016 at 11:05 AM, Jason Gustafson > >
>> > > > wrote:
>> > > > >
>> > > > > > The Apache Kafka community is pleased to announce the release for Apache
>> > > > > > Kafka 0.10.1.0. This is a feature release which includes the completion of
>> > > > > > 15 KIPs, over 200 bug fixes and improvements, and more than 500 pull
>> > > > > > requests merged.
>> > > > > >
>> > > > > > All of the changes in this release can be found in the release notes:
>> > > > > > https://archive.apache.org/dist/kafka/0.10.1.0/RELEASE_NOTES.html
>> > > > > >
>> > > > > > Apache Kafka is a high-throughput, publish-subscribe messaging system
>> > > > > > rethought as a distributed commit log.
>> > > > > >
>> > > > > > ** Fast => A single Kafka broker can handle hundreds of megabytes of reads
>> > > > > > and writes per second from thousands of clients.
>> > > > > >
>> > > > > > ** Scalable => Kafka is designed to allow a single cluster to serve as the
>> > > > > > central data backbone for a large organization. It can be elastically and
>> > > > > > transparently expanded without downtime. Data streams are partitioned
>> > > > > > and spread over a cluster of machines to allow data streams larger than
>> > > > > > the capability of any single machine and to allow clusters of co-ordinated
>> > > > > > consumers.
>> > > > > >
>> > > > > > ** Durable => Messages are persisted on disk and replicated within the
>> > > > > > cluster to prevent data loss. Each broker can handle terabytes of messages
>> > > > > > without performance impact.
>> > > > > >
>> > > > > > ** Distributed by Design => Kafka has a modern cluster-centric design that
>> > > > > > offers strong durability and fault-tolerance guarantees.
>> > > > > >
>> > > > > > You can download the source release from
>> > > > > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.0/kafka-0.10.1.0-src.tgz
>> > > > > >
>> > > > > > and binary releases from
>> > > > > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.0/kafka_2.11-0.10.1.0.tgz
>> > > > > > https://www.apache.org/dyn/closer.cgi?path=/kafka/0.10.1.0/kafka_2.10-0.10.1.0.tgz
>> > > > > >
>> > > > > > Thanks to the 115 contributors on this release!
>> > > > > >
>> > > > > > Alex Glikson, Alex Loddengaard, Alexey Ozeritsky, Alexey Romanchuk,
>> > > > > > Andrea Cosentino, Andrew Otto, Andrey Neporada, Apurva Mehta, Arun Mahadevan,
>> > > > > > Ashish Singh, Avi Flax, Ben Stopford, Bharat Viswanadham, Bill Bejeck,
>> > > > > > Bryan Baugher, Chen Zhu, Christian Posta, Damian Guy, Dan Norwood, Dana
>> > > > > > Powers, David Chen, Derrick Or, Dong Lin, Dustin Cote, Edoardo Comar, Elias
>> > > > > > Levy, Eno Thereska, Eric Wasserman, Ewen Cheslack-Postava, Filipe Azevedo,
>> > > > > > Flavio Junqueira, Florian Hussonnois, Geoff Anderson, Grant Henke, Greg
>> > > > > > Fodor, Guozhang Wang, Gwen Shapira, Hans Deragon, Henry Cai, Ishita
>> > > > > > Mandhan, Ismael Juma, Jaikiran Pai, Jakub Dziworski, Jakub Pilimon, James
>> > > > > > Cheng, Jan 

[jira] [Created] (KAFKA-4329) The parameters for creating the ZkUtils object is reverse

2016-10-21 Thread Matt Wang (JIRA)
Matt Wang created KAFKA-4329:


 Summary: The parameters for creating the ZkUtils object is reverse
 Key: KAFKA-4329
 URL: https://issues.apache.org/jira/browse/KAFKA-4329
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.0.1, 0.10.0.0
 Environment: software platform
Reporter: Matt Wang
Priority: Critical


When creating the ZkUtils object, the parameters of zkSessionTimeOutMs and 
zkConnectionTimeoutMs is reverse. Though the default values of these parameters 
are both 6000, it will have some problem, especially when we want to reset 
these values. 
The pull requests address is:
https://github.com/apache/kafka/pull/1646



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4329) The parameters for creating the ZkUtils object is reverse

2016-10-21 Thread Matt Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Wang updated KAFKA-4329:
-
Description: 
When creating the ZkUtils object, the parameters of zkSessionTimeOutMs and 
zkConnectionTimeoutMs is reverse. Though the default values of these parameters 
are both 6000, it will have some problems, especially when we want to reset 
these values. 
The pull requests address is:
https://github.com/apache/kafka/pull/1646

  was:
When creating the ZkUtils object, the parameters of zkSessionTimeOutMs and 
zkConnectionTimeoutMs is reverse. Though the default values of these parameters 
are both 6000, it will have some problem, especially when we want to reset 
these values. 
The pull requests address is:
https://github.com/apache/kafka/pull/1646


> The parameters for creating the ZkUtils object is reverse
> -
>
> Key: KAFKA-4329
> URL: https://issues.apache.org/jira/browse/KAFKA-4329
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.10.0.1
> Environment: software platform
>Reporter: Matt Wang
>Priority: Critical
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When creating the ZkUtils object, the parameters of zkSessionTimeOutMs and 
> zkConnectionTimeoutMs is reverse. Though the default values of these 
> parameters are both 6000, it will have some problems, especially when we want 
> to reset these values. 
> The pull requests address is:
> https://github.com/apache/kafka/pull/1646



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-4329) The order of the parameters for creating the ZkUtils object is reversed

2016-10-21 Thread Matt Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Wang updated KAFKA-4329:
-
Summary: The order of the parameters for creating the ZkUtils object is 
reversed  (was: The parameters for creating the ZkUtils object is reverse)

> The order of the parameters for creating the ZkUtils object is reversed
> ---
>
> Key: KAFKA-4329
> URL: https://issues.apache.org/jira/browse/KAFKA-4329
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.0.0, 0.10.0.1
> Environment: software platform
>Reporter: Matt Wang
>Priority: Critical
>  Labels: patch
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When creating the ZkUtils object, the parameters of zkSessionTimeOutMs and 
> zkConnectionTimeoutMs is reverse. Though the default values of these 
> parameters are both 6000, it will have some problems, especially when we want 
> to reset these values. 
> The pull requests address is:
> https://github.com/apache/kafka/pull/1646



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #2034: KAFKA-4309: Allow "pluggable" properties in KafkaS...

2016-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2034




[GitHub] kafka pull request #2050: Replaced unnecessary isDefined and get on option v...

2016-10-21 Thread himani1
GitHub user himani1 opened a pull request:

https://github.com/apache/kafka/pull/2050

Replaced unnecessary isDefined and get on option values with fold



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/himani1/kafka refactored_code

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2050


commit 0f47ff386f1bfeb17ef61ae9227a07b9558c875c
Author: himani1 <1himani.ar...@gmail.com>
Date:   2016-10-21T10:55:24Z

Replaced unnecessary isDefined and get on option values with fold.

commit 6ecace1d1d246f8009abc14b25d74b278a6082de
Author: himani1 <1himani.ar...@gmail.com>
Date:   2016-10-21T11:02:41Z

Error corrected in usage of fold
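
For readers unfamiliar with the refactoring, a small hedged illustration of the 
pattern being replaced; the identifiers are invented, not taken from the PR:

{code}
// Before: explicit isDefined/get on an Option
def describe(maybeName: Option[String]): String =
  if (maybeName.isDefined) "name=" + maybeName.get else "unknown"

// After: Option.fold collapses the test and the access into one expression
def describeFolded(maybeName: Option[String]): String =
  maybeName.fold("unknown")(n => "name=" + n)
{code}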






[jira] [Commented] (KAFKA-4311) Exception in NamedCache.flush - Key found in dirty key set, but entry is null

2016-10-21 Thread Damian Guy (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15594852#comment-15594852
 ] 

Damian Guy commented on KAFKA-4311:
---

Hi Frank, 
if you have the time, would you mind trying to run your real data against this 
branch: https://github.com/dguy/kafka/tree/kafka-4311 ?

I think I've found the problem, but I've yet to create a reliable test to 
reproduce it. However, hopefully that is not too far away. Anyway, it'd be great 
if you could try it out and let me know how you get on. 

Thanks,
Damian

> Exception in NamedCache.flush -  Key found in dirty key set, but entry is 
> null 
> ---
>
> Key: KAFKA-4311
> URL: https://issues.apache.org/jira/browse/KAFKA-4311
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Damian Guy
>Assignee: Damian Guy
> Fix For: 0.10.1.1
>
>
> Reported on the mailing list. Needs looking into how it could get in this 
> state.
> [StreamThread-1] ERROR
> org.apache.kafka.streams.processor.internals.StreamThread - stream-thread
> [StreamThread-1] Failed to close state manager for StreamTask 0_0:
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_0] Failed
> to close state store addr-organization
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:342)
> at
> org.apache.kafka.streams.processor.internals.AbstractTask.closeStateManager(AbstractTask.java:121)
> at
> org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:341)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.performOnAllTasks(StreamThread.java:322)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.closeAllStateManagers(StreamThread.java:338)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdownTasksAndState(StreamThread.java:299)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.shutdown(StreamThread.java:262)
> at
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:245)
> Caused by: java.lang.IllegalStateException: Key found in dirty key set, but
> entry is null
> at
> org.apache.kafka.streams.state.internals.NamedCache.flush(NamedCache.java:112)
> at
> org.apache.kafka.streams.state.internals.ThreadCache.flush(ThreadCache.java:100)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.flush(CachingKeyValueStore.java:111)
> at
> org.apache.kafka.streams.state.internals.CachingKeyValueStore.close(CachingKeyValueStore.java:117)
> at
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.close(ProcessorStateManager.java:340)
> ... 7 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)