Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Jorge Esteban Quilcate Otoya
Congratulations Onur!!
On Tue, 7 Nov 2017 at 06:30, Jaikiran Pai  wrote:

> Congratulations Onur!
>
> -Jaikiran
>
>
> On 06/11/17 10:54 PM, Jun Rao wrote:
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer, Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of the Kafka controller,
> > which is the brain of a Kafka cluster. Over time, we have accumulated quite
> > a few correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path to solving all problems. Onur is
> > the one who took a holistic approach: first documenting all known issues,
> > writing down a new design, coming up with a plan to deliver the changes in
> > phases, and executing on it. At this point, Onur has completed the two most
> > important phases: making the controller single-threaded and changing the
> > controller to use the async ZK API. The former fixed multiple deadlocks and
> > race conditions. The latter significantly improved performance when there
> > are many partitions. Experimental results show that Onur's work reduced the
> > controlled shutdown time by a factor of 100 and the controller failover
> > time by a factor of 3.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
> >
>
>


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Jaikiran Pai

Congratulations Onur!

-Jaikiran


On 06/11/17 10:54 PM, Jun Rao wrote:

Hi, everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer, Onur
Karaman.

Onur's most significant work is the improvement of the Kafka controller, which
is the brain of a Kafka cluster. Over time, we have accumulated quite a few
correctness and performance issues in the controller. There have been attempts
to fix controller issues in isolation, which would make the code base more
complicated without a clear path to solving all problems. Onur is the one who
took a holistic approach: first documenting all known issues, writing down a
new design, coming up with a plan to deliver the changes in phases, and
executing on it. At this point, Onur has completed the two most important
phases: making the controller single-threaded and changing the controller to
use the async ZK API. The former fixed multiple deadlocks and race conditions.
The latter significantly improved performance when there are many partitions.
Experimental results show that Onur's work reduced the controlled shutdown time
by a factor of 100 and the controller failover time by a factor of 3.

Congratulations, Onur!

Thanks,

Jun (on behalf of the Apache Kafka PMC)





Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Onur Karaman
Thanks everyone!

On Mon, Nov 6, 2017 at 3:31 PM, Becket Qin  wrote:

> Congrats, Onur!
>
> On Mon, Nov 6, 2017 at 2:45 PM, James Cheng  wrote:
>
> > Congrats Onur! Well deserved!
> >
> > -James
> >
> > > On Nov 6, 2017, at 9:24 AM, Jun Rao  wrote:
> > >
> > > Hi, everyone,
> > >
> > > The PMC of Apache Kafka is pleased to announce a new Kafka committer,
> > > Onur Karaman.
> > >
> > > Onur's most significant work is the improvement of the Kafka controller,
> > > which is the brain of a Kafka cluster. Over time, we have accumulated
> > > quite a few correctness and performance issues in the controller. There
> > > have been attempts to fix controller issues in isolation, which would
> > > make the code base more complicated without a clear path to solving all
> > > problems. Onur is the one who took a holistic approach: first documenting
> > > all known issues, writing down a new design, coming up with a plan to
> > > deliver the changes in phases, and executing on it. At this point, Onur
> > > has completed the two most important phases: making the controller
> > > single-threaded and changing the controller to use the async ZK API. The
> > > former fixed multiple deadlocks and race conditions. The latter
> > > significantly improved performance when there are many partitions.
> > > Experimental results show that Onur's work reduced the controlled
> > > shutdown time by a factor of 100 and the controller failover time by a
> > > factor of 3.
> > >
> > > Congratulations, Onur!
> > >
> > > Thanks,
> > >
> > > Jun (on behalf of the Apache Kafka PMC)
> >
> >
>


Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jan Filipiak

I agree completely.

Exposing this information in a place where it has no _natural_ belonging 
might really be a bad blocker in the long run.


Concerning your first point: I would argue it's not too hard to have a 
user keep track of these. If we still don't want the user to keep track 
of these, I would argue that all *projection-only* transformations on a 
source-backed KTable/KStream could also return a KTable/KStream instance 
of the type we return from the topology builder. Only an operation that 
exceeds projection or filter would return a KTable no longer granting 
access to this.


Even then it's difficult already: I never ran a topology with caching, and 
I am not even 100% sure what the record context means behind a 
materialized KTable with caching. Topic and partition can probably be 
given some reasoning, but the offset is probably only the offset that 
caused the flush?

So one might as well think about dropping offsets from this RecordContext.

Best Jan







On 07.11.2017 03:18, Guozhang Wang wrote:

Regarding the API design (the proposed set of overloads v.s. one overload
on #map to enrich the record), I think what we have represents a good
trade-off between API succinctness and user convenience: on one hand we
definitely want to keep as few overloaded functions as possible. But on
the other hand if we only do that in, say, the #map() function, then this
enrichment could be overkill: think of a topology that has 7 operators
in a chain, where users want to access the record context on operators #2
and #6 only; with the "enrichment" manner they need to do the enrichment on
operator #2 and keep it that way until #6. In addition, the RecordContext
fields (topic, offset, etc.) are really orthogonal to the key-value payloads
themselves, so I think separating them into this object is a cleaner way.

Regarding the RecordContext inheritance, this is actually a good point that
has not been discussed thoroughly before. Here are my two cents: one
natural way would be to inherit the record context from the "triggering"
record; for example, in a join operator, if the record from stream A
triggers the join, then the record context is inherited from that
record. This is also aligned with the lower-level PAPI interface. A counter
argument, though, would be that this is sort of leaking the internal
implementation of the DSL, so that moving forward, if we did some
refactoring to our join implementations such that the triggering record can
change, the RecordContext would also be different. I do not know how much
it would really affect end users, but would like to hear your opinions.

Agreed 100% to exposing this information



Guozhang


On Mon, Nov 6, 2017 at 1:00 PM, Jeyhun Karimov  wrote:


Hi Jan,

Sorry for the late reply.


The API Design doesn't look appealing


In terms of API design we tried to preserve the Java functional interfaces.
We applied the same set of rich methods for KTable to make it compatible
with the rest of the overloaded APIs.

It should be 100% sufficient to offer a KTable + KStream that is directly

feed from a topic with 1 additional overload for the #map() methods to
cover every use case while keeping the API in a much better state.


- IMO this seems like a workaround rather than a direct solution.

Perhaps we should continue this discussion in DISCUSS thread.


Cheers,
Jeyhun


On Mon, Nov 6, 2017 at 9:14 PM Jan Filipiak 
wrote:


Hi.

I do understand that it might come in handy.
From my POV, in any relational algebra this is only a projection.
Currently we hide these "fields" that come with the input record.
It should be 100% sufficient to offer a KTable + KStream that is directly
fed from a topic with 1 additional overload for the #map() methods to
cover every use case while keeping the API in a much better state.

best Jan

On 06.11.2017 17:52, Matthias J. Sax wrote:

Jan,

I understand what you are saying. However, having a RecordContext is
super useful for operations that are applied to an input topic. Many users
requested this feature -- it's much more convenient than falling back to
transform() to implement, for example, a filter() that wants to access
some metadata.
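A sketch of that transform() fallback for a metadata-aware filter (the
Transformer/ProcessorContext interfaces approximate the Streams PAPI, whose
exact method set varies across versions; the partition-0 predicate is just
an arbitrary example):

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;

    // A "filter" that needs record metadata, implemented via transform():
    // returning null drops the record, so only partition-0 records pass.
    class MetadataFilter<K, V> implements Transformer<K, V, KeyValue<K, V>> {
        private ProcessorContext context;

        public void init(final ProcessorContext context) {
            this.context = context;  // exposes topic()/partition()/offset()
        }

        public KeyValue<K, V> transform(final K key, final V value) {
            return context.partition() == 0 ? KeyValue.pair(key, value) : null;
        }

        public void close() {}
    }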

Because we cannot distinguish different "origins" of a KStream/KTable, I
am not sure if there would be a better way to do this. The only
"workaround" I see is to have two KStream/KTable interfaces each, and we
would use the first one for KStream/KTable with a "proper" RecordContext.
But this does not seem to be a good solution either.

Note, a KTable can also be read directly from a topic. I agree that
using RecordContext on a KTable that is the result of an aggregation is
questionable. But I don't see a reason to down-vote the KIP for this
reason.

WDYT about this?


-Matthias

On 11/1/17 10:19 PM, Jan Filipiak wrote:

-1 non binding

I don't get the motivation.
In 80% of my DSL processors there is no such thing as a reasonable
RecordContext.
After a join the record I am processing belongs to at least 2 topics.

Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-06 Thread Becket Qin
Hi Jun,

Hmm, even if a connection is closed by the client while the channel is
muted, it seems Selector.select() will detect this and close the socket
after the channel is unmuted.
It is true that before the channel is unmuted the socket will be in a
CLOSE_WAIT state, though. So having an arbitrarily long muted duration may
indeed cause problems.

Thanks,

Jiangjie (Becket) Qin

On Mon, Nov 6, 2017 at 7:22 PM, Becket Qin  wrote:

> Hi Rajini,
>
> Thanks for the detailed explanation. Please see the reply below:
>
> 2. Limiting the throttle time to connection.max.idle.ms on the broker
> side is probably fine. However, clients may have a different configuration
> of connection.max.idle.ms and still reconnect before the throttle time
> (which is the server-side connection.max.idle.ms) expires. It seems like
> another back door for the quota.
>
> 3. I agree we could just mute the server socket until
> connection.max.idle.ms if the massive CLOSE_WAIT is a big issue. This
> helps guarantee only connection_rate * connection.max.idle.ms sockets
> will be in CLOSE_WAIT state. For cooperative clients, unmuting the socket
> will not have a negative impact.
>
> 4. My concern with capping the throttle time to metrics.window.ms is that
> we will not be able to enforce the quota effectively. It might be useful
> to explain this with a real example we are trying to solve. We have a
> MapReduce job pushing data to a Kafka cluster. The MapReduce job has
> hundreds of producers and each of them sends a normal-sized ProduceRequest
> (~2 MB) to each of the brokers in the cluster. Apparently the client id
> ran out of its bytes quota pretty quickly, and the broker started to
> throttle the producers. The throttle time could actually be pretty long
> (e.g. a few minutes). At that point, request queue time on the brokers was
> around 30 seconds. After that, a bunch of producers hit request.timeout.ms,
> reconnected, and sent the next request again, which caused another spike
> and a longer queue.
>
> In the above case, unless we set the quota window to be pretty big, we
> will not be able to enforce the quota. And if we set the window size to a
> large value, the request might be throttled for longer than
> connection.max.idle.ms.
>
> > We need a solution to improve flow control for well-behaved clients
> > which currently rely entirely on broker's throttling. The KIP addresses
> > this using co-operative clients that sleep for an unbounded throttle
> time.
> > I feel this is not ideal since the result is traffic with a lot of
> spikes.
> > Feedback from brokers to enable flow control in the client is a good
> idea,
> > but clients with excessive throttle times should really have been
> > configured with smaller batch sizes.
>
> This is not really about a single producer with a large batch size; it is
> a lot of small producers talking to the broker at the same time. Reducing
> the batch size does not help much here. Also note that after the spike
> traffic at the very beginning, the throttle times of the ProduceRequests
> processed later are actually going to keep increasing (for example, the
> first throttled request will be throttled for 1 second, the second
> throttled request will be throttled for 10 sec, etc.). Due to this
> throttle time variation, if every producer honors the throttle time, there
> will not be another spike after the first produce.
>
> > We need a solution to enforce smaller quotas to protect the broker
> > from misbehaving clients. The KIP addresses this by muting channels for
> an
> > unbounded time. This introduces problems of channels in CLOSE_WAIT. And
> > doesn't really solve all issues with misbehaving clients since new
> > connections can be created to bypass quotas.
>
> Our current quota only protects against cooperating clients, because our
> quota really throttles the NEXT request after processing a request, even
> if that request itself has already violated the quota. Misbehaving clients
> are not restrained at all by the current quota mechanism. Like you
> mentioned, a connection quota is required. We have been discussing this at
> LinkedIn for some time. Doing it right requires some major changes, such
> as partially reading a request to identify the client id at the network
> level and disconnecting misbehaving clients.
>
> While handling misbehaving clients is important, we are not trying to
> address that in this KIP. This KIP is trying to improve the communication
> with good clients. Muting the channel is more of a migration plan so that
> we don't cause a regression for the old (but well-behaved) clients.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Nov 6, 2017 at 1:33 PM, Jun Rao  wrote:
>
>> Hi, Jiangjie,
>>
>> 3. If a client closes a socket connection, typically the server only finds
>> this out by reading from the channel and getting a negative size during the
>> read. So, if a channel is muted by the server, the server won't be able to
>> detect the closing of the connection by the client after the socket is
>> unmuted. That's probably what Rajini wants to convey.

Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-06 Thread Becket Qin
Hi Rajini,

Thanks for the detailed explanation. Please see the reply below:

2. Limiting the throttle time to connection.max.idle.ms on the broker side
is probably fine. However, clients may have a different configuration of
connection.max.idle.ms and still reconnect before the throttle time (which
is the server-side connection.max.idle.ms) expires. It seems like another
back door for the quota.

3. I agree we could just mute the server socket until connection.max.idle.ms
if the massive CLOSE_WAIT is a big issue. This helps guarantee only
connection_rate * connection.max.idle.ms sockets will be in CLOSE_WAIT
state. For cooperative clients, unmuting the socket will not have a negative
impact.
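For readers unfamiliar with the mechanism: "muting" is just interest-op
manipulation at the NIO layer. A minimal sketch with the standard java.nio
API (Kafka's Selector does something equivalent internally):

    import java.nio.channels.SelectionKey;

    final class ChannelMuter {
        // Muting removes OP_READ, so the broker stops reading further
        // requests from this socket.
        static void mute(SelectionKey key) {
            key.interestOps(key.interestOps() & ~SelectionKey.OP_READ);
        }

        // Unmuting restores OP_READ so the next request can be processed.
        static void unmute(SelectionKey key) {
            key.interestOps(key.interestOps() | SelectionKey.OP_READ);
        }
    }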

4. My concern with capping the throttle time to metrics.window.ms is that we
will not be able to enforce the quota effectively. It might be useful to
explain this with a real example we are trying to solve. We have a
MapReduce job pushing data to a Kafka cluster. The MapReduce job has
hundreds of producers and each of them sends a normal-sized ProduceRequest
(~2 MB) to each of the brokers in the cluster. Apparently the client id
ran out of its bytes quota pretty quickly, and the broker started to
throttle the producers. The throttle time could actually be pretty long
(e.g. a few minutes). At that point, request queue time on the brokers was
around 30 seconds. After that, a bunch of producers hit request.timeout.ms,
reconnected, and sent the next request again, which caused another spike
and a longer queue.

In the above case, unless we set the quota window to be pretty big, we will
not be able to enforce the quota. And if we set the window size to a large
value, the request might be throttled for longer than
connection.max.idle.ms.
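To make that concrete, a back-of-the-envelope sketch (the formula and the
numbers below are illustrative assumptions, not Kafka's exact quota code):

    final class QuotaMath {
        // Delay the next request long enough that the rate averaged over
        // (window + delay) falls back down to the quota.
        static long throttleMs(double observedBps, double quotaBps, long windowMs) {
            return Math.max(0L, (long) ((observedBps - quotaBps) / quotaBps * windowMs));
        }
    }

    // Hypothetical numbers: 300 mappers x 2 MB inside a 10-second window is
    // 60 MB/s observed against, say, a 5 MB/s quota:
    //   QuotaMath.throttleMs(60e6, 5e6, 10_000) == 110_000   // ~110 seconds
    // That is an order of magnitude longer than the window itself, so capping
    // the throttle time at the window would effectively disable the quota.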

> We need a solution to improve flow control for well-behaved clients
> which currently rely entirely on broker's throttling. The KIP addresses
> this using co-operative clients that sleep for an unbounded throttle time.
> I feel this is not ideal since the result is traffic with a lot of spikes.
> Feedback from brokers to enable flow control in the client is a good idea,
> but clients with excessive throttle times should really have been
> configured with smaller batch sizes.

This is not really about a single producer with a large batch size; it is a
lot of small producers talking to the broker at the same time. Reducing the
batch size does not help much here. Also note that after the spike traffic
at the very beginning, the throttle times of the ProduceRequests processed
later are actually going to keep increasing (for example, the first
throttled request will be throttled for 1 second, the second throttled
request will be throttled for 10 sec, etc.). Due to this throttle time
variation, if every producer honors the throttle time, there will not be
another spike after the first produce.
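The cooperative behavior described above boils down to a client loop like
this sketch (sendProduce/ProduceResponse are placeholders, not the
kafka-clients API):

    // Illustrative only: honor the broker's throttle hint instead of
    // retrying immediately, so throttled producers spread out rather than
    // piling a new spike onto the broker's request queue.
    void produceLoop() throws InterruptedException {
        while (running) {
            ProduceResponse response = sendProduce(nextBatch());
            long throttleMs = response.throttleTimeMs();  // hint from broker
            if (throttleMs > 0) {
                Thread.sleep(throttleMs);  // back off before the next request
            }
        }
    }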

> We need a solution to enforce smaller quotas to protect the broker
> from misbehaving clients. The KIP addresses this by muting channels for an
> unbounded time. This introduces problems of channels in CLOSE_WAIT. And
> doesn't really solve all issues with misbehaving clients since new
> connections can be created to bypass quotas.

Our current quota only protects against cooperating clients, because our
quota really throttles the NEXT request after processing a request, even if
that request itself has already violated the quota. Misbehaving clients are
not restrained at all by the current quota mechanism. Like you mentioned, a
connection quota is required. We have been discussing this at LinkedIn for
some time. Doing it right requires some major changes, such as partially
reading a request to identify the client id at the network level and
disconnecting misbehaving clients.

While handling misbehaving clients is important, we are not trying to
address that in this KIP. This KIP is trying to improve the communication
with good clients. Muting the channel is more of a migration plan so that
we don't cause a regression for the old (but well-behaved) clients.

Thanks,

Jiangjie (Becket) Qin


On Mon, Nov 6, 2017 at 1:33 PM, Jun Rao  wrote:

> Hi, Jiangjie,
>
> 3. If a client closes a socket connection, typically the server only finds
> this out by reading from the channel and getting a negative size during the
> read. So, if a channel is muted by the server, the server won't be able to
> detect the closing of the connection by the client after the socket is
> unmuted. That's probably what Rajini wants to convey.
>
> Thanks,
>
> Jun
>
> On Fri, Nov 3, 2017 at 8:11 PM, Becket Qin  wrote:
>
> > Thanks Rajini.
> >
> > 1. Good point. We do need to bump up the protocol version so that the new
> > clients do not wait for another throttle time when they are talking to
> old
> > brokers. I'll update the KIP.
> >
> > 2. That is true. But the client was not supposed to send requests to the
> > broker during that period anyways. So detecting the broker failure later
> > seems fine?
> >
> > 3. Wouldn't the CLOSE_WAIT handler 

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Guozhang Wang
Regarding the API design (the proposed set of overloads v.s. one overload
on #map to enrich the record), I think what we have represents a good
trade-off between API succinctness and user convenience: on one hand we
definitely want to keep as few overloaded functions as possible. But on
the other hand if we only do that in, say, the #map() function, then this
enrichment could be overkill: think of a topology that has 7 operators
in a chain, where users want to access the record context on operators #2
and #6 only; with the "enrichment" manner they need to do the enrichment on
operator #2 and keep it that way until #6. In addition, the RecordContext
fields (topic, offset, etc.) are really orthogonal to the key-value payloads
themselves, so I think separating them into this object is a cleaner way.
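For readers following the vote, a sketch of the shape under discussion -- a
"rich" mapper variant that receives the record context (names approximate
the KIP wording, not a final API):

    // The context object carries the metadata fields mentioned above.
    interface RecordContext {
        String topic();
        int partition();
        long offset();
    }

    // Rich variant of a key-value mapper: same payload arguments plus the
    // context, offered as an extra overload only where users need it.
    interface RichKeyValueMapper<K, V, VR> {
        VR apply(K key, V value, RecordContext recordContext);
    }

    // Hypothetical use at one operator of a longer chain:
    //   stream.mapValues((k, v, ctx) -> v + "@" + ctx.topic() + "/" + ctx.offset());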

Regarding the RecordContext inheritance, this is actually a good point that
has not been discussed thoroughly before. Here are my two cents: one
natural way would be to inherit the record context from the "triggering"
record; for example, in a join operator, if the record from stream A
triggers the join, then the record context is inherited from that
record. This is also aligned with the lower-level PAPI interface. A counter
argument, though, would be that this is sort of leaking the internal
implementation of the DSL, so that moving forward, if we did some
refactoring to our join implementations such that the triggering record can
change, the RecordContext would also be different. I do not know how much
it would really affect end users, but would like to hear your opinions.


Guozhang


On Mon, Nov 6, 2017 at 1:00 PM, Jeyhun Karimov  wrote:

> Hi Jan,
>
> Sorry for the late reply.
>
>
> The API Design doesn't look appealing
>
>
> In terms of API design we tried to preserve the Java functional interfaces.
> We applied the same set of rich methods for KTable to make it compatible
> with the rest of the overloaded APIs.
>
> It should be 100% sufficient to offer a KTable + KStream that is directly
> > fed from a topic with 1 additional overload for the #map() methods to
> > cover every use case while keeping the API in a much better state.
>
>
> - IMO this seems like a workaround rather than a direct solution.
>
> Perhaps we should continue this discussion in DISCUSS thread.
>
>
> Cheers,
> Jeyhun
>
>
> On Mon, Nov 6, 2017 at 9:14 PM Jan Filipiak 
> wrote:
>
> > Hi.
> >
> > I do understand that it might come in handy.
> >  From my POV, in any relational algebra this is only a projection.
> > Currently we hide these "fields" that come with the input record.
> > It should be 100% sufficient to offer a KTable + KStream that is directly
> > fed from a topic with 1 additional overload for the #map() methods to
> > cover every use case while keeping the API in a much better state.
> >
> > best Jan
> >
> > On 06.11.2017 17:52, Matthias J. Sax wrote:
> > > Jan,
> > >
> > > I understand what you are saying. However, having a RecordContext is
> > > super useful for operations that are applied to an input topic. Many
> > > users requested this feature -- it's much more convenient than falling
> > > back to transform() to implement, for example, a filter() that wants to
> > > access some metadata.
> > >
> > > Because we cannot distinguish different "origins" of a KStream/KTable,
> > > I am not sure if there would be a better way to do this. The only
> > > "workaround" I see is to have two KStream/KTable interfaces each, and
> > > we would use the first one for KStream/KTable with a "proper"
> > > RecordContext. But this does not seem to be a good solution either.
> > >
> > > Note, a KTable can also be read directly from a topic. I agree that
> > > using RecordContext on a KTable that is the result of an aggregation
> > > is questionable. But I don't see a reason to down-vote the KIP for
> > > this reason.
> > >
> > > WDYT about this?
> > >
> > >
> > > -Matthias
> > >
> > > On 11/1/17 10:19 PM, Jan Filipiak wrote:
> > >> -1 non binding
> > >>
> > >> I don't get the motivation.
> > >> In 80% of my DSL processors there is no such thing as a reasonable
> > >> RecordContext.
> > >> After a join the record I am processing belongs to at least 2 topics.
> > >> After a group-by the record I am processing was created from multiple
> > >> offsets.
> > >>
> > >> The API Design doesn't look appealing
> > >>
> > >> Best Jan
> > >>
> > >>
> > >>
> > >> On 01.11.2017 22:02, Jeyhun Karimov wrote:
> > >>> Dear community,
> > >>>
> > >>> It seems the discussion for KIP-159 [1] converged finally. I would
> > >>> like to
> > >>> initiate voting for the particular KIP.
> > >>>
> > >>>
> > >>>
> > >>> [1]
> > >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 159%3A+Introducing+Rich+functions+to+Streams
> > >>>
> > >>>
> > >>> Cheers,
> > >>> Jeyhun
> > >>>
> >
> >
>



-- 
-- Guozhang


Jenkins build is back to normal : kafka-trunk-jdk7 #2951

2017-11-06 Thread Apache Jenkins Server
See 




[GitHub] kafka pull request #4186: [WIP] KAFKA-6179: Clear min timestamp tracker upon...

2017-11-06 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/4186

[WIP] KAFKA-6179: Clear min timestamp tracker upon partition queue cleanup



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
K6179-cleanup-timestamp-tracker-on-clear

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4186


commit c92eaa566334e8fe6f20a37f7196db807c4b01b2
Author: Guozhang Wang 
Date:   2017-11-07T01:46:14Z

clear min timestamp tracker upon partition queue cleanup




---


Build failed in Jenkins: kafka-trunk-jdk8 #2193

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[okaraman] KAFKA-5894; add the notion of max inflight requests to async

--
[...truncated 381.03 KB...]

kafka.security.auth.ResourceTypeTest > testJavaConversions STARTED

kafka.security.auth.ResourceTypeTest > testJavaConversions PASSED

kafka.security.auth.ResourceTypeTest > testFromString STARTED

kafka.security.auth.ResourceTypeTest > testFromString PASSED

kafka.security.auth.OperationTest > testJavaConversions STARTED

kafka.security.auth.OperationTest > testJavaConversions PASSED

kafka.security.auth.AclTest > testAclJsonConversion STARTED

kafka.security.auth.AclTest > testAclJsonConversion PASSED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled STARTED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled PASSED

kafka.security.auth.ZkAuthorizationTest > testZkUtils STARTED

kafka.security.auth.ZkAuthorizationTest > testZkUtils PASSED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testZkMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testChroot STARTED

kafka.security.auth.ZkAuthorizationTest > testChroot PASSED

kafka.security.auth.ZkAuthorizationTest > testDelete STARTED

kafka.security.auth.ZkAuthorizationTest > testDelete PASSED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive STARTED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED


Build failed in Jenkins: kafka-trunk-jdk9 #174

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[okaraman] KAFKA-5894; add the notion of max inflight requests to async

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H25 (couchdbtest ubuntu xenial) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 58138126ce93efc48291fe444f554eabdf4a5609 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 58138126ce93efc48291fe444f554eabdf4a5609
Commit message: "KAFKA-5894; add the notion of max inflight requests to async 
ZooKeeperClient"
 > git rev-list 6f96d7f1735c53b97ebac43168ded64007277beb # timeout=10
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
[kafka-trunk-jdk9] $ /bin/bash -xe /tmp/jenkins1152440614654967419.sh
+ rm -rf 
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2/bin/gradle

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine java version from '9.0.1'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 10 days old

Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
Not sending mail to unregistered user ism...@juma.me.uk
Not sending mail to unregistered user rajinisiva...@googlemail.com
Not sending mail to unregistered user wangg...@gmail.com
Not sending mail to unregistered user becket@gmail.com


[jira] [Created] (KAFKA-6179) RecordQueue.clear() does not clear MinTimestampTracker's maintained list

2017-11-06 Thread Guozhang Wang (JIRA)
Guozhang Wang created KAFKA-6179:


 Summary: RecordQueue.clear() does not clear MinTimestampTracker's 
maintained list
 Key: KAFKA-6179
 URL: https://issues.apache.org/jira/browse/KAFKA-6179
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 1.0.0, 0.10.2.1, 0.11.0.1
Reporter: Guozhang Wang
Assignee: Guozhang Wang


When a stream task is being suspended, in {{RecordQueue.clear()}} we clear 
the {{ArrayDeque fifoQueue}}, but we do not clear the {{MinTimestampTracker}}'s 
maintained list. As a result, if the task gets resumed we will live with an 
empty {{fifoQueue}} but a populated {{tracker}}. And since we use reference 
equality to check whether the smallest-timestamp record can be popped, we 
would never be able to pop any more records, effectively leading to a memory 
leak.
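A minimal sketch of the fix direction, with simplified shapes rather than
the actual Streams classes:

    import java.util.ArrayDeque;

    class MinTimestampTracker<E> {
        private final ArrayDeque<E> minCandidates = new ArrayDeque<>();
        void addElement(E elem) { /* maintain min-timestamp candidates */ }
        void clear() { minCandidates.clear(); }
    }

    class RecordQueue<E> {
        private final ArrayDeque<E> fifoQueue = new ArrayDeque<>();
        private final MinTimestampTracker<E> tracker = new MinTimestampTracker<>();

        void clear() {
            fifoQueue.clear();
            tracker.clear();  // KAFKA-6179: previously only fifoQueue was
                              // cleared, leaving the tracker referencing
                              // records the queue no longer holds.
        }
    }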



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #3860: KAFKA-5894: add the notion of max inflight request...

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/3860


---


Re: [DISCUSS] KIP-220: Add AdminClient into Kafka Streams' ClientSupplier

2017-11-06 Thread Guozhang Wang
Hello folks,

One update I'd like to propose regarding "compatibility checking":
currently we create a single StreamsKafkaClient at the beginning to issue
an ApiVersionsRequest to a random broker and then check on its versions,
failing if it does not satisfy the required version (0.10.1+ without EOS,
0.11.0+ with EOS); after this check we throw this client away. My original
plan was to replace this logic with the NetworkClient's own apiVersions,
but after some digging while working on the PR I found that exposing this
apiVersions variable from NetworkClient through AdminClient is not very
straightforward; plus it would need an API change on the AdminClient itself
as well to expose the version information.

On the other hand, this one-time compatibility check's benefit may not be
significant: 1) it asks a random broker upon starting up, and hence does
not guarantee that all brokers support the corresponding API versions at
that time; 2) brokers can still be upgraded / downgraded after the streams
app is up and running, and hence we still need to handle
UnsupportedVersionExceptions thrown from the producer / consumer / admin
client during runtime anyway.

So I'd like to propose two options in this KIP:

1) We remove this one-time compatibility check on Streams startup, and rely
solely on handling the UnsupportedVersionExceptions thrown from the
producer / consumer / admin client at runtime.

2) We create a one-time NetworkClient upon startup, send the
ApiVersionsRequest, get the response, and do the checking; after that,
throw this client away.
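For option 1, the runtime handling would look roughly like the following
sketch (UnsupportedVersionException is the real kafka-clients exception; the
surrounding call and fallback are illustrative only):

    import java.util.Collections;
    import java.util.concurrent.ExecutionException;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.common.errors.UnsupportedVersionException;

    final class VersionTolerantCall {
        // No upfront probe: make the real call and degrade if the broker is
        // too old for this particular API.
        static void describeOrDegrade(AdminClient admin, String topic)
                throws InterruptedException {
            try {
                admin.describeTopics(Collections.singleton(topic)).all().get();
            } catch (ExecutionException e) {
                if (e.getCause() instanceof UnsupportedVersionException) {
                    // Broker does not support this API version: fall back or
                    // warn instead of failing the whole app at startup.
                } else {
                    throw new RuntimeException(e.getCause());
                }
            }
        }
    }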

Please let me know what you think. Thanks!


Guozhang


On Mon, Nov 6, 2017 at 7:56 AM, Matthias J. Sax 
wrote:

> Thanks for the update and clarification.
>
> Sounds good to me :)
>
>
> -Matthias
>
>
>
> On 11/6/17 12:16 AM, Guozhang Wang wrote:
> > Thanks Matthias,
> >
> > 1) Updated the KIP page to include KAFKA-6126.
> > 2) For passing configs, I agree, will make a pass over the existing
> configs
> > passed to StreamsKafkaClient and update the wiki page accordingly, to
> > capture all changes that would happen for the replacement in this single
> > KIP.
> > 3) For internal topic purging, I'm not sure if we need to include this
> > as a public change, since internal topics are meant to be abstracted
> > away from the Streams users; they should not leverage such internal
> > topics elsewhere themselves. The only thing I can think of is that for
> > Kafka operators this would mean such internal topics' footprint would be
> > largely reduced, but that would not need to be in the KIP either.
> >
> >
> > Guozhang
> >
> >
> > On Sat, Nov 4, 2017 at 9:00 AM, Matthias J. Sax 
> > wrote:
> >
> >> I like this KIP. Can you also link to
> >> https://issues.apache.org/jira/browse/KAFKA-6126 in the KIP?
> >>
> >> What I am wondering though: if we start to partially (i.e., step by step)
> >> replace the existing StreamsKafkaClient with the Java AdminClient, don't
> >> we need more KIPs? For example, if we use the purge API for internal
> >> topics, it seems like a change that requires a KIP. Similar for passing
> >> configs -- the old client might have different configs than the new
> >> client? Can we double-check this?
> >>
> >> Thus, it might make sense to replace the old client with the new one in
> >> one shot.
> >>
> >>
> >> -Matthias
> >>
> >> On 11/4/17 4:01 AM, Ted Yu wrote:
> >>> Looks good overall.
> >>>
> >>> bq. the creation within StreamsPartitionAssignor
> >>>
> >>> Typo above: should be StreamPartitionAssignor
> >>>
> >>> On Fri, Nov 3, 2017 at 4:49 PM, Guozhang Wang 
> >> wrote:
> >>>
>  Hello folks,
> 
>  I have filed a new KIP on adding AdminClient into Streams for internal
>  topic management.
> 
>  Looking for feedback on
> 
>  *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>  220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier
>    220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier>*
> 
>  --
>  -- Guozhang
> 
> >>>
> >>
> >>
> >
> >
>
>


-- 
-- Guozhang


[GitHub] kafka pull request #4185: MINOR: Update Scala 2.11 to 2.11.12

2017-11-06 Thread ijuma
GitHub user ijuma opened a pull request:

https://github.com/apache/kafka/pull/4185

MINOR: Update Scala 2.11 to 2.11.12

The main change is Java 9 support.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ijuma/kafka scala-2.11.12

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4185.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4185


commit 832a5ea7e2b12a0f50565a541d5f3edd660e2832
Author: Ismael Juma 
Date:   2017-11-06T23:39:27Z

MINOR: Update Scala 2.11 to 2.11.12

The main change is Java 9 support.




---


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Becket Qin
Congrats, Onur!

On Mon, Nov 6, 2017 at 2:45 PM, James Cheng  wrote:

> Congrats Onur! Well deserved!
>
> -James
>
> > On Nov 6, 2017, at 9:24 AM, Jun Rao  wrote:
> >
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer, Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of the Kafka controller,
> > which is the brain of a Kafka cluster. Over time, we have accumulated quite
> > a few correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path to solving all problems. Onur is
> > the one who took a holistic approach: first documenting all known issues,
> > writing down a new design, coming up with a plan to deliver the changes in
> > phases, and executing on it. At this point, Onur has completed the two most
> > important phases: making the controller single-threaded and changing the
> > controller to use the async ZK API. The former fixed multiple deadlocks and
> > race conditions. The latter significantly improved performance when there
> > are many partitions. Experimental results show that Onur's work reduced the
> > controlled shutdown time by a factor of 100 and the controller failover
> > time by a factor of 3.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
>
>


Build failed in Jenkins: kafka-trunk-jdk8 #2192

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: PartitionReassignmentHandler should only generate event when

--
[...truncated 382.77 KB...]

kafka.security.auth.ResourceTypeTest > testJavaConversions STARTED

kafka.security.auth.ResourceTypeTest > testJavaConversions PASSED

kafka.security.auth.ResourceTypeTest > testFromString STARTED

kafka.security.auth.ResourceTypeTest > testFromString PASSED

kafka.security.auth.OperationTest > testJavaConversions STARTED

kafka.security.auth.OperationTest > testJavaConversions PASSED

kafka.security.auth.AclTest > testAclJsonConversion STARTED

kafka.security.auth.AclTest > testAclJsonConversion PASSED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled STARTED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled PASSED

kafka.security.auth.ZkAuthorizationTest > testZkUtils STARTED

kafka.security.auth.ZkAuthorizationTest > testZkUtils PASSED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testZkMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testChroot STARTED

kafka.security.auth.ZkAuthorizationTest > testChroot PASSED

kafka.security.auth.ZkAuthorizationTest > testDelete STARTED

kafka.security.auth.ZkAuthorizationTest > testDelete PASSED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive STARTED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAllowAllAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testLocalConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyDeletionOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFound PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclInheritance PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce STARTED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch STARTED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize 
STARTED

kafka.integration.PrimitiveApiTest > testFetchRequestCanProperlySerialize PASSED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests STARTED

kafka.integration.PrimitiveApiTest > testPipelinedProduceRequests PASSED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch STARTED

kafka.integration.PrimitiveApiTest > testProduceAndMultiFetch PASSED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression STARTED

kafka.integration.PrimitiveApiTest > 
testDefaultEncoderProducerAndFetchWithCompression PASSED

kafka.integration.PrimitiveApiTest > testConsumerEmptyTopic STARTED


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Matthias J. Sax
It is very useful! Thanks a lot! I can follow now what's going on :)

One clarification question: the second input is, for example,

key: B0 : value [A2,B0 ...]

The second B0 in the value is actually irrelevant, right? The partition
key A2B0 is built from the B0 in the key, right?

-Matthias
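For readers of the thread, a sketch of the composite-key idea being
discussed (CombinedKey and the partitioning helper are hypothetical names):
the re-keyed B record carries both keys, but only the A component decides
the partition, which is why the B0 inside the value is redundant for
routing.

    // All names hypothetical; this only illustrates "partition by A's key".
    final class CombinedKey<KA, KB> {
        final KA aKey;  // foreign key pointing at the A table
        final KB bKey;  // original key of the B record
        CombinedKey(KA aKey, KB bKey) { this.aKey = aKey; this.bKey = bKey; }
    }

    final class AKeyPartitioner {
        // Partition on the A component only, so every record joining to the
        // same A row lands on the same partition regardless of its B key.
        static <KA, KB> int partition(CombinedKey<KA, KB> key, int numPartitions) {
            return (key.aKey.hashCode() & 0x7fffffff) % numPartitions;
        }
    }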

On 11/6/17 10:46 PM, Jan Filipiak wrote:
> Going to put in all possible scenarios. Just wanted to lay out the table
> and ask if it's useful and understandable first.
> 
> On 06.11.2017 22:33, Ted Yu wrote:
>> bq. Update in A delete in A update in B delete in B
>>
>> Are you going to fill in the above scenario (currently blank) ?
>>
>> On Mon, Nov 6, 2017 at 12:31 PM, Jan Filipiak 
>> wrote:
>>
>>> I created an example table in the WIKI page.
>>> Can you quickly check if that would be a good format?
>>> I tried to do it roughly like the unit tests, but with the information of
>>> what state is there _AFTER_ processing happened.
>>> I made the first 2 columns exclusive even though they in fact run in
>>> parallel, but the joining task serializes the effects.
>>> Best Jan
>>>
>>> On 06.11.2017 21:20, Jan Filipiak wrote:
>>>
 Will do! Need to do it carefully. One mistake in this detailed approach
 and the confusion is complete ;)
 Hope I can deliver this week.

 Best Jan


 On 06.11.2017 17:21, Matthias J. Sax wrote:

> Jan,
>
> thanks a lot for this KIP. I did an initial pass over it, but feel a
> little lost. Maybe I need to read it more carefully, but atm it's not
> clear to me at all what algorithm you propose.
>
> I think it would be super helpful to do an example with concrete data
> that shows how records are stored, what the different value mappers
> extract, and what is written into the repartitioning topics.
>
>
>
> -Matthias
>
>
> On 11/5/17 2:09 AM, Jan Filipiak wrote:
>
>> Hi Guozhang,
>>
>> I hope the wiki page looks better now. I made a little more effort on the
>> diagram. Still not ideal, but I think it serves its purpose.
>>
>>
>>
>> On 02.11.2017 01:17, Guozhang Wang wrote:
>>
>>> Thanks for the KIP write-up, Jan. I made a first pass and here are some
>>> quick comments:
>>>
>>>
>>> 1. Could we use K0 / V0 and K1 / V1 etc., since K0 and KO are a bit
>>> harder to differentiate when reading.
>>>
>>> 2. I think you missed the key type in the intrusive approach example
>>> code
>>> snippet regarding "KTable  oneToManyJoin"? Should that be
>>>
>>> KTable, V0> oneToManyJoin
>>>
>>> 3. Some of the arrows in your algorithm section's diagrams seem
>>> reversed.
>>>
>>> 4. In the first step of the algorithm, "Materialize B first", that
>>> happens in the "Repartition by A's key" block, right? If yes, could you
>>> clarify it in the block?
>>>
>>> 5. "skip old if A's key didn't change": hmm, not sure if we can skip
>>> it.
>>> What if other fields (neither A's key or B's key) changes?
>>> Suppose you
>>> have
>>> an aggregation after the join, we still need to subtract the old
>>> value
>>> from
>>> the aggregation right?
>>>
>>> 6. In the block of "Materialize B", I think from your description we are
>>> actually materializing both A and B, right? If yes, could you update the
>>> diagram?
>>>
>>> 7. This is a meta question: "in the sink, only use A's key to determine
>>> the partition". I think we had the discussion a long time ago that if we
>>> are sending the old and new entries of the pair to different partitions,
>>> their ordering may get reversed later when reading from the join operator
>>> (i.e. the "Materialize B" block in your diagram). How did you address
>>> that with this proposal?
>>>
>>> 8. "B records with a 'null' A-key value would be silently dropped"
>>> Where
>>> are we dropping it, do we drop it at the first sub-topology (i.e the
>>> "Repartition by A's key" block)?
>>>
>>> Guozhang
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak <
>>> jan.filip...@trivago.com>
>>> wrote:
>>>
>>> Hi thanks for the feedback
 On 01.11.2017 12:58, Damian Guy wrote:

 Hi Jan, Thanks for the KIP!
> In both alternatives the API will need to use the `Joined` class
> rather
> than passing in `Serde`s. Also, as with all other joins etc,
> there
> probably should be an overload that doesn't require any `Serdes`.
>
> Will check again how the current API looks. I remember losing the
 argument
 with this IQ overloads things.
 Didn't expect something to have happened already, so I just copied
 from
 the

[GitHub] kafka pull request #4085: HOTFIX: poll with zero millis during restoration

2017-11-06 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka/pull/4085


---


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread James Cheng
Congrats Onur! Well deserved!

-James

> On Nov 6, 2017, at 9:24 AM, Jun Rao  wrote:
> 
> Hi, everyone,
> 
> The PMC of Apache Kafka is pleased to announce a new Kafka committer, Onur
> Karaman.
> 
> Onur's most significant work is the improvement of the Kafka controller,
> which is the brain of a Kafka cluster. Over time, we have accumulated quite
> a few correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path to solving all problems. Onur is
> the one who took a holistic approach: first documenting all known issues,
> writing down a new design, coming up with a plan to deliver the changes in
> phases, and executing on it. At this point, Onur has completed the two most
> important phases: making the controller single-threaded and changing the
> controller to use the async ZK API. The former fixed multiple deadlocks and
> race conditions. The latter significantly improved performance when there
> are many partitions. Experimental results show that Onur's work reduced the
> controlled shutdown time by a factor of 100 and the controller failover
> time by a factor of 3.
> 
> Congratulations, Onur!
> 
> Thanks,
> 
> Jun (on behalf of the Apache Kafka PMC)



[GitHub] kafka pull request #4130: HOTFIX: Remove sysout logging

2017-11-06 Thread guozhangwang
Github user guozhangwang closed the pull request at:

https://github.com/apache/kafka/pull/4130


---


[GitHub] kafka pull request #4184: MINOR: Remove FanoutIntegrationTest.java

2017-11-06 Thread guozhangwang
GitHub user guozhangwang opened a pull request:

https://github.com/apache/kafka/pull/4184

MINOR: Remove FanoutIntegrationTest.java

This test has been completely subsumed by the coverage of reset integration 
test, and hence can be removed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/guozhangwang/kafka 
KMinor-remove-fanout-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4184


commit 9e189664fb9c1ab882403f4174393e8f2aa66047
Author: Guozhang Wang 
Date:   2017-11-06T22:40:13Z

Remove FanoutIntegrationTest.java




---


Build failed in Jenkins: kafka-trunk-jdk7 #2950

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: PartitionReassignmentHandler should only generate event when

--
[...truncated 379.91 KB...]

kafka.server.DynamicConfigChangeTest > testConfigChange STARTED

kafka.server.DynamicConfigChangeTest > testConfigChange PASSED

kafka.server.ServerGenerateBrokerIdTest > testGetSequenceIdMethod STARTED

kafka.server.ServerGenerateBrokerIdTest > testGetSequenceIdMethod PASSED

kafka.server.ServerGenerateBrokerIdTest > testBrokerMetadataOnIdCollision 
STARTED

kafka.server.ServerGenerateBrokerIdTest > testBrokerMetadataOnIdCollision PASSED

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testAutoGenerateBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > testMultipleLogDirsMetaProps PASSED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId STARTED

kafka.server.ServerGenerateBrokerIdTest > testDisableGeneratedBrokerId PASSED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
STARTED

kafka.server.ServerGenerateBrokerIdTest > testUserConfigAndGeneratedBrokerId 
PASSED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps STARTED

kafka.server.ServerGenerateBrokerIdTest > 
testConsistentBrokerIdFromUserConfigAndMetaProps PASSED

kafka.server.AdvertiseBrokerTest > testBrokerAdvertiseHostNameAndPortToZK 
STARTED

kafka.server.AdvertiseBrokerTest > testBrokerAdvertiseHostNameAndPortToZK PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testValidCreateTopicsRequests 
PASSED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
STARTED

kafka.server.CreateTopicsRequestWithPolicyTest > testErrorCreateTopicsRequests 
PASSED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestWithUnsupportedVersion STARTED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestWithUnsupportedVersion PASSED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestBeforeSaslHandshakeRequest STARTED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestBeforeSaslHandshakeRequest PASSED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestAfterSaslHandshakeRequest STARTED

kafka.server.SaslApiVersionsRequestTest > 
testApiVersionsRequestAfterSaslHandshakeRequest PASSED

kafka.server.ServerGenerateClusterIdTest > 
testAutoGenerateClusterIdForKafkaClusterParallel STARTED

kafka.server.ServerGenerateClusterIdTest > 
testAutoGenerateClusterIdForKafkaClusterParallel PASSED

kafka.server.ServerGenerateClusterIdTest > testAutoGenerateClusterId STARTED

kafka.server.ServerGenerateClusterIdTest > testAutoGenerateClusterId PASSED

kafka.server.ServerGenerateClusterIdTest > 
testAutoGenerateClusterIdForKafkaClusterSequential STARTED

kafka.server.ServerGenerateClusterIdTest > 
testAutoGenerateClusterIdForKafkaClusterSequential PASSED

kafka.server.ApiVersionsTest > testApiVersions STARTED

kafka.server.ApiVersionsTest > testApiVersions PASSED

kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 STARTED

kafka.server.KafkaApisTest > 
shouldRespondWithUnsupportedForMessageFormatOnHandleWriteTxnMarkersWhenMagicLowerThanRequired
 PASSED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 STARTED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleTxnOffsetCommitRequestWhenInterBrokerProtocolNotSupported
 PASSED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 STARTED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleAddPartitionsToTxnRequestWhenInterBrokerProtocolNotSupported
 PASSED

kafka.server.KafkaApisTest > testReadUncommittedConsumerListOffsetLatest STARTED

kafka.server.KafkaApisTest > testReadUncommittedConsumerListOffsetLatest PASSED

kafka.server.KafkaApisTest > 
shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion STARTED

kafka.server.KafkaApisTest > 
shouldAppendToLogOnWriteTxnMarkersWhenCorrectMagicVersion PASSED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleWriteTxnMarkersRequestWhenInterBrokerProtocolNotSupported
 STARTED

kafka.server.KafkaApisTest > 
shouldThrowUnsupportedVersionExceptionOnHandleWriteTxnMarkersRequestWhenInterBrokerProtocolNotSupported
 PASSED

kafka.server.KafkaApisTest > 
shouldRespondWithUnknownTopicWhenPartitionIsNotHosted STARTED

kafka.server.KafkaApisTest > 
shouldRespondWithUnknownTopicWhenPartitionIsNotHosted 

Jenkins build is back to normal : kafka-trunk-jdk8 #2191

2017-11-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Jorge Esteban Quilcate Otoya
Thanks for the feedback!

@Ted Yu: Links added.

KIP updated. Changes:

* `#listConsumerGroups(ListConsumerGroupsOptions options)` added to the API.
* `DescribeConsumerGroupResult` and `ConsumerGroupDescription` classes
described.
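
For illustration, a rough sketch of how the proposed API might be used
(hypothetical: the class and method names come from this thread and the KIP
draft, accessors such as description() are assumptions, and the exact
signatures are still under discussion):

    // Hypothetical usage of the proposed KIP-222 AdminClient API.
    // props is a java.util.Properties with bootstrap.servers configured.
    AdminClient admin = AdminClient.create(props);

    // List all consumer groups known to the cluster.
    ListConsumerGroupsResult groups =
        admin.listConsumerGroups(new ListConsumerGroupsOptions());

    // Describe one group; the accessor names are assumptions.
    DescribeConsumerGroupResult result = admin.describeConsumerGroup("my-group");
    ConsumerGroupDescription description = result.description().get();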

Cheers,
Jorge.




On Mon, 6 Nov 2017 at 20:28, Guozhang Wang ()
wrote:

> Hi Matthias,
>
> You meant "list groups" I think?
>
> Guozhang
>
> On Mon, Nov 6, 2017 at 11:17 AM, Matthias J. Sax 
> wrote:
>
> > The main goal of this KIP is to enable decoupling StreamsResetter from
> > the core module. For this case (i.e., using AdminClient within
> > StreamsResetter) we get the group.id from the user as a command line
> > argument. Thus, I think the KIP is useful without the "describe group"
> > command too.
> >
> > I am happy to include the "describe group" command in the KIP. Just want
> > to point out that there is no reason to insist on it, IMHO.
> >
> >
> > -Matthias
> >
> > On 11/6/17 7:06 PM, Guozhang Wang wrote:
> > > A quick question: I think we do not yet have the `list consumer groups`
> > > functionality of the old AdminClient. Without it, `describe group` given
> > > a group id would not be very useful. Could you include this as well in
> > > your KIP? More specifically, you can look at kafka.admin.AdminClient for
> > > more details on the APIs.
> > >
> > >
> > > Guozhang
> > >
> > > On Mon, Nov 6, 2017 at 7:22 AM, Ted Yu  wrote:
> > >
> > >> Please fill out Discussion thread and JIRA fields.
> > >>
> > >> Thanks
> > >>
> > >> On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley 
> > wrote:
> > >>
> > >>> Hi Jorge,
> > >>>
> > >>> Thanks for the KIP. A few initial comments:
> > >>>
> > >>> 1. The AdminClient doesn't have any API like `listConsumerGroups()`
> > >>> currently, so in general how does a client know the group ids it is
> > >>> interested in?
> > >>> 2. Could you fill in the API of DescribeConsumerGroupResult, just so
> > >>> everyone knows exactly what is being proposed.
> > >>> 3. Can you describe the ConsumerGroupDescription class?
> > >>> 4. Probably worth mentioning that this will use
> > >>> DescribeGroupsRequest/Response, and also enumerating the error codes
> > >>> that it can return (or, equivalently, enumerate the exceptions thrown
> > >>> from the futures obtained from the DescribeConsumerGroupResult).
> > >>>
> > >>> Cheers,
> > >>>
> > >>> Tom
> > >>>
> > >>> On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
> > >>> quilcate.jo...@gmail.com> wrote:
> > >>>
> >  Hi everyone,
> > 
> >  I would like to start a discussion on KIP-222 [1] based on issue
> [2].
> > 
> >  Looking forward to feedback.
> > 
> >  [1]
> >  https://cwiki.apache.org/confluence/pages/viewpage.
> > >>> action?pageId=74686265
> >  [2] https://issues.apache.org/jira/browse/KAFKA-6058
> > 
> >  Cheers,
> >  Jorge.
> > 
> > >>>
> > >>
> > >
> > >
> > >
> >
> >
>
>
> --
> -- Guozhang
>


[GitHub] kafka pull request #4183: Kafka 5692 elect preferred

2017-11-06 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/4183

Kafka 5692 elect preferred



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-5692-elect-preferred

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4183


commit f34bd4ba0e61a910e77ba0a6c9152974737412ca
Author: Tom Bentley 
Date:   2017-09-06T14:39:24Z

KAFKA-5692: Change PreferredReplicaLeaderElectionCommand to use AdminClient

See also KIP-183.

This implements the following algorithm:

1. AdminClient sends ElectPreferredLeadersRequest.
2. KafkaApis receives ElectPreferredLeadersRequest and delegates to
   ReplicaManager.electPreferredLeaders()
3. ReplicaManager delegates to KafkaController.electPreferredLeaders()
4. KafkaController adds a PreferredReplicaLeaderElection to the 
EventManager,
5. ReplicaManager.electPreferredLeaders()'s callback uses the
   delayedElectPreferredReplicasPurgatory to wait for the results of the
   election to appear in the metadata cache. If there are no results
   because of errors, or because the preferred leaders are already leading
   the partitions then a response is returned immediately.

In the EventManager work thread the preferred leader is elected as follows:

1. The EventManager runs PreferredReplicaLeaderElection.process()
2. process() calls KafkaController.onPreferredReplicaElectionWithResults()
3. KafkaController.onPreferredReplicaElectionWithResults()
   calls the PartitionStateMachine.handleStateChangesWithResults() to
   perform the election (asynchronously the PSM will send 
LeaderAndIsrRequest
   to the new and old leaders and UpdateMetadataRequest to all brokers)
   then invokes the callback.

Note the change in parameter type for CollectionUtils.groupDataByTopic().
This makes sense because the AdminClient APIs use Collection consistently,
rather than List or Set. If binary compatibility is a consideration, the old
version should be kept, delegating to the new version.

I had to add PartitionStateMachine.handleStateChangesWithResults()
in order to be able to process a set of state changes in the
PartitionStateMachine *and get back individual results*.
At the same time I noticed that all callers of existing handleStateChange()
were destructuring a TopicAndPartition that they already had in order
to call handleStateChange(), and that handleStateChange() immediately
instantiated a new TopicAndPartition. Since TopicAndPartition is immutable,
this is pointless, so I refactored it. handleStateChange() also now returns
any exception it caught, which is necessary for
handleStateChangesWithResults().
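
For illustration, the client-side call that this algorithm serves might look
roughly like the following (hypothetical: the method and result-class names
follow KIP-183 and this PR, and are not part of any released AdminClient):

    // Hypothetical client-side usage; names are assumptions from KIP-183.
    AdminClient admin = AdminClient.create(props);
    Collection<TopicPartition> partitions =
        Collections.singleton(new TopicPartition("my-topic", 0));
    ElectPreferredLeadersResult result = admin.electPreferredLeaders(partitions);
    // Completes once the elections are visible in the metadata cache;
    // returns immediately if the preferred replicas already lead the
    // partitions or if an error occurred.
    result.all().get();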

commit 74677b59bf925a834b4f84c5c5718aa4eeb38902
Author: Tom Bentley 
Date:   2017-09-20T11:57:19Z

Style

commit d3cc0f5c272573d688e614e08b66bfec9cf49fa7
Author: Tom Bentley 
Date:   2017-09-20T13:19:26Z

WIP

commit 89a3bf441cbf55fbc41be188b18a9404cbb8e6ee
Author: Tom Bentley 
Date:   2017-09-22T14:30:58Z

Properly detect and throw correct exception when leader unavailable

commit 98fe16acccf9a42732ae159bc84abb308be7a4da
Author: Tom Bentley 
Date:   2017-09-22T14:34:49Z

Javadoc + formatting

commit 756a5d558f8a03f2e8709bef8eddef702fcb26bf
Author: Tom Bentley 
Date:   2017-09-22T14:36:10Z

Formatting and DelayedElectionOperation

commit c734c0eef9e77f9efe620d969158e9f8eb73bf9a
Author: Tom Bentley 
Date:   2017-09-22T14:36:37Z

Add a test

TODO: Add Authorizer test

commit 736383b7e6801270e33c6cb1aab0aa27f8f115a5
Author: Tom Bentley 
Date:   2017-09-27T15:18:44Z

Add ELECT_PREFERRED_LEADERS to AuthorizerIntegrationTests

commit f8c1d1ba0fe33ea9258a4b99a071a53d3c77028a
Author: Tom Bentley 
Date:   2017-09-27T20:15:45Z

Fix flaky test

commit 64fcbfc1e3cbf5949431f8c7ca4011f19fa3ac3d
Author: Tom Bentley 
Date:   2017-10-12T11:30:58Z

Add errorCounts() method

commit b4787c9a6d1dc8b0fbc7d78833cd19389cdc06cf
Author: Tom Bentley 
Date:   2017-10-30T11:16:59Z

Move exceptionalFuture() to KafkaFuture.

commit 3a92833ed321e7e9d885497fcf286cdfaede27e7
Author: Tom Bentley 
Date:   2017-10-30T18:30:19Z

WIP solution using thenCompose()

commit 0a57af9172985c4c382087aa167bd4421b375d5d
Author: Colin P. Mccabe 
Date:   2017-11-06T22:06:32Z

Rework futures




---


[jira] [Resolved] (KAFKA-6172) Cache lastEntry in TimeIndex to avoid unnecessary disk access

2017-11-06 Thread Dong Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dong Lin resolved KAFKA-6172.
-
Resolution: Fixed

> Cache lastEntry in TimeIndex to avoid unnecessary disk access
> -
>
> Key: KAFKA-6172
> URL: https://issues.apache.org/jira/browse/KAFKA-6172
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 1.0.1
>
>
> LogSegment.close() calls timeIndex.maybeAppend(...), which in turn makes a
> number of calls to timeIndex.lastEntry(). Currently timeIndex.lastEntry()
> involves a disk seek, because it reads the content of the last few bytes of
> the index file on disk. This slows down the broker shutdown process.
> Here is the time of LogManager.shutdown() in various settings. In all these
> tests, the broker has roughly 6k partitions and 20k segments.
> - If the broker does not have this patch and `log.dirs` is configured with 1
> JBOD log directory, LogManager.shutdown() takes 15 minutes (roughly 900
> seconds).
> - If the broker does not have this patch and `log.dirs` is configured with
> 10 JBOD log directories, LogManager.shutdown() takes 84 seconds.
> - If the broker has this patch and `log.dirs` is configured with 10 JBOD log
> directories, LogManager.shutdown() takes 24 seconds.
> Thus we expect this optimization to save 71% of the time in
> LogManager.shutdown() (84 seconds down to 24). The patch reduces the broker
> shutdown time by caching the lastEntry in memory, so that the broker does
> not always have to read from disk to get the lastEntry.
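
For illustration, the caching idea amounts to roughly the following (a
minimal sketch with simplified stand-in types; not the actual TimeIndex
code):

    // Minimal sketch of caching the last index entry in memory.
    // TimestampOffset and the file helpers are simplified stand-ins.
    final class TimestampOffset {
        final long timestamp;
        final long offset;
        TimestampOffset(long timestamp, long offset) {
            this.timestamp = timestamp;
            this.offset = offset;
        }
    }

    class CachedTimeIndex {
        // Read from disk once; afterwards lastEntry() is served from memory.
        private volatile TimestampOffset lastEntry = readLastEntryFromFile();

        TimestampOffset lastEntry() {
            return lastEntry;   // no disk seek
        }

        synchronized void maybeAppend(long timestamp, long offset) {
            if (timestamp > lastEntry.timestamp) {
                appendEntryToFile(timestamp, offset);               // disk write
                lastEntry = new TimestampOffset(timestamp, offset); // refresh cache
            }
        }

        private TimestampOffset readLastEntryFromFile() {
            /* one-time disk seek; stubbed out here */
            return new TimestampOffset(-1L, -1L);
        }

        private void appendEntryToFile(long timestamp, long offset) {
            /* append to the index file; stubbed out here */
        }
    }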



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] 0.11.0.2 bug fix release

2017-11-06 Thread Rajini Sivaram
There is currently no one assigned to
https://issues.apache.org/jira/browse/KAFKA-4972 and
https://issues.apache.org/jira/browse/KAFKA-3955. I am not sure either of
these has an easy fix. So unless someone picks them up soon, they may need
to get moved to the next release.

On Mon, Nov 6, 2017 at 9:29 PM, Rajini Sivaram 
wrote:

> Thanks, Guozhang. I will move KAFKA-1923 out of 0.11.0.2.
>
> On Mon, Nov 6, 2017 at 5:36 PM, Guozhang Wang  wrote:
>
>> Thanks Rajini, +1 on releasing a 0.11.0.2.
>>
>> I made a pass over the outstanding issues, and added another related
>> issue (
>> https://issues.apache.org/jira/browse/KAFKA-4767) to KAFKA-5936 to the
>> list.
>>
> >> For the other outstanding ones that are not yet assigned to anyone,
> >> we'd need to start working on them asap if we want to include their fixes
> >> by Friday. Among them, only KAFKA-1923 seems not straightforward, so my
> >> suggestion is to move that issue out of scope while driving to finish the
> >> others sooner.
>>
>> Guozhang
>>
>>
>> On Mon, Nov 6, 2017 at 2:47 AM, Rajini Sivaram 
>> wrote:
>>
>> > Hi all,
>> >
>> >
>> > Since we have fixed some critical issues since 0.11.0.1, it must be time
>> > for a 0.11.0.2 release.
>> >
>> >
>> > Since the 0.11.0.1 release we've fixed 14 JIRAs that are targeted for
>> > 0.11.0.2:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-6134?jql=
>> > project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%
>> > 20Closed)%20AND%20fixVersion%20%3D%200.11.0.2
>> >
>> >
>> >
>> > We have 11 outstanding issues that are targeted for 0.11.0.2:
>> >
>> > https://issues.apache.org/jira/browse/KAFKA-6007?jql=
>> > project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%
>> > 22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%
>> > 20fixVersion%20%3D%200.11.0.2
>> >
>> >
>> > Can the owners of these issues please resolve them soon or move them to
>> a
>> > future release?
>> >
>> >
>> > I will aim to create the first RC for 0.11.0.2 this Friday.
>> >
>> > Thank you!
>> >
>> > Regards,
>> >
>> > Rajini
>> >
>>
>>
>>
>> --
>> -- Guozhang
>>
>
>


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Jan Filipiak
Going to put in all possible scenarios. Just wanted to lay out the table and
ask if it's useful and understandable first.


On 06.11.2017 22:33, Ted Yu wrote:

bq. Update in A delete in A update in B delete in B

Are you going to fill in the above scenario (currently blank) ?

On Mon, Nov 6, 2017 at 12:31 PM, Jan Filipiak 
wrote:


I created an example Table in the WIKI page
Can you quickly check if that would be a good format?
I tried to do it roughly like the unit tests, but with the information of
what state is there _AFTER_ processing happened.
I made the first 2 columns exclusive even though they in fact run in
parallel, but the joining task serializes the effects.

Best Jan

On 06.11.2017 21:20, Jan Filipiak wrote:


Will do! Need to do it carefully. One mistake in this detailed approach
and confusion is perfect ;)
Hope I can deliver this week.

Best Jan


On 06.11.2017 17:21, Matthias J. Sax wrote:


Jan,

thanks a lot for this KIP. I did an initial pass over it, but feel a
little lost. Maybe I need to read it more carefully, but atm it's not
clear to me at all what algorithm you propose.

I think it would be super helpful, to do an example with concrete data
that show how records are stored, what the different value mappers
extract, and what is written into repartitioning topics.



-Matthias


On 11/5/17 2:09 AM, Jan Filipiak wrote:


Hi Guozhang

I hope the wikipage looks better now. Made a little more effort on the
diagram. Still not ideal, but I think it serves its purpose.



On 02.11.2017 01:17, Guozhang Wang wrote:


Thanks for the KIP writeup Jan. I made a first pass and here are some
quick
comments:


1. Could we use K0 / V0 and K1 / V1 etc since K0 and KO are a bit
harder to
differentiate when reading.

2. I think you missed the key type in the intrusive approach example
code
snippet regarding "KTable  oneToManyJoin"? Should that be

KTable, V0> oneToManyJoin

3. Some of the arrows in your algorithm section's diagrams seem
reversed.

4. In the first step of the algorithm, "Materialize B first", that
happens
in the "Repartition by A's key" block right? If yes, could you clarify
it
in the block?

5. "skip old if A's key didn't change": hmm, not sure if we can skip
it.
What if other fields (neither A's key or B's key) changes? Suppose you
have
an aggregation after the join, we still need to subtract the old value
from
the aggregation right?

6. In the block of "Materialize B", I think from your description we
are
actually materializing both A and B right? If yes could you update the
diagram?

7. This is a meta question: "in the sink, only use A's key to determine
partition" I think we had the discussion long time ago, that if we are
sending the old and new entries of the pair to different partitions,
their
ordering may get reversed later when reading from the join operator
(i.e.
the "Materialize B" block in your diagram). How did you address that
with
this proposal?

8. "B records with a 'null' A-key value would be silently dropped"
Where
are we dropping it, do we drop it at the first sub-topology (i.e the
"Repartition by A's key" block)?

Guozhang





On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak <
jan.filip...@trivago.com>
wrote:

Hi thanks for the feedback

On 01.11.2017 12:58, Damian Guy wrote:

Hi Jan, Thanks for the KIP!

In both alternatives the API will need to use the `Joined` class
rather
than than passing in `Serde`s. Also, as with all other joins etc,
there
probably should be an overload that doesn't require any `Serdes`.

Will check again how the current API looks. I remember losing the argument
with these IQ overload things.
Didn't expect something to have happened already, so I just copied from the
PR. Will update.
Will also add the overload.

It isn't clear to me what `joinPrefixFaker` is doing? In the comment it
says "returning an outputKey that when serialized only produces a prefix
of the output key which is the same serializing K". So why not just use
"K"?

The faker in fact returns K, which can be serialized by the key Serde in
RocksDB. But it needs to only contain A's key, and it needs to be a strict
prefix byte[] of all K with this A's key. We are going to seek there with a
RocksIterator and continue to read as long as the "faked key" serialized
form is a prefix.
This is easy to do for Avro + Protobuf + custom Serdes and Hadoop
Writables. It's a nightmare for JSON serdes.
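
A minimal sketch of that prefix scan (illustrative only; it assumes the
store's default bytewise key ordering, and process() is a placeholder, not
the actual KIP-213 store code):

    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksIterator;

    // Seek to the serialized prefix (A's key) and read while the full
    // combined key still starts with that prefix.
    static void scanPrefix(RocksDB db, byte[] prefix) {
        try (RocksIterator it = db.newIterator()) {
            for (it.seek(prefix); it.isValid() && startsWith(it.key(), prefix); it.next()) {
                process(it.key(), it.value());
            }
        }
    }

    static boolean startsWith(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) return false;
        }
        return true;
    }

    static void process(byte[] key, byte[] value) {
        /* handle one matching record; placeholder */
    }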



Thanks,

Damian


On Fri, 27 Oct 2017 at 10:27 Ted Yu  wrote:

I think if you explain what A and B are in the beginning, it makes
sense


to
use them since readers would know who they reference.

Cheers

On Thu, Oct 26, 2017 at 11:04 PM, Jan Filipiak


Build failed in Jenkins: kafka-trunk-jdk9 #173

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[ismael] MINOR: PartitionReassignmentHandler should only generate event when

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-4 (ubuntu trusty) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 6f96d7f1735c53b97ebac43168ded64007277beb 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6f96d7f1735c53b97ebac43168ded64007277beb
Commit message: "MINOR: PartitionReassignmentHandler should only generate event 
when znode is created"
 > git rev-list 2b5a21395cf8ce6e3e29a9a778bc20f727ec35fd # timeout=10
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
[kafka-trunk-jdk9] $ /bin/bash -xe /tmp/jenkins6105021679238773676.sh
+ rm -rf 
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2/bin/gradle

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine java version from '9.0.1'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
ERROR: Step 'Publish JUnit test result report' failed: Test reports were found
but none of them are new. Did tests run?
For example, 

 is 10 days old

Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
Not sending mail to unregistered user ism...@juma.me.uk
Not sending mail to unregistered user rajinisiva...@googlemail.com
Not sending mail to unregistered user wangg...@gmail.com
Not sending mail to unregistered user becket@gmail.com


Re: [DISCUSS]: KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jeyhun Karimov
Hi Matthias,

Thanks a lot for correcting. It is a leftover from past designs when
punctuate() was not deprecated.
I corrected.

Cheers,
Jeyhun

On Mon, Nov 6, 2017 at 5:30 PM Matthias J. Sax 
wrote:

> I just re-read the KIP.
>
> One minor comment: we don't need to introduce any deprecated methods.
> Thus, RichValueTransformer#punctuate can be removed completely instead
> of introducing it as deprecated.
>
> Otherwise looks good to me.
>
> Thanks for being so patient!
>
>
> -Matthias
>
> On 11/1/17 9:16 PM, Guozhang Wang wrote:
> > Jeyhun,
> >
> > I think I'm convinced to not do KAFKA-3907 in this KIP. We should think
> > carefully if we should add this functionality to the DSL layer moving
> > forward since from what we discovered working on it the conclusion is
> that
> > it would require revamping the public APIs quite a lot, and it's not
> clear
> > if it is a good trade-off than asking users to call process() instead.
> >
> >
> > Guozhang
> >
> >
> > On Wed, Nov 1, 2017 at 4:50 AM, Damian Guy  wrote:
> >
> >> Hi Jeyhun, thanks, looks good.
> >> Do we need to remove the line that says:
> >>
> >>- on-demand commit() feature
> >>
> >> Cheers,
> >> Damian
> >>
> >> On Tue, 31 Oct 2017 at 23:07 Jeyhun Karimov 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I removed the 'commit()' feature, as we discussed. It simplified  the
> >>> overall design of KIP a lot.
> >>> If it is ok, I would like to start a VOTE thread.
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
> >>> On Fri, Oct 27, 2017 at 5:28 PM Matthias J. Sax  >
> >>> wrote:
> >>>
>  Thanks. I understand what you are saying, but I don't agree that
> 
> > but also we need a commit() method
> 
>  I would just not provide `commit()` at DSL level and close the
>  corresponding Jira as "not a problem" or similar.
> 
> 
>  -Matthias
> 
>  On 10/27/17 3:42 PM, Jeyhun Karimov wrote:
> > Hi Matthias,
> >
> > Thanks for your comments. I agree that this is not the best way to do it.
> > A bit of history behind this design.
> >
> > Prior to doing this, I tried to provide ProcessorContext itself as an
> > argument in Rich interfaces. However, we don't want to give users that
> > flexibility and "power". Moreover, ProcessorContext contains
> > processor-level information and not record-level info. The only thing we
> > need in ProcessorContext is the commit() method.
> >
> > So, as far as I understood, we need the record context (offset, timestamp,
> > etc.) but also a commit() method (we don't want to provide
> > ProcessorContext as a parameter so users can use
> > ProcessorContext.commit()).
> >
> > As a result, I thought to "propagate" the commit() call from RecordContext
> > to ProcessorContext.
> >
> > If there is a misunderstanding in the motivation/discussion of the
> > KIP/included jiras, please let me know.
> >
> >
> > Cheers,
> > Jeyhun
> >
> >
> > On Fri 27. Oct 2017 at 12:39, Matthias J. Sax  >>>
>  wrote:
> >
> >> I am personally still not convinced that we should add `commit()` at
> >> all.
> >>
> >> @Guozhang: you created the original Jira. Can you elaborate a little
> >> bit? Isn't requesting commits a low-level API that should not be exposed
> >> in the DSL? Just want to understand the motivation better. Why would
> >> anybody that uses the DSL ever want to request a commit? To me,
> >> requesting commits is useful if you manipulated state explicitly, i.e.,
> >> via the Processor API.
> >>
> >> Also, for the solution: it seems rather unnatural to me that we add
> >> `commit()` to `RecordContext` -- from my understanding, `RecordContext`
> >> is a helper object that provides access to record metadata. Requesting
> >> a commit is something quite different. Additionally, a commit does not
> >> commit a specific record, but a `RecordContext` is for a specific
> >> record.
> >>
> >> To me, this does not seem to be a sound API design if we follow this
> >> path.
> >>
> >>
> >> -Matthias
> >>
> >>
> >>
> >> On 10/26/17 10:41 PM, Jeyhun Karimov wrote:
> >>> Hi,
> >>>
> >>> Thanks for your suggestions.
> >>>
> >>> I have some comments, to make sure that there is no
> >> misunderstanding.
> >>>
> >>>
> >>> 1. Maybe we can deprecate the `commit()` in ProcessorContext, to
> >>> enforce users to consolidate this call as
> >>> "processorContext.recordContext().commit()". And the internal
> >>> implementation of `ProcessorContext.commit()` in `ProcessorContextImpl`
> >>> is also changed to this call.
> >>>
> >>>
> >>> - I think we should 

[GitHub] kafka pull request #4143: MINOR: PartitionReassignmentHandler should only ge...

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4143


---


Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-06 Thread Jun Rao
Hi, Jiangjie,

3. If a client closes a socket connection, typically the server only finds
this out by reading from the channel and getting a negative size during the
read. So, if a channel is muted by the server, the server won't be able to
detect that the client closed the connection until the socket is unmuted.
That's probably what Rajini wants to convey.

Thanks,

Jun

On Fri, Nov 3, 2017 at 8:11 PM, Becket Qin  wrote:

> Thanks Rajini.
>
> 1. Good point. We do need to bump up the protocol version so that the new
> clients do not wait for another throttle time when they are talking to old
> brokers. I'll update the KIP.
>
> 2. That is true. But the client was not supposed to send requests to the
> broker during that period anyways. So detecting the broker failure later
> seems fine?
>
> 3. Wouldn't the number of connections in CLOSE_WAIT be the same as in the current state?
> Currently the broker will still mute the socket until it sends the response
> back. If the clients disconnect while they are being throttled, the closed
> socket will not be detected until the throttle time has passed.
>
> Jun also suggested to bound the time by metric.sample.window.ms in the
> ticket. I am not sure about the bound on throttle time. It seems a little
> difficult to come up with a good bound. If the bound is too large, it does
> not really help solve the various timeout issues we may face. If the bound
> is too low, the quota is essentially not honored. We may potentially treat
> different requests differently, but that seems too complicated and error
> prone.
>
> IMO, the key improvement we want to make is to tell the clients how long
> they will be throttled, so the clients know what happened and can act
> accordingly instead of waiting naively. Muting the socket on the broker
> side is just in case of non-cooperating clients. For the existing clients,
> it seems this does not have much impact compared with what we have now.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Fri, Nov 3, 2017 at 3:09 PM, Rajini Sivaram 
> wrote:
>
> > Hi Becket,
> >
> > Thank you for the KIP. A few comments:
> >
> > 1. KIP says: "*No public interface changes are needed. We only propose
> > behavior change on the broker side.*"
> >
> > But from the proposed changes, it sounds like clients will be updated to
> > wait for the throttle time before sending the next request, and also not
> > to handle idle disconnections during that time. Doesn't that mean that
> > clients need to know that the broker has sent the response before
> > throttling, requiring a protocol/version change?
> >
> >
> > 2. At the moment, broker failures are detected by clients (and vice
> versa)
> > within connections.max.idle.ms. By removing this check for an unlimited
> > throttle time, failure detection could be delayed.
> >
> >
> > 3. KIP says  "*Since this subsequent request is not actually handled
> until
> > the broker unmutes the channel, the client can hit request.timeout.ms
> >  and reconnect. However, this is no worse
> than
> > the current state.*"
> >
> > I think this could be worse than the current state because the broker
> > doesn't detect and close the channel for an unlimited throttle time,
> > while new connections will get accepted. As a result, a lot of
> > connections could be in the CLOSE_WAIT state when the throttle time is
> > high.
> >
> >
> > Perhaps it is better to combine this KIP with a bound on throttle time?
> >
> >
> > Regards,
> >
> >
> > Rajini
> >
> >
> > On Fri, Nov 3, 2017 at 8:09 PM, Becket Qin  wrote:
> >
> > > Thanks for the comment, Jun,
> > >
> > > 1. Yes, you are right. This could also happen with the current quota
> > > mechanism because we are essentially muting the socket during throttle
> > > time. There might be two ways to solve this.
> > > A) use another socket to send heartbeat.
> > > B) let the GroupCoordinator know that the client will not heartbeat for
> > > some time.
> > > It seems the first solution is cleaner.
> > >
> > > 2. For the consumer it seems returning an empty response is a better
> > > option. In the producer case, if there is a spike in traffic, the
> > > brokers will see queued-up requests, but that is hard to avoid unless
> > > we have a connection-level quota, which is a bigger change and may be
> > > easier to discuss in a separate KIP.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > > On Fri, Nov 3, 2017 at 10:28 AM, Jun Rao  wrote:
> > >
> > > > Hi, Jiangjie,
> > > >
> > > > Thanks for bringing this up. A couple of quick thoughts.
> > > >
> > > > 1. If the throttle time is large, what can happen is that a consumer
> > > won't
> > > > be able to heart beat to the group coordinator frequent enough. In
> that
> > > > case, even with this KIP, it seems there could be frequent consumer
> > group
> > > > rebalances.
> > > >
> > > > 2. If we return a response immediately, for the 

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Ted Yu
bq. Update in A delete in A update in B delete in B

Are you going to fill in the above scenario (currently blank) ?

On Mon, Nov 6, 2017 at 12:31 PM, Jan Filipiak 
wrote:

> I created an example Table in the WIKI page
> Can you quickly check if that would be a good format?
> I tried to do it roughly like the unit tests, but with the information of
> what state is there _AFTER_ processing happened.
> I made the first 2 columns exclusive even though they in fact run in
> parallel, but the joining task serializes the effects.
>
> Best Jan
>
> On 06.11.2017 21:20, Jan Filipiak wrote:
>
>> Will do! Need to do it carefully. One mistake in this detailed approach
>> and confusion is perfect ;)
>> Hope I can deliver this week.
>>
>> Best Jan
>>
>>
>> On 06.11.2017 17:21, Matthias J. Sax wrote:
>>
>>> Jan,
>>>
>>> thanks a lot for this KIP. I did an initial pass over it, but feel a
>>> little lost. Maybe I need to read it more carefully, but atm it's not
>>> clear to me at all what algorithm you propose.
>>>
>>> I think it would be super helpful, to do an example with concrete data
>>> that show how records are stored, what the different value mappers
>>> extract, and what is written into repartitioning topics.
>>>
>>>
>>>
>>> -Matthias
>>>
>>>
>>> On 11/5/17 2:09 AM, Jan Filipiak wrote:
>>>
Hi Guozhang

I hope the wikipage looks better now. Made a little more effort on the
diagram. Still not ideal, but I think it serves its purpose.



 On 02.11.2017 01:17, Guozhang Wang wrote:

> Thanks for the KIP writeup Jan. I made a first pass and here are some
> quick
> comments:
>
>
> 1. Could we use K0 / V0 and K1 / V1 etc since K0 and KO are a bit
> harder to
> differentiate when reading.
>
> 2. I think you missed the key type in the intrusive approach example
> code
> snippet regarding "KTable  oneToManyJoin"? Should that be
>
> KTable, V0> oneToManyJoin
>
> 3. Some of the arrows in your algorithm section's diagrams seem
> reversed.
>
> 4. In the first step of the algorithm, "Materialize B first", that
> happens
> in the "Repartition by A's key" block right? If yes, could you clarify
> it
> in the block?
>
> 5. "skip old if A's key didn't change": hmm, not sure if we can skip
> it.
> What if other fields (neither A's key or B's key) changes? Suppose you
> have
> an aggregation after the join, we still need to subtract the old value
> from
> the aggregation right?
>
> 6. In the block of "Materialize B", I think from your description we
> are
> actually materializing both A and B right? If yes could you update the
> diagram?
>
> 7. This is a meta question: "in the sink, only use A's key to determine
> partition" I think we had the discussion long time ago, that if we are
> sending the old and new entries of the pair to different partitions,
> their
> ordering may get reversed later when reading from the join operator
> (i.e.
> the "Materialize B" block in your diagram). How did you address that
> with
> this proposal?
>
> 8. "B records with a 'null' A-key value would be silently dropped"
> Where
> are we dropping it, do we drop it at the first sub-topology (i.e the
> "Repartition by A's key" block)?
>
> Guozhang
>
>
>
>
>
> On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak <
> jan.filip...@trivago.com>
> wrote:
>
> Hi thanks for the feedback
>>
>> On 01.11.2017 12:58, Damian Guy wrote:
>>
>> Hi Jan, Thanks for the KIP!
>>>
>>> In both alternatives the API will need to use the `Joined` class
>>> rather
>>> than than passing in `Serde`s. Also, as with all other joins etc,
>>> there
>>> probably should be an overload that doesn't require any `Serdes`.
>>>
>>> Will check again how current API looks. I remember loosing the
>> argument
>> with this IQ overloads things.
>> Didn't expect something to have happend already so I just copied from
>> the
>> PR. Will update.
>> Will also add the overload.
>>
>>> It isn't clear to me what `joinPrefixFaker` is doing? In the comment it
>>> says "returning an outputKey that when serialized only produces a prefix
>>> of the output key which is the same serializing K". So why not just use
>>> "K"?
>>>
>> The faker in fact returns K, which can be serialized by the key Serde in
>> RocksDB. But it needs to only contain A's key, and it needs to be a strict
>> prefix byte[] of all K with this A's key. We are going to seek there with a
>> RocksIterator and continue to read as long as the "faked key" serialized
>> form is a prefix.
>> This is easy to do for Avro + Protobuf + custom Serdes and Hadoop
>> Writables.

Re: [DISCUSS] 0.11.0.2 bug fix release

2017-11-06 Thread Rajini Sivaram
Thanks, Guozhang. I will move KAFKA-1923 out of 0.11.0.2.

On Mon, Nov 6, 2017 at 5:36 PM, Guozhang Wang  wrote:

> Thanks Rajini, +1 on releasing a 0.11.0.2.
>
> I made a pass over the outstanding issues, and added another related issue
> (
> https://issues.apache.org/jira/browse/KAFKA-4767) to KAFKA-5936 to the
> list.
>
> For the other outstanding ones that are not yet assigned to anyone,
> we'd need to start working on them asap if we want to include their fixes
> by Friday. Among them, only KAFKA-1923 seems not straightforward, so my
> suggestion is to move that issue out of scope while driving to finish the
> others sooner.
>
> Guozhang
>
>
> On Mon, Nov 6, 2017 at 2:47 AM, Rajini Sivaram 
> wrote:
>
> > Hi all,
> >
> >
> > Since we have fixed some critical issues since 0.11.0.1, it must be time
> > for a 0.11.0.2 release.
> >
> >
> > Since the 0.11.0.1 release we've fixed 14 JIRAs that are targeted for
> > 0.11.0.2:
> >
> > https://issues.apache.org/jira/browse/KAFKA-6134?jql=
> > project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%
> > 20Closed)%20AND%20fixVersion%20%3D%200.11.0.2
> >
> >
> >
> > We have 11 outstanding issues that are targeted for 0.11.0.2:
> >
> > https://issues.apache.org/jira/browse/KAFKA-6007?jql=
> > project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%
> > 22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%
> > 20fixVersion%20%3D%200.11.0.2
> >
> >
> > Can the owners of these issues please resolve them soon or move them to a
> > future release?
> >
> >
> > I will aim to create the first RC for 0.11.0.2 this Friday.
> >
> > Thank you!
> >
> > Regards,
> >
> > Rajini
> >
>
>
>
> --
> -- Guozhang
>


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Rajini Sivaram
Congratulations, Onur!

On Mon, Nov 6, 2017 at 8:10 PM, Dong Lin  wrote:

> Congratulations Onur!
>
> On Mon, Nov 6, 2017 at 9:24 AM, Jun Rao  wrote:
>
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of Kafka controller,
> which
> > is the brain of a Kafka cluster. Over time, we have accumulated quite a
> few
> > correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path of solving all problems. Onur
> is
> > the one who took a holistic approach, by first documenting all known
> > issues, writing down a new design, coming up with a plan to deliver the
> > changes in phases and executing on it. At this point, Onur has completed
> > the two most important phases: making the controller single threaded and
> > changing the controller to use the async ZK api. The former fixed
> multiple
> > deadlocks and race conditions. The latter significantly improved the
> > performance when there are many partitions. Experimental results show
> that
> > Onur's work reduced the controlled shutdown time by a factor of 100 times
> > and the controller failover time by a factor of 3 times.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
> >
>


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Ted Yu
bq. enlarge the score of through()

I guess you meant scope.

On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov  wrote:

> Hi,
>
> Sorry for the late reply. I am convinced that we should enlarge the score
> of through() (add more overloads) instead of introducing a separate set of
> overloads to other methods.
> I will update the KIP soon based on the discussion and inform.
>
>
> Cheers,
> Jeyhun
>
> On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak 
> wrote:
>
> > Sorry for not being 100% up to date.
> > Back then we had the discussion that when an operation puts a >Sink<
> > into the topology, a >Produced< parameter is added. This Produced
> > parameter could be marked internal or external. If internal, I think the
> > name would still make a great suffix for the topic name.
> >
> > Is this plan still around? Otherwise, having the name as a suffix is
> > probably always good: it can help the user identify more quickly the hot
> > topics that need more partitions if he has many of these internal
> > repartitions.
> >
> > Best Jan
> >
> >
> > On 06.11.2017 20:13, Matthias J. Sax wrote:
> > > I absolutely agree with what you say. It's not a requirement to specify
> > > a topic name -- and this was the idea -- if the user does specify a
> > > name, we treat it as is -- if the user does not specify a name, Streams
> > > creates an internal topic.
> > >
> > > The goal of the Jira is to allow a simplified way to control
> > > repartitioning (atm, the user needs to manually create a topic and use
> > > it via through()).
> > >
> > > Thus, the idea is to make the topic name parameter of through() optional.
> > >
> > > It's of course just an idea. Happy to have another API design. The goal
> > > was to avoid too many new overloads.
> > >
> > >>> Could you clarify exactly what you mean by keeping the current
> > distinction?
> > > Current distinction is: user topics are created manually and the user
> > > specifies the name -- internal topics are created by Kafka Streams and
> > > a name is generated automatically.
> > >
> > > -> through("user-topic")
> > > -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates
> an
> > > internal topic
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 11/6/17 6:56 PM, Thomas Becker wrote:
> > >> Could you clarify exactly what you mean by keeping the current
> > distinction?
> > >>
> > >> Actually, re-reading the KIP and JIRA, it's not clear that being able
> > to specify a custom name is actually a requirement. If the goal is to
> > control repartitioning and tune parallelism, maybe we can just sidestep
> > this issue altogether by removing the ability to set a different name.
> > >>
> > >> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
> > >>
> > >> That's a good point. In current design, we strictly distinguish both.
> > >> For example, the reset tool deletes internal topics (starting with
> > >> prefix `-` and ending with either `-repartition` or
> > >> `-changelog`).
> > >>
> > >> Thus, from my point of view, it would make sense to keep the current
> > >> distinction.
> > >>
> > >> -Matthias
> > >>
> > >> On 11/6/17 4:45 PM, Thomas Becker wrote:
> > >>
> > >>
> > >> I think this sounds good as well. It's worth clarifying whether topics
> > that are named by the user but created by streams are considered
> "internal"
> > topics also.
> > >>
> > >> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> > >>
> > >> My idea was to relax the requirement for through() that a topic must
> > >> be created manually before startup.
> > >>
> > >> Thus, if no through() call is made, an (internal) topic is created the
> > >> same way we do it currently.
> > >>
> > >> If one uses `through(String topicName)` we keep the current behavior
> > >> and require users to create the topic manually.
> > >>
> > >> The reasoning is as follows: if a user creates a topic manually, the
> > >> user can just use it for repartitioning. As the topic is already there,
> > >> there is no need to specify any topic configs.
> > >>
> > >> We add a new `through()` overload (details TBD) that allows specifying
> > >> topic configs, and Streams creates the topic with those configs.
> > >>
> > >> Reasoning: users don't want to manage topics manually; thus, it's still
> > >> an internal topic and Streams creates the topic name automatically as
> > >> for all other internal topics. However, users get some more control
> > >> over topic parameters like the number of partitions (we should discuss
> > >> what other configs would be useful).
> > >>
> > >>
> > >> Does this make sense?
> > >>
> > >>
> > >> -Matthias
> > >>
> > >>
> > >> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> > >>
> > >>
> > >> Hi.
> > >>
> > >>
> > >> I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
> > >> I just would argue that repartitioning should be its own API call like
> > >> through() or something.
> > >> One can already use through() or to() to get this. I would argue one
> > >> should look there instead of

Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Jeyhun Karimov
Hi,

Sorry for the late reply. I am convinced that we should enlarge the score
of through() (add more overloads) instead of introducing a separate set of
overloads to other methods.
I will update the KIP soon based on the discussion and inform.


Cheers,
Jeyhun
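
For concreteness, the kind of overload being discussed might look like this
(a hypothetical sketch; TopicConfig.withNumberOfPartitions() is a name
floated in this thread, not an existing API):

    // Today: user-managed topic, created manually before startup.
    stream.through("user-topic");

    // Discussed addition: Streams creates and names an internal
    // repartition topic with the given settings.
    stream.through(TopicConfig.withNumberOfPartitions(5));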

On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak 
wrote:

> Sorry for not being 100% up to date.
> Back then we had the discussion that when an operation puts a >Sink<
> into the topology, a >Produced< parameter is added. This Produced
> parameter could be marked internal or external. If internal, I think the
> name would still make a great suffix for the topic name.
>
> Is this plan still around? Otherwise, having the name as a suffix is
> probably always good: it can help the user identify more quickly the hot
> topics that need more partitions if he has many of these internal
> repartitions.
>
> Best Jan
>
>
> On 06.11.2017 20:13, Matthias J. Sax wrote:
> > I absolutely agree with what you say. It's not a requirement to specify
> > a topic name -- and this was the idea -- if the user does specify a
> > name, we treat it as is -- if the user does not specify a name, Streams
> > creates an internal topic.
> >
> > The goal of the Jira is to allow a simplified way to control
> > repartitioning (atm, the user needs to manually create a topic and use
> > it via through()).
> >
> > Thus, the idea is to make the topic name parameter of through() optional.
> >
> > It's of course just an idea. Happy to have another API design. The goal
> > was to avoid too many new overloads.
> >
> >>> Could you clarify exactly what you mean by keeping the current
> distinction?
> > Current distinction is: user topics are created manually and the user
> > specifies the name -- internal topics are created by Kafka Streams and
> > a name is generated automatically.
> >
> > -> through("user-topic")
> > -> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates an
> > internal topic
> >
> >
> > -Matthias
> >
> >
> > On 11/6/17 6:56 PM, Thomas Becker wrote:
> >> Could you clarify exactly what you mean by keeping the current
> distinction?
> >>
> >> Actually, re-reading the KIP and JIRA, it's not clear that being able
> to specify a custom name is actually a requirement. If the goal is to
> control repartitioning and tune parallelism, maybe we can just sidestep
> this issue altogether by removing the ability to set a different name.
> >>
> >> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
> >>
> >> That's a good point. In current design, we strictly distinguish both.
> >> For example, the reset tool deletes internal topics (starting with
> >> prefix `-` and ending with either `-repartition` or
> >> `-changelog`).
> >>
> >> Thus, from my point of view, it would make sense to keep the current
> >> distinction.
> >>
> >> -Matthias
> >>
> >> On 11/6/17 4:45 PM, Thomas Becker wrote:
> >>
> >>
> >> I think this sounds good as well. It's worth clarifying whether topics
> that are named by the user but created by streams are considered "internal"
> topics also.
> >>
> >> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> >>
> >> My idea was to relax the requirement for through() that a topic must be
> >> created manually before startup.
> >>
> >> Thus, if no through() call is made, an (internal) topic is created the
> >> same way we do it currently.
> >>
> >> If one uses `through(String topicName)` we keep the current behavior and
> >> require users to create the topic manually.
> >>
> >> The reasoning is as follows: if a user creates a topic manually, the user
> >> can just use it for repartitioning. As the topic is already there, there
> >> is no need to specify any topic configs.
> >>
> >> We add a new `through()` overload (details TBD) that allows specifying
> >> topic configs, and Streams creates the topic with those configs.
> >>
> >> Reasoning: users don't want to manage topics manually; thus, it's still an
> >> internal topic and Streams creates the topic name automatically as for
> >> all other internal topics. However, users get some more control over
> >> topic parameters like the number of partitions (we should discuss what
> >> other configs would be useful).
> >>
> >>
> >> Does this make sense?
> >>
> >>
> >> -Matthias
> >>
> >>
> >> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> >>
> >>
> >> Hi.
> >>
> >>
> >> I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
> >> I just would argue that repartitioning should be its own API call like
> >> through() or something.
> >> One can already use through() or to() to get this. I would argue one
> >> should look there instead of overloads.
> >>
> >> Best Jan
> >>
> >> On 04.11.2017 16:01, Jeyhun Karimov wrote:
> >>
> >>
> >> Dear community,
> >>
> >> I would like to initiate discussion on KIP-221 [1] based on issue [2].
> >> Please feel free to comment.
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams
> >>
> >> [2] https://issues.apache.org/jira/browse/KAFKA-6037

Re: [DISCUSS] KIP-217: Expose a timeout to allow an expired ZK session to be re-created

2017-11-06 Thread Jun Rao
Ok. Based on the discussion, it seems that doing infinite re-creation is
better. I will cancel the KIP.

Thanks,

Jun
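
For reference, the infinite re-creation under discussion amounts to roughly
the following (a minimal sketch, not the actual broker code):

    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Keep retrying the creation of the ZooKeeper client until it
    // succeeds, logging each failure. A sketch only; the real code would
    // use proper logging and a backoff policy.
    static ZooKeeper createZooKeeper(String connect, int sessionTimeoutMs,
                                     Watcher watcher) throws InterruptedException {
        while (true) {
            try {
                return new ZooKeeper(connect, sessionTimeoutMs, watcher);
            } catch (Exception e) {
                System.err.println("ZooKeeper creation failed, retrying: " + e);
                Thread.sleep(1000);   // back off before the next attempt
            }
        }
    }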

On Thu, Nov 2, 2017 at 6:14 PM, Jeff Widman  wrote:

> +1 for permanent retry under the covers (without an exposed/later
> deprecated config).
>
> That said, I understand the reality that sometimes we have to work around
> an unfixed issue in another project, so if you think it best to expose a
> config, then I have no objections. Mainly I wanted to make sure you'd tried
> to get upstream to fix it, as that is almost always a cleaner solution.
>
> > The above fact implies some reluctance from the zookeeper community to
> fully
> solve the issue (maybe due to technical issues).
>
> @Ted - I spent some time a few months ago poking through issues on the ZK
> issue tracker, and it looked like there wasn't much activity on the project
> lately. So my guess is that it's less about problems with this particular
> solution, and more that the solution has just enough moving parts that no
> one with commit rights has had the time to review it. As a volunteer
> maintainer on a number of projects, I certainly empathize with them,
> although it would be nice to get some more committers onto the Zookeeper
> project who have the time to review some of these semi-abandoned PRs and
> either accept or reject them.
>
>
>
> On Thu, Nov 2, 2017 at 3:00 PM, Ted Yu  wrote:
>
> > Stephane:
> > bq. hasn't acted in over a year
> >
> > The above fact implies some reluctance from the zookeeper community to
> > fully solve the issue (maybe due to technical issues).
> > Anyway, we should plan on not relying on the fix to go through in the
> near
> > future.
> >
> > As for Jun's latest suggestion, I think we should add periodic logging
> > indicating the retry.
> >
> > A KIP is not needed if we go that route.
> >
> > Cheers
> >
> > On Thu, Nov 2, 2017 at 2:54 PM, Stephane Maarek <
> > steph...@simplemachines.com.au> wrote:
> >
> > > Hi Jun
> > >
> > > I think this is a better option. Would that change require a kip then
> as
> > > it's not a change in public API ?
> > >
> > > @ted it was marked as a blocker for 3.4.11 but they pushed it. It seems
> > > that the owner of the pr hasn't acted in over a year and I think
> someone
> > > needs to take ownership of that. Additionally, this would be a change
> in
> > > Kafka zookeeper client dependency, so no need to update your zookeeper
> > > quorum to benefit from the change
> > >
> > > Thanks
> > > Stéphane
> > >
> > >
> > > On 3 Nov. 2017 8:45 am, "Jun Rao"  wrote:
> > >
> > > Stephane, Jeff,
> > >
> > > Another option is to not expose the reconnect timeout config and just
> > retry
> > > the creation of Zookeeper forever. This is an improvement from the
> > current
> > > situation and if zookeeper-2184 is fixed in the future, we don't need
> to
> > > deprecate the config.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, Nov 2, 2017 at 9:02 AM, Ted Yu  wrote:
> > >
> > > > ZOOKEEPER-2184 is scheduled for 3.4.12 whose release is unknown.
> > > >
> > > > I think adding the session recreation on Kafka side should benefit
> > Kafka
> > > > users, especially those who don't plan to move to 3.4.12+ in the near
> > > > future.
> > > >
> > > > On Wed, Nov 1, 2017 at 6:34 PM, Jun Rao  wrote:
> > > >
> > > > > Hi, Stephane,
> > > > >
> > > > > 3) The difference is that currently, there is no retry when
> > re-creating
> > > > the
> > > > > Zookeeper object when a ZK session expires. So, if the re-creation
> of
> > > > > Zookeeper fails, the broker just logs the error and the Zookeeper
> > > object
> > > > > will never be created again. With this KIP, we will keep retrying
> the
> > > > > creation of Zookeeper until success.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Oct 31, 2017 at 3:28 PM, Stephane Maarek <
> > > > > steph...@simplemachines.com.au> wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 1) The reason I'm asking about it is I wonder if it's not worth
> > > > focusing
> > > > > > the development efforts on taking ownership of the existing PR (
> > > > > > https://github.com/apache/zookeeper/pull/150)  to fix
> > > ZOOKEEPER-2184,
> > > > > > rebase it and have it merged into the ZK codebase shortly.  I
> feel
> > > this
> > > > > KIP
> > > > > > might introduce a setting that could be deprecated shortly and
> > > confuse
> > > > > the
> > > > > > end user a bit further with one more knob to turn.
> > > > > >
> > > > > > 3) I'm not sure if I fully understand, sorry for the beginner's
> > > > question:
> > > > > > if the default timeout is infinite, then it won't change anything
> > to
> > > > how
> > > > > > Kafka works from today, does it? (unless I'm missing something
> > > sorry).
> > > > If
> > > > > > not set to infinite, then we introduce the risk of a whole
> cluster
> > > > 

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jeyhun Karimov
Hi Jan,

Sorry for the late reply.


The API Design doesn't look appealing


In terms of API design we tried to preserve the Java functional interfaces.
We applied the same set of rich methods for KTable to make it compatible
with the rest of the overloaded APIs.
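
For concreteness, a rich interface in this style might look as follows (a
sketch only; the signatures are illustrative, not the final KIP-159 API):

    // A "rich" variant of a functional interface: same shape as
    // ValueMapper, plus access to record metadata.
    public interface RichValueMapper<V, VR> {
        VR apply(V value, RecordContext recordContext);
    }

    // e.g. tagging each value with where it came from:
    RichValueMapper<String, String> mapper =
        (value, ctx) -> value + " @ " + ctx.topic() + "/" + ctx.offset();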

It should be 100% sufficient to offer a KTable + KStream that is directly
> feed from a topic with 1 additional overload for the #map() methods to
> cover every usecase while keeping the API in a way better state.


- IMO this seems like a workaround, rather than a direct solution.

Perhaps we should continue this discussion in the DISCUSS thread.


Cheers,
Jeyhun


On Mon, Nov 6, 2017 at 9:14 PM Jan Filipiak 
wrote:

> Hi.
>
> I do understand that it might come in handy.
> From my POV, in any relational algebra this is only a projection.
> Currently we hide these "fields" that come with the input record.
> It should be 100% sufficient to offer a KTable + KStream that is directly
> feed from a topic with 1 additional overload for the #map() methods to
> cover every usecase while keeping the API in a way better state.
>
> best Jan
>
> On 06.11.2017 17:52, Matthias J. Sax wrote:
> > Jan,
> >
> > I understand what you are saying. However, having a RecordContext is
> > super useful for operations that are applied to an input topic. Many
> > users requested this feature -- it's much more convenient than falling
> > back to transform() to implement, for example, a filter() that wants to
> > access some metadata.
> >
> > Because we cannot distinguish different "origins" of a KStream/KTable, I
> > am not sure if there would be a better way to do this. The only
> > "workaround" I see, is to have two KStream/KTable interfaces each and we
> > would use the first one for KStream/KTable with "proper" RecordContext.
> > But this does not seem to be a good solution either.
> >
> > Note, a KTable can also be read directly from a topic. I agree that
> > using RecordContext on a KTable that is the result of an aggregation is
> > questionable. But I don't see a reason to downvote the KIP for this
> > reason.
> >
> > WDYT about this?
> >
> >
> > -Matthias
> >
> > On 11/1/17 10:19 PM, Jan Filipiak wrote:
> >> -1 non binding
> >>
> >> I don't get the motivation.
> >> In 80% of my DSL processors there is no such thing as a reasonable
> >> RecordContext.
> >> After a join  the record I am processing belongs to at least 2 topics.
> >> After a Group by the record I am processing was created from multiple
> >> offsets.
> >>
> >> The API Design doesn't look appealing
> >>
> >> Best Jan
> >>
> >>
> >>
> >> On 01.11.2017 22:02, Jeyhun Karimov wrote:
> >>> Dear community,
> >>>
> >>> It seems the discussion for KIP-159 [1] converged finally. I would
> >>> like to
> >>> initiate voting for the particular KIP.
> >>>
> >>>
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 159%3A+Introducing+Rich+functions+to+Streams
> >>>
> >>>
> >>> Cheers,
> >>> Jeyhun
> >>>
>
>


[jira] [Created] (KAFKA-6178) Broker is listed as only ISR for all partitions it is leader of

2017-11-06 Thread AS (JIRA)
AS created KAFKA-6178:
-

 Summary: Broker is listed as only ISR for all partitions it is 
leader of
 Key: KAFKA-6178
 URL: https://issues.apache.org/jira/browse/KAFKA-6178
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.1.0
 Environment: Windows
Reporter: AS
 Attachments: KafkaServiceOutput.txt, log-cleaner.log, server.log

We're running a 15-broker cluster on Windows machines, and one of the brokers,
10, is the only ISR on all partitions that it is the leader of. On partitions
where it isn't the leader, it seems to follow the leader fine. This is an
excerpt from 'describe':

{{Topic: ClientQosCombined  Partition: 458  Leader: 10  Replicas: 
10,6,7,8,9,0,1   Isr: 10
Topic: ClientQosCombined  Partition: 459  Leader: 11  Replicas: 
11,7,8,9,0,1,10 Isr: 0,10,1,9,7,11,8}}

The server.log files all seem to be pretty standard, and the only indication of 
this issue is the following pattern that often repeats:

{{2017-11-06 20:28:25,207 [INFO] kafka.cluster.Partition 
[kafka-request-handler-8:] - Partition [ClientQosCombined,398] on broker 10: 
Expanding ISR for partition [ClientQosCombined,398] from 10 to 5,10
2017-11-06 20:28:39,382 [INFO] kafka.cluster.Partition [kafka-scheduler-1:] - 
Partition [ClientQosCombined,398] on broker 10: Shrinking ISR for partition 
[ClientQosCombined,398] from 5,10 to 10}}

This pattern repeats for each of the partitions that 10 leads. This is the only
topic that we currently have in our cluster. The __consumer_offsets topic seems
completely normal in terms of ISR counts. The controller is broker 5, which is
cycling through attempting and failing to trigger leader elections on the
partitions led by broker 10. From the controller log on broker 5:

{{2017-11-06 20:45:04,857 [INFO] kafka.controller.KafkaController 
[kafka-scheduler-0:] - [Controller 5]: Starting preferred replica leader 
election for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] kafka.controller.PartitionStateMachine 
[kafka-scheduler-0:] - [Partition state machine on Controller 5]: Invoking 
state change to OnlinePartition for partitions [ClientQosCombined,375]
2017-11-06 20:45:04,857 [INFO] 
kafka.controller.PreferredReplicaPartitionLeaderSelector [kafka-scheduler-0:] - 
[PreferredReplicaPartitionLeaderSelector]: Current leader 10 for partition 
[ClientQosCombined,375] is not the preferred replica. Trigerring preferred 
replica leader election
2017-11-06 20:45:04,857 [WARN] kafka.controller.KafkaController 
[kafka-scheduler-0:] - [Controller 5]: Partition [ClientQosCombined,375] failed 
to complete preferred replica leader election. Leader is 10}}

I've also attached the logs and output from broker 10. Any idea what's wrong 
here? 
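
A side note for anyone reproducing this: assuming the standard 0.10.x
tooling, the affected partitions can be listed directly with the describe
command's under-replicated filter:

bin/kafka-topics.sh --zookeeper <zookeeper> --describe --under-replicated-partitions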



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Jenkins build is back to normal : kafka-trunk-jdk7 #2949

2017-11-06 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Jan Filipiak

I created an example table on the wiki page.
Can you quickly check if that would be a good format?
I tried to do it roughly like the unit tests, but with the information of
what state is there _AFTER_ processing happened.
I made the first 2 columns exclusive even though they in fact run in
parallel, but the joining task serializes the effects.

Best Jan

On 06.11.2017 21:20, Jan Filipiak wrote:
Will do! Need to do it carefully. One mistake in this detailed 
approach and confusion is perfect ;)

Hope I can deliver this week.

Best Jan

On 06.11.2017 17:21, Matthias J. Sax wrote:

Jan,

thanks a lot for this KIP. I did an initial pass over it, but feel a
little lost. Maybe I need to read it more carefully, but atm it's not
clear to me at all what algorithm you propose.

I think it would be super helpful to do an example with concrete data
that shows how records are stored, what the different value mappers
extract, and what is written into repartitioning topics.



-Matthias


On 11/5/17 2:09 AM, Jan Filipiak wrote:

Hi Gouzhang

I hope the wiki page looks better now. I put a little more effort into the
diagram. Still not ideal, but I think it serves its purpose.



On 02.11.2017 01:17, Guozhang Wang wrote:

Thanks for the KIP writeup Jan. I made a first pass and here are some
quick
comments:


1. Could we use K0 / V0 and K1 / V1 etc. since K0 and KO are a bit
harder to differentiate when reading.

2. I think you missed the key type in the intrusive approach example code
snippet regarding "KTable<V0> oneToManyJoin"? Should that be

KTable<CombinedKey<K0, KO>, V0> oneToManyJoin
KTable, V0> oneToManyJoin

3. Some of the arrows in your algorithm section's diagrams seem
reversed.

4. In the first step of the algorithm, "Materialize B first", that happens
in the "Repartition by A's key" block, right? If yes, could you clarify it
in the block?

5. "skip old if A's key didn't change": hmm, not sure if we can 
skip it.

What if other fields (neither A's key or B's key) changes? Suppose you
have
an aggregation after the join, we still need to subtract the old value
from
the aggregation right?

6. In the block of "Materialize B", I think from your description we are
actually materializing both A and B, right? If yes, could you update the
diagram?

7. This is a meta question: "in the sink, only use A's key to determine
partition". I think we had the discussion a long time ago that if we are
sending the old and new entries of the pair to different partitions, their
ordering may get reversed later when reading from the join operator (i.e.
the "Materialize B" block in your diagram). How did you address that with
this proposal?

8. "B records with a 'null' A-key value would be silently dropped" 
Where

are we dropping it, do we drop it at the first sub-topology (i.e the
"Repartition by A's key" block)?

Guozhang





On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak 


wrote:


Hi thanks for the feedback

On 01.11.2017 12:58, Damian Guy wrote:


Hi Jan, Thanks for the KIP!

In both alternatives the API will need to use the `Joined` class rather
than passing in `Serde`s. Also, as with all other joins etc., there
probably should be an overload that doesn't require any `Serdes`.

Will check again how the current API looks. I remember losing the argument
with the IQ overloads thing.
Didn't expect something to have happened already, so I just copied from the
PR. Will update.
Will also add the overload.

It isn't clear to me what `joinPrefixFaker` is doing. In the comment it
says "returning an outputKey that when serialized only produces a prefix
of the output key which is the same serializing K". So why not just use
"K"?


The faker in fact returns a K which can be serialized by the key Serde in
RocksDB. But it needs to only contain A's key, and its serialized byte[]
needs to be a strict prefix of that of every K with this A's key. We are
going to seek there with a RocksIterator and continue to read as long as
the serialized form of the "faked key" is a prefix.
This is easy to do for Avro, Protobuf, custom Serdes, and Hadoop
Writables. It's a nightmare for JSON serdes.




Thanks,
Damian


On Fri, 27 Oct 2017 at 10:27 Ted Yu  wrote:

I think if you explain what A and B are in the beginning, it makes sense
to use them, since readers would know what they reference.

Cheers

On Thu, Oct 26, 2017 at 11:04 PM, Jan Filipiak
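
To make the prefix property above concrete, here is a minimal sketch of a
length-prefixed combined-key encoding in which A's key is a strict byte
prefix of the full key (names are made up; this is not the KIP's code):

import java.nio.ByteBuffer;

final class CombinedKey {
    final byte[] aKey;  // the foreign (A) key -- determines the partition
    final byte[] bKey;  // B's primary key

    CombinedKey(final byte[] aKey, final byte[] bKey) {
        this.aKey = aKey;
        this.bKey = bKey;
    }

    // full key: [len(aKey)][aKey bytes][bKey bytes]
    byte[] serialize() {
        final ByteBuffer buf = ByteBuffer.allocate(4 + aKey.length + bKey.length);
        buf.putInt(aKey.length).put(aKey).put(bKey);
        return buf.array();
    }

    // the "faked" key: same encoding with an empty B part. It is a strict
    // prefix of every serialized CombinedKey carrying this A-key, so it can
    // serve as the seek target for a range scan.
    byte[] serializePrefix() {
        final ByteBuffer buf = ByteBuffer.allocate(4 + aKey.length);
        buf.putInt(aKey.length).put(aKey);
        return buf.array();
    }
}

This also shows why JSON serdes are painful here: without a length-prefixed
(or otherwise delimited) encoding, one serialized key is generally not a
byte prefix of another.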


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Jan Filipiak
Will do! Need to do it carefully. One mistake in this detailed approach 
and confusion is perfect ;)

Hope I can deliver this week.

Best Jan

On 06.11.2017 17:21, Matthias J. Sax wrote:

Jan,

thanks a lot for this KIP. I did an initial pass over it, but feel a
little lost. Maybe I need to read it more carefully, but atm it's not
clear to me at all what algorithm you propose.

I think it would be super helpful to do an example with concrete data
that shows how records are stored, what the different value mappers
extract, and what is written into repartitioning topics.



-Matthias


On 11/5/17 2:09 AM, Jan Filipiak wrote:

Hi Gouzhang

I hope the wiki page looks better now. I put a little more effort into the
diagram. Still not ideal, but I think it serves its purpose.



On 02.11.2017 01:17, Guozhang Wang wrote:

Thanks for the KIP writeup Jan. I made a first pass and here are some
quick
comments:


1. Could we use K0 / V0 and K1 / V1 etc. since K0 and KO are a bit
harder to differentiate when reading.

2. I think you missed the key type in the intrusive approach example code
snippet regarding "KTable<V0> oneToManyJoin"? Should that be

KTable<CombinedKey<K0, KO>, V0> oneToManyJoin

3. Some of the arrows in your algorithm section's diagrams seem
reversed.

4. In the first step of the algorithm, "Materialize B first", that happens
in the "Repartition by A's key" block, right? If yes, could you clarify it
in the block?

5. "skip old if A's key didn't change": hmm, not sure if we can skip it.
What if other fields (neither A's key or B's key) changes? Suppose you
have
an aggregation after the join, we still need to subtract the old value
from
the aggregation right?

6. In the block of "Materialize B", I think from your description we are
actually materializing both A and B, right? If yes, could you update the
diagram?

7. This is a meta question: "in the sink, only use A's key to determine
partition". I think we had the discussion a long time ago that if we are
sending the old and new entries of the pair to different partitions, their
ordering may get reversed later when reading from the join operator (i.e.
the "Materialize B" block in your diagram). How did you address that with
this proposal?

8. "B records with a 'null' A-key value would be silently dropped" Where
are we dropping it, do we drop it at the first sub-topology (i.e the
"Repartition by A's key" block)?

Guozhang





On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak 
wrote:


Hi thanks for the feedback

On 01.11.2017 12:58, Damian Guy wrote:


Hi Jan, Thanks for the KIP!

In both alternatives the API will need to use the `Joined` class rather
than passing in `Serde`s. Also, as with all other joins etc., there
probably should be an overload that doesn't require any `Serdes`.


Will check again how the current API looks. I remember losing the argument
with the IQ overloads thing.
Didn't expect something to have happened already, so I just copied from the
PR. Will update.
Will also add the overload.


It isn't clear to me what `joinPrefixFaker` is doing. In the comment it
says "returning an outputKey that when serialized only produces a prefix
of the output key which is the same serializing K". So why not just use
"K"?


The faker in fact returns a K which can be serialized by the key Serde in
RocksDB. But it needs to only contain A's key, and its serialized byte[]
needs to be a strict prefix of that of every K with this A's key. We are
going to seek there with a RocksIterator and continue to read as long as
the serialized form of the "faked key" is a prefix.
This is easy to do for Avro, Protobuf, custom Serdes, and Hadoop
Writables. It's a nightmare for JSON serdes.




Thanks,
Damian


On Fri, 27 Oct 2017 at 10:27 Ted Yu  wrote:

I think if you explain what A and B are in the beginning, it makes sense
to use them, since readers would know what they reference.

Cheers

On Thu, Oct 26, 2017 at 11:04 PM, Jan Filipiak
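
And the read side of the same idea: a sketch of the seek-and-scan described
above, using the raw RocksDB Java API for illustration (the Streams store
internals differ). With the length-prefixed encoding, all records for one
A-key are contiguous in RocksDB's lexicographic key order:

import java.util.ArrayList;
import java.util.List;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

final class PrefixScan {
    static List<byte[]> valuesForAKey(final RocksDB db, final byte[] aKeyPrefix) {
        final List<byte[]> results = new ArrayList<>();
        final RocksIterator it = db.newIterator();
        try {
            it.seek(aKeyPrefix);  // jump to the first key >= the prefix
            while (it.isValid() && startsWith(it.key(), aKeyPrefix)) {
                results.add(it.value());  // still inside the prefix range
                it.next();
            }
        } finally {
            it.close();  // native iterators must be closed
        }
        return results;
    }

    static boolean startsWith(final byte[] key, final byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}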


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Jan Filipiak

Sorry for not being 100% up to date.
Back then we had the discussion that when an operation puts a >Sink<
into the topology, a >Produced< parameter is added. This Produced
parameter could be internal or external. If internal, I think the name
would still make a great suffix for the topic name.

Is this plan still around? Otherwise, having the name as a suffix is
probably always good; it can help the user identify hot topics that need
more partitions more quickly if there are many of these internal
repartitions.

Best Jan


On 06.11.2017 20:13, Matthias J. Sax wrote:

I absolutely agree with what you say. It's not a requirement to specify a
topic name -- and this was the idea -- if the user does specify a name, we
treat it as is; if the user does not specify a name, Streams creates an
internal topic.

The goal of the Jira is to allow a simplified way to control
repartitioning (atm, the user needs to manually create a topic and use it
via through()).

Thus, the idea is to make the topic name parameter of through() optional.

It's of course just an idea. Happy to have another API design. The goal
was to avoid too many new overloads.


Could you clarify exactly what you mean by keeping the current distinction?

The current distinction is: user topics are created manually and the user
specifies the name -- internal topics are created by Kafka Streams and
a name is generated automatically.

-> through("user-topic")
-> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates an
internal topic


-Matthias


On 11/6/17 6:56 PM, Thomas Becker wrote:

Could you clarify exactly what you mean by keeping the current distinction?

Actually, re-reading the KIP and JIRA, it's not clear that being able to 
specify a custom name is actually a requirement. If the goal is to control 
repartitioning and tune parallelism, maybe we can just sidestep this issue 
altogether by removing the ability to set a different name.

On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:

That's a good point. In the current design, we strictly distinguish both.
For example, the reset tool deletes internal topics (starting with the
prefix `<application.id>-` and ending with either `-repartition` or
`-changelog`).

Thus, from my point of view, it would make sense to keep the current
distinction.

-Matthias

On 11/6/17 4:45 PM, Thomas Becker wrote:


I think this sounds good as well. It's worth clarifying whether topics that are named by 
the user but created by streams are considered "internal" topics also.

On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:

My idea was to relax the requirement for through() that a topic must be
created manually before startup.

Thus, if no through() call is made, an (internal) topic is created the
same way we do it currently.

If one uses `through(String topicName)`, we keep the current behavior and
require users to create the topic manually.

The reasoning is as follows: if a user creates a topic manually, the user
can just use it for repartitioning. As the topic is already there, there
is no need to specify any topic configs.

We add a new `through()` overload (details TBD) that allows specifying
topic configs, and Streams creates the topic with those configs.

Reasoning: the user doesn't want to manage the topic manually; thus, it's
still an internal topic and Streams creates the topic name automatically
as for all other internal topics. However, users get some more control
over topic parameters like the number of partitions (we should discuss
what other configs would be useful).


Does this make sense?


-Matthias


On 11/5/17 1:21 AM, Jan Filipiak wrote:


Hi.


I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
I would just argue that repartitioning should be its own API call, like
through() or something.
One can already use through() or to() to get this. I would argue one
should look there instead of at overloads.

Best Jan

On 04.11.2017 16:01, Jeyhun Karimov wrote:


Dear community,

I would like to initiate discussion on KIP-221 [1] based on issue [2].
Please feel free to comment.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams

[2] https://issues.apache.org/jira/browse/KAFKA-6037



Cheers,
Jeyhun
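
To make the two variants above concrete, given some KStream<String, Long>
named `stream` (the TopicConfig class and its factory method are
hypothetical -- nothing like them exists yet):

// today: user-managed topic, created manually before startup
KStream<String, Long> viaUserTopic = stream.through("user-topic");

// proposed (hypothetical API): no name given, so Streams generates the
// internal topic name and creates the topic with the requested configs
KStream<String, Long> viaInternalTopic =
    stream.through(TopicConfig.withNumberOfPartitions(5));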




















Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Jan Filipiak

Hi.

I do understand that it might come in handy.
From my POV, in any relational algebra this is only a projection.
Currently we hide these "fields" that come with the input record.
It should be 100% sufficient to offer a KTable + KStream that is directly
fed from a topic, with 1 additional overload for the #map() methods, to
cover every use case while keeping the API in a much better state.

best Jan

On 06.11.2017 17:52, Matthias J. Sax wrote:

Jan,

I understand what you are saying. However, having a RecordContext is
super useful for operations that are applied to an input topic. Many users
requested this feature -- it's much more convenient than falling back to
transform() to implement, for example, a filter() that wants to access
some metadata.

Because we cannot distinguish different "origins" of a KStream/KTable, I
am not sure if there would be a better way to do this. The only
"workaround" I see is to have two KStream/KTable interfaces each, and we
would use the first one for a KStream/KTable with a "proper" RecordContext.
But this does not seem to be a good solution either.

Note that a KTable can also be read directly from a topic. I agree that
using RecordContext on a KTable that is the result of an aggregation is
questionable. But I don't see a reason to down-vote the KIP for this reason.

WDYT about this?


-Matthias

On 11/1/17 10:19 PM, Jan Filipiak wrote:

-1 non binding

I don't get the motivation.
In 80% of my DSL processors there is no such thing as a reasonable
RecordContext.
After a join, the record I am processing belongs to at least 2 topics.
After a group-by, the record I am processing was created from multiple
offsets.

The API design doesn't look appealing.

Best Jan



On 01.11.2017 22:02, Jeyhun Karimov wrote:

Dear community,

It seems the discussion for KIP-159 [1] converged finally. I would
like to
initiate voting for the particular KIP.



[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams


Cheers,
Jeyhun
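
To make Jan's alternative concrete, a sketch of the "one additional overload"
he describes. Everything here is hypothetical -- none of these names exist in
the API; metadata would be exposed only at the topic boundary instead of via
a RecordContext on every operator:

interface RecordMetadata {  // assumed shape, mirroring ProcessorContext
    String topic();
    int partition();
    long offset();
    long timestamp();
}

interface MetadataKeyValueMapper<K, V, VR> {
    VR apply(K key, V value, RecordMetadata metadata);
}

// the single extra overload, offered only on a KStream/KTable read
// directly from a topic:
// <VR> KStream<K, VR> mapValues(MetadataKeyValueMapper<K, V, VR> mapper);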





Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Dong Lin
Congratulations Onur!

On Mon, Nov 6, 2017 at 9:24 AM, Jun Rao  wrote:

> Hi, everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> Karaman.
>
> Onur's most significant work is the improvement of Kafka controller, which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path of solving all problems. Onur is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show that
> Onur's work reduced the controlled shutdown time by a factor of 100 times
> and the controller failover time by a factor of 3 times.
>
> Congratulations, Onur!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>


Build failed in Jenkins: kafka-trunk-jdk8 #2190

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] KAFKA-6156; Metric tag values with colons must be sanitized

[becket.qin] KAFKA-6172; Cache lastEntry in TimeIndex to avoid unnecessary disk

--
[...truncated 1.79 MB...]

org.apache.kafka.streams.KafkaStreamsTest > 
shouldThrowExceptionSettingUncaughtExceptionHandlerNotInCreateState STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldThrowExceptionSettingUncaughtExceptionHandlerNotInCreateState PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCleanup PASSED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldThrowExceptionSettingStateListenerNotInCreateState STARTED

org.apache.kafka.streams.KafkaStreamsTest > 
shouldThrowExceptionSettingStateListenerNotInCreateState PASSED

org.apache.kafka.streams.KafkaStreamsTest > testNumberDefaultMetrics STARTED

org.apache.kafka.streams.KafkaStreamsTest > testNumberDefaultMetrics PASSED

org.apache.kafka.streams.KafkaStreamsTest > shouldReturnThreadMetadata STARTED

org.apache.kafka.streams.KafkaStreamsTest > shouldReturnThreadMetadata PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCloseIsIdempotent PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning 
STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotCleanupWhileRunning PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStateThreadClose STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStateThreadClose PASSED

org.apache.kafka.streams.KafkaStreamsTest > testStateChanges STARTED

org.apache.kafka.streams.KafkaStreamsTest > testStateChanges PASSED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice STARTED

org.apache.kafka.streams.KafkaStreamsTest > testCannotStartTwice PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfProducerEnableIdempotenceIsOverriddenIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfProducerEnableIdempotenceIsOverriddenIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
STARTED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled 
STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfValueSerdeConfigFails STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldThrowStreamsExceptionIfValueSerdeConfigFails PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerAutoCommitIsOverridden STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldResetToDefaultIfConsumerAutoCommitIsOverridden PASSED

org.apache.kafka.streams.StreamsConfigTest > 

Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Guozhang Wang
Hi Matthias,

You meant "list groups" I think?

Guozhang

On Mon, Nov 6, 2017 at 11:17 AM, Matthias J. Sax 
wrote:

> The main goal of this KIP is to enable decoupling StreamsResetter from
> the core module. For this case (i.e., using AdminClient within
> StreamsResetter) we get the group.id from the user as a command line
> argument. Thus, I think the KIP is useful without the "describe group"
> command too.
>
> I am happy to include the "describe group" command in the KIP. Just want
> to point out that there is no reason to insist on it, IMHO.
>
>
> -Matthias
>
> On 11/6/17 7:06 PM, Guozhang Wang wrote:
> > A quick question: I think we do not yet have the `list consumer groups`
> > func as in the old AdminClient. Without this, `describe group` given the
> > group id would not be very useful. Could you include this as well in your
> > KIP? More specifically, you can look at kafka.admin.AdminClient for more
> > details on the APIs.
> >
> >
> > Guozhang
> >
> > On Mon, Nov 6, 2017 at 7:22 AM, Ted Yu  wrote:
> >
> >> Please fill out Discussion thread and JIRA fields.
> >>
> >> Thanks
> >>
> >> On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley 
> wrote:
> >>
> >>> Hi Jorge,
> >>>
> >>> Thanks for the KIP. A few initial comments:
> >>>
> >>> 1. The AdminClient doesn't have any API like `listConsumerGroups()`
> >>> currently, so in general how does a client know the group ids it is
> >>> interested in?
> >>> 2. Could you fill in the API of DescribeConsumerGroupResult, just so
> >>> everyone knows exactly what is being proposed.
> >>> 3. Can you describe the ConsumerGroupDescription class?
> >>> 4. Probably worth mentioning that this will use
> >>> DescribeGroupsRequest/Response, and also enumerating the error codes
> >>> that can be returned (or, equivalently, enumerating the exceptions
> >>> thrown from the futures obtained from the DescribeConsumerGroupResult).
> >>>
> >>> Cheers,
> >>>
> >>> Tom
> >>>
> >>> On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
> >>> quilcate.jo...@gmail.com> wrote:
> >>>
>  Hi everyone,
> 
>  I would like to start a discussion on KIP-222 [1] based on issue [2].
> 
>  Looking forward to feedback.
> 
>  [1]
>  https://cwiki.apache.org/confluence/pages/viewpage.
> >>> action?pageId=74686265
>  [2] https://issues.apache.org/jira/browse/KAFKA-6058
> 
>  Cheers,
>  Jorge.
> 
> >>>
> >>
> >
> >
> >
>
>


-- 
-- Guozhang
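
For reference, a sketch of what the missing listing API might look like if
modeled on the existing Java AdminClient result classes (all names here are
hypothetical; the old Scala client offers roughly this via
kafka.admin.AdminClient#listAllConsumerGroupsFlattened):

// hypothetical -- nothing like this exists on the Java AdminClient yet
public interface ListConsumerGroupsResult {
    // listings (group id plus minimal metadata) of all groups known to the brokers
    KafkaFuture<Collection<ConsumerGroupListing>> all();
}

// and on AdminClient itself:
// ListConsumerGroupsResult listConsumerGroups(ListConsumerGroupsOptions options);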


Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Colin McCabe
Hi Jorge,

This looks like a great improvement.

I will echo the question that others had about listGroups.  Can we just
add it to this KIP?  It's weird to be able to describe groups, but not
list them.  In the past, for topics and ACLs, we added describing and
listing functionality together.

In the KIP you write:

> This API returns a future object whose result will be available within
> RequestTimeoutMs, which is configured when user constructs the AdminClient.

To be consistent with the other APIs, we should have two functions,
right?  Something like this:

> KafkaFuture<ConsumerGroupDescription> future(String group);
> KafkaFuture<Map<String, ConsumerGroupDescription>> all();

The rationale behind having two functions is that the future returned by
"all" will throw an exception if there is any problem whatsoever.  So, for
example, if you ask about 10 groups and one of them is missing, you get
no result, just an exception.  Whereas of the individual futures, 9 will
have a useful result, and one will throw an exception.

best,
Colin


On Mon, Nov 6, 2017, at 11:17, Matthias J. Sax wrote:
> The main goal of this KIP is to enable decoupling StreamsResetter from
> the core module. For this case (i.e., using AdminClient within
> StreamsResetter) we get the group.id from the user as a command line
> argument. Thus, I think the KIP is useful without the "describe group"
> command too.
> 
> I am happy to include the "describe group" command in the KIP. Just want
> to point out that there is no reason to insist on it, IMHO.
> 
> 
> -Matthias
> 
> On 11/6/17 7:06 PM, Guozhang Wang wrote:
> > A quick question: I think we do not yet have the `list consumer groups`
> > func as in the old AdminClient. Without this, `describe group` given the
> > group id would not be very useful. Could you include this as well in your
> > KIP? More specifically, you can look at kafka.admin.AdminClient for more
> > details on the APIs.
> > 
> > 
> > Guozhang
> > 
> > On Mon, Nov 6, 2017 at 7:22 AM, Ted Yu  wrote:
> > 
> >> Please fill out Discussion thread and JIRA fields.
> >>
> >> Thanks
> >>
> >> On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley  wrote:
> >>
> >>> Hi Jorge,
> >>>
> >>> Thanks for the KIP. A few initial comments:
> >>>
> >>> 1. The AdminClient doesn't have any API like `listConsumerGroups()`
> >>> currently, so in general how does a client know the group ids it is
> >>> interested in?
> >>> 2. Could you fill in the API of DescribeConsumerGroupResult, just so
> >>> everyone knows exactly what is being proposed.
> >>> 3. Can you describe the ConsumerGroupDescription class?
> >>> 4. Probably worth mentioning that this will use
> >>> DescribeGroupsRequest/Response, and also enumerating the error codes
> >>> that can be returned (or, equivalently, enumerating the exceptions
> >>> thrown from the futures obtained from the DescribeConsumerGroupResult).
> >>>
> >>> Cheers,
> >>>
> >>> Tom
> >>>
> >>> On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
> >>> quilcate.jo...@gmail.com> wrote:
> >>>
>  Hi everyone,
> 
>  I would like to start a discussion on KIP-222 [1] based on issue [2].
> 
>  Looking forward to feedback.
> 
>  [1]
>  https://cwiki.apache.org/confluence/pages/viewpage.
> >>> action?pageId=74686265
>  [2] https://issues.apache.org/jira/browse/KAFKA-6058
> 
>  Cheers,
>  Jorge.
> 
> >>>
> >>
> > 
> > 
> > 
> 
> Email had 1 attachment:
> + signature.asc
>   1k (application/pgp-signature)
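
To make the two-function shape concrete, here is a sketch of the result
class following the pattern of, e.g., DescribeTopicsResult.
ConsumerGroupDescription is the class the KIP still has to define; the code
below is illustrative, not the proposal itself:

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.common.KafkaFuture;

public class DescribeConsumerGroupResult {
    private final Map<String, KafkaFuture<ConsumerGroupDescription>> futures;

    public DescribeConsumerGroupResult(final Map<String, KafkaFuture<ConsumerGroupDescription>> futures) {
        this.futures = futures;
    }

    // per-group future: fails only if that particular group's lookup fails
    public KafkaFuture<ConsumerGroupDescription> future(final String group) {
        return futures.get(group);
    }

    // combined future: fails if any of the requested groups fails
    public KafkaFuture<Map<String, ConsumerGroupDescription>> all() {
        return KafkaFuture.allOf(futures.values().toArray(new KafkaFuture[0]))
            .thenApply(new KafkaFuture.Function<Void, Map<String, ConsumerGroupDescription>>() {
                @Override
                public Map<String, ConsumerGroupDescription> apply(final Void v) {
                    final Map<String, ConsumerGroupDescription> all = new HashMap<>();
                    for (final Map.Entry<String, KafkaFuture<ConsumerGroupDescription>> e : futures.entrySet()) {
                        try {
                            all.put(e.getKey(), e.getValue().get());
                        } catch (InterruptedException | ExecutionException ex) {
                            throw new RuntimeException(ex); // unreachable: allOf already succeeded
                        }
                    }
                    return all;
                }
            });
    }
}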


Build failed in Jenkins: kafka-trunk-jdk9 #172

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-6115: TaskManager should be type aware

[wangguoz] KAFKA-6120: RecordCollector should not retry sending

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (ubuntu trusty) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 2b5a21395cf8ce6e3e29a9a778bc20f727ec35fd 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2b5a21395cf8ce6e3e29a9a778bc20f727ec35fd
Commit message: "KAFKA-6120: RecordCollector should not retry sending"
 > git rev-list 0c895706e8ab511efe352a824a0c9e2dab62499e # timeout=10
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
[kafka-trunk-jdk9] $ /bin/bash -xe /tmp/jenkins3181605471812859315.sh
+ rm -rf 
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2/bin/gradle

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine java version from '9.0.1'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
ERROR: Step ‘Publish JUnit test result report’ failed: Test reports were found 
but none of them are new. Did tests run? 
For example, 

 is 17 days old

Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
Not sending mail to unregistered user rajinisiva...@googlemail.com
Not sending mail to unregistered user wangg...@gmail.com
Not sending mail to unregistered user becket@gmail.com


Build failed in Jenkins: kafka-trunk-jdk7 #2948

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] KAFKA-6156; Metric tag values with colons must be sanitized

[becket.qin] KAFKA-6172; Cache lastEntry in TimeIndex to avoid unnecessary disk

--
[...truncated 383.45 KB...]

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testControlledShutdownPartitionLeaderElectionAllIsrSimultaneouslyShutdown PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionEnabled 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testPreferredReplicaPartitionLeaderElectionPreferredReplicaNotInIsrNotLive 
PASSED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
STARTED

kafka.controller.PartitionLeaderElectionAlgorithmsTest > 
testOfflinePartitionLeaderElectionLastIsrOfflineUncleanLeaderElectionDisabled 
PASSED

kafka.controller.PartitionStateMachineTest > 
testNonexistentPartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNonexistentPartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionErrorCodeFromCreateStates PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOfflineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNonexistentPartitionToOfflinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransitionZkUtilsExceptionFromCreateStates 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidNewPartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testNewPartitionToOnlinePartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNewPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionErrorCodeFromStateLookup PASSED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown STARTED

kafka.controller.PartitionStateMachineTest > 
testOnlinePartitionToOnlineTransitionForControlledShutdown PASSED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
STARTED

kafka.controller.PartitionStateMachineTest > 
testOfflinePartitionToOnlinePartitionTransitionZkUtilsExceptionFromStateLookup 
PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition STARTED

kafka.controller.PartitionStateMachineTest > 
testInvalidOnlinePartitionToNonexistentPartitionTransition PASSED

kafka.controller.PartitionStateMachineTest > 
testInvalidOfflinePartitionToNewPartitionTransition 

Re: [DISCUSS] KIP-218: Make KafkaFuture.Function java 8 lambda compatible

2017-11-06 Thread Colin McCabe
It would definitely be nice to use the jdk8 CompletableFuture.  I think
that's a bit of a separate discussion, though, since it has such heavy
compatibility implications.

How about making KIP-218 backwards compatible?  As a starting point, you
can change KafkaFuture#BiConsumer to an interface with no compatibility
implications, since there are currently no public functions exposed that
use it.  That leaves KafkaFuture#Function, which is publicly used now.

For the purposes of KIP-218, how about adding a new interface
FunctionInterface?  Then you can add a function like this:

>  public abstract <R> KafkaFuture<R> thenApply(FunctionInterface<T, R>
> function);

And mark the older declaration as deprecated:

>  @deprecated
>  public abstract <R> KafkaFuture<R> thenApply(Function<T, R> function);

This is a 100% compatible way to make things nicer for java 8.

cheers,
Colin


On Thu, Nov 2, 2017, at 10:38, Steven Aerts wrote:
> Hi Tom,
> 
> Nice observation.
> I changed the "Rejected Alternatives" section to "Other Alternatives", as
> I see myself as too much of an outsider to the Kafka community to be
> able to decide without this discussion.
> 
> I see two major factors to decide:
>  - how soon will KIP-118 (drop support of Java 7) be implemented?
>  - for which reasons do we drop backwards compatibility for public
> interfaces marked as Evolving
> 
> If KIP-118, which is scheduled for version 2.0.0, is going to be
> implemented soon, I agree with you that replacing KafkaFuture with
> CompletableFuture (or CompletionStage) is a preferable option.
> But as I am not familiar with the roadmap, it is difficult for me to tell.
> 
> 
> Thanks,
> 
> 
>Steven
> 
> 
> 2017-11-02 11:27 GMT+01:00 Tom Bentley :
> > Hi Steven,
> >
> > I notice you've renamed the template's "Rejected Alternatives" section to
> > "Other Alternatives", suggesting they're not rejected yet (or, if you have
> > rejected them, I think you should give your reasons).
> >
> > Personally, I'd like to understand the arguments against simply replacing
> > KafkaFuture with CompletableFuture in Kafka 2.0. In other words, if we were
> > starting without needing to support Java 7 what would be the arguments for
> > having our own KafkaFuture?
> >
> > Thanks,
> >
> > Tom
> >
> > On 1 November 2017 at 16:01, Ted Yu  wrote:
> >
> >> KAFKA-4423 is still open.
> >> When would Java 7 be dropped ?
> >>
> >> Thanks
> >>
> >> On Wed, Nov 1, 2017 at 8:56 AM, Ismael Juma  wrote:
> >>
> >> > On Wed, Nov 1, 2017 at 3:51 PM, Ted Yu  wrote:
> >> >
> >> > > bq. Wait for a kafka release which will not support java 7 anymore
> >> > >
> >> > > Do you want to raise a separate thread for the above ?
> >> > >
> >> >
> >> > There is already a KIP for this so a separate thread is not needed.
> >> >
> >> > Ismael
> >> >
> >>
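
A quick illustration of what the new interface buys. KafkaFuture.Function is
today an abstract class, so Java 8 callers must spell out an anonymous class;
FunctionInterface here is the name proposed above, not a released API:

import org.apache.kafka.common.KafkaFuture;

class ThenApplyExample {
    static KafkaFuture<Integer> lengthOf(final KafkaFuture<String> future) {
        // today: anonymous class, no lambda possible
        return future.thenApply(new KafkaFuture.Function<String, Integer>() {
            @Override
            public Integer apply(final String s) {
                return s.length();
            }
        });
    }
    // with the proposed functional-interface overload the same call shrinks to
    //     future.thenApply(s -> s.length());
    // and the compiler picks the interface overload, since an abstract class
    // can never be the target of a lambda.
}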


Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Matthias J. Sax
The main goal of this KIP is to enable decoupling StreamsResetter from
the core module. For this case (i.e., using AdminClient within
StreamsResetter) we get the group.id from the user as a command line
argument. Thus, I think the KIP is useful without the "describe group"
command too.

I am happy to include the "describe group" command in the KIP. Just want
to point out that there is no reason to insist on it, IMHO.


-Matthias

On 11/6/17 7:06 PM, Guozhang Wang wrote:
> A quick question: I think we do not yet have the `list consumer groups`
> func as in the old AdminClient. Without this, `describe group` given the
> group id would not be very useful. Could you include this as well in your
> KIP? More specifically, you can look at kafka.admin.AdminClient for more
> details on the APIs.
> 
> 
> Guozhang
> 
> On Mon, Nov 6, 2017 at 7:22 AM, Ted Yu  wrote:
> 
>> Please fill out Discussion thread and JIRA fields.
>>
>> Thanks
>>
>> On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley  wrote:
>>
>>> Hi Jorge,
>>>
>>> Thanks for the KIP. A few initial comments:
>>>
>>> 1. The AdminClient doesn't have any API like `listConsumerGroups()`
>>> currently, so in general how does a client know the group ids it is
>>> interested in?
>>> 2. Could you fill in the API of DescribeConsumerGroupResult, just so
>>> everyone knows exactly what is being proposed.
>>> 3. Can you describe the ConsumerGroupDescription class?
>>> 4. Probably worth mentioning that this will use
>>> DescribeGroupsRequest/Response, and also enumerating the error codes
>>> that can be returned (or, equivalently, enumerating the exceptions
>>> thrown from the futures obtained from the DescribeConsumerGroupResult).
>>>
>>> Cheers,
>>>
>>> Tom
>>>
>>> On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
>>> quilcate.jo...@gmail.com> wrote:
>>>
 Hi everyone,

 I would like to start a discussion on KIP-222 [1] based on issue [2].

 Looking forward to feedback.

 [1]
 https://cwiki.apache.org/confluence/pages/viewpage.
>>> action?pageId=74686265
 [2] https://issues.apache.org/jira/browse/KAFKA-6058

 Cheers,
 Jorge.

>>>
>>
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Edoardo Comar
Congratulations Onur !

--

Edoardo Comar

IBM Message Hub

IBM UK Ltd, Hursley Park, SO21 2JN



From:   Jun Rao 
To: "dev@kafka.apache.org" , 
"us...@kafka.apache.org" 
Date:   06/11/2017 17:24
Subject:[ANNOUNCE] New committer: Onur Karaman



Hi, everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
Karaman.

Onur's most significant work is the improvement of Kafka controller, which
is the brain of a Kafka cluster. Over time, we have accumulated quite a 
few
correctness and performance issues in the controller. There have been
attempts to fix controller issues in isolation, which would make the code
base more complicated without a clear path of solving all problems. Onur 
is
the one who took a holistic approach, by first documenting all known
issues, writing down a new design, coming up with a plan to deliver the
changes in phases and executing on it. At this point, Onur has completed
the two most important phases: making the controller single threaded and
changing the controller to use the async ZK api. The former fixed multiple
deadlocks and race conditions. The latter significantly improved the
performance when there are many partitions. Experimental results show that
Onur's work reduced the controlled shutdown time by a factor of 100 times
and the controller failover time by a factor of 3 times.

Congratulations, Onur!

Thanks,

Jun (on behalf of the Apache Kafka PMC)





Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Matthias J. Sax
I absolutely agree with what you say. It's not a requirement to specify a
topic name -- and this was the idea -- if the user does specify a name, we
treat it as is; if the user does not specify a name, Streams creates an
internal topic.

The goal of the Jira is to allow a simplified way to control
repartitioning (atm, the user needs to manually create a topic and use it
via through()).

Thus, the idea is to make the topic name parameter of through() optional.

It's of course just an idea. Happy to have another API design. The goal
was to avoid too many new overloads.

>> Could you clarify exactly what you mean by keeping the current distinction?

The current distinction is: user topics are created manually and the user
specifies the name -- internal topics are created by Kafka Streams and
a name is generated automatically.

-> through("user-topic")
-> through(TopicConfig.withNumberOfPartitions(5)) // Streams creates an
internal topic


-Matthias


On 11/6/17 6:56 PM, Thomas Becker wrote:
> Could you clarify exactly what you mean by keeping the current distinction?
> 
> Actually, re-reading the KIP and JIRA, it's not clear that being able to 
> specify a custom name is actually a requirement. If the goal is to control 
> repartitioning and tune parallelism, maybe we can just sidestep this issue 
> altogether by removing the ability to set a different name.
> 
> On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:
> 
> That's a good point. In the current design, we strictly distinguish both.
> For example, the reset tool deletes internal topics (starting with the
> prefix `<application.id>-` and ending with either `-repartition` or
> `-changelog`).
> 
> Thus, from my point of view, it would make sense to keep the current
> distinction.
> 
> -Matthias
> 
> On 11/6/17 4:45 PM, Thomas Becker wrote:
> 
> 
> I think this sounds good as well. It's worth clarifying whether topics that 
> are named by the user but created by streams are considered "internal" topics 
> also.
> 
> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> 
> My idea was to relax the requirement for through() that a topic must be
> created manually before startup.
> 
> Thus, if no through() call is made, an (internal) topic is created the
> same way we do it currently.
> 
> If one uses `through(String topicName)`, we keep the current behavior and
> require users to create the topic manually.
> 
> The reasoning is as follows: if a user creates a topic manually, the user
> can just use it for repartitioning. As the topic is already there, there
> is no need to specify any topic configs.
> 
> We add a new `through()` overload (details TBD) that allows specifying
> topic configs, and Streams creates the topic with those configs.
> 
> Reasoning: the user doesn't want to manage the topic manually; thus, it's
> still an internal topic and Streams creates the topic name automatically
> as for all other internal topics. However, users get some more control
> over topic parameters like the number of partitions (we should discuss
> what other configs would be useful).
> 
> 
> Does this make sense?
> 
> 
> -Matthias
> 
> 
> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> 
> 
> Hi.
> 
> 
> I'm not 100% up to date on what the version 1.0 DSL looks like ATM.
> I would just argue that repartitioning should be its own API call, like
> through() or something.
> One can already use through() or to() to get this. I would argue one
> should look there instead of at overloads.
> 
> Best Jan
> 
> On 04.11.2017 16:01, Jeyhun Karimov wrote:
> 
> 
> Dear community,
> 
> I would like to initiate discussion on KIP-221 [1] based on issue [2].
> Please feel free to comment.
> 
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams
> 
> [2] https://issues.apache.org/jira/browse/KAFKA-6037
> 
> 
> 
> Cheers,
> Jeyhun
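
For readers wondering what the internal/user distinction looks like in
practice, a sketch of the naming-based check (assuming the standard
"<application.id>-...-repartition" / "-changelog" convention; the actual
reset tool's matching may differ):

static boolean looksInternal(final String topic, final String applicationId) {
    return topic.startsWith(applicationId + "-")
        && (topic.endsWith("-repartition") || topic.endsWith("-changelog"));
}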
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 

Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Mickael Maison
Congratulations Onur !

On Mon, Nov 6, 2017 at 7:01 PM, Matthias J. Sax  wrote:
> Congrats!!!
>
>
> On 11/6/17 7:56 PM, Vahid S Hashemian wrote:
>> Congrats Onur!
>>
>> --Vahid
>>
>>
>>
>> From:   Ismael Juma 
>> To: dev@kafka.apache.org
>> Cc: "us...@kafka.apache.org" 
>> Date:   11/06/2017 10:13 AM
>> Subject:Re: [ANNOUNCE] New committer: Onur Karaman
>> Sent by:isma...@gmail.com
>>
>>
>>
>> Congratulations Onur!
>>
>> Ismael
>>
>> On Mon, Nov 6, 2017 at 5:24 PM, Jun Rao  wrote:
>>
>>> Hi, everyone,
>>>
>>> The PMC of Apache Kafka is pleased to announce a new Kafka committer
>> Onur
>>> Karaman.
>>>
>>> Onur's most significant work is the improvement of Kafka controller,
>> which
>>> is the brain of a Kafka cluster. Over time, we have accumulated quite a
>> few
>>> correctness and performance issues in the controller. There have been
>>> attempts to fix controller issues in isolation, which would make the
>> code
>>> base more complicated without a clear path of solving all problems. Onur
>> is
>>> the one who took a holistic approach, by first documenting all known
>>> issues, writing down a new design, coming up with a plan to deliver the
>>> changes in phases and executing on it. At this point, Onur has completed
>>> the two most important phases: making the controller single threaded and
>>> changing the controller to use the async ZK api. The former fixed
>> multiple
>>> deadlocks and race conditions. The latter significantly improved the
>>> performance when there are many partitions. Experimental results show
>> that
>>> Onur's work reduced the controlled shutdown time by a factor of 100
>> times
>>> and the controller failover time by a factor of 3 times.
>>>
>>> Congratulations, Onur!
>>>
>>> Thanks,
>>>
>>> Jun (on behalf of the Apache Kafka PMC)
>>>
>>
>>
>>
>>
>>
>


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Matthias J. Sax
Congrats!!!


On 11/6/17 7:56 PM, Vahid S Hashemian wrote:
> Congrats Onur!
> 
> --Vahid
> 
> 
> 
> From:   Ismael Juma 
> To: dev@kafka.apache.org
> Cc: "us...@kafka.apache.org" 
> Date:   11/06/2017 10:13 AM
> Subject:Re: [ANNOUNCE] New committer: Onur Karaman
> Sent by:isma...@gmail.com
> 
> 
> 
> Congratulations Onur!
> 
> Ismael
> 
> On Mon, Nov 6, 2017 at 5:24 PM, Jun Rao  wrote:
> 
>> Hi, everyone,
>>
>> The PMC of Apache Kafka is pleased to announce a new Kafka committer 
> Onur
>> Karaman.
>>
>> Onur's most significant work is the improvement of Kafka controller, 
> which
>> is the brain of a Kafka cluster. Over time, we have accumulated quite a 
> few
>> correctness and performance issues in the controller. There have been
>> attempts to fix controller issues in isolation, which would make the 
> code
>> base more complicated without a clear path of solving all problems. Onur 
> is
>> the one who took a holistic approach, by first documenting all known
>> issues, writing down a new design, coming up with a plan to deliver the
>> changes in phases and executing on it. At this point, Onur has completed
>> the two most important phases: making the controller single threaded and
>> changing the controller to use the async ZK api. The former fixed 
> multiple
>> deadlocks and race conditions. The latter significantly improved the
>> performance when there are many partitions. Experimental results show 
> that
>> Onur's work reduced the controlled shutdown time by a factor of 100 
> times
>> and the controller failover time by a factor of 3 times.
>>
>> Congratulations, Onur!
>>
>> Thanks,
>>
>> Jun (on behalf of the Apache Kafka PMC)
>>
> 
> 
> 
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Vahid S Hashemian
Congrats Onur!

--Vahid



From:   Ismael Juma 
To: dev@kafka.apache.org
Cc: "us...@kafka.apache.org" 
Date:   11/06/2017 10:13 AM
Subject:Re: [ANNOUNCE] New committer: Onur Karaman
Sent by:isma...@gmail.com



Congratulations Onur!

Ismael

On Mon, Nov 6, 2017 at 5:24 PM, Jun Rao  wrote:

> Hi, everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer 
Onur
> Karaman.
>
> Onur's most significant work is the improvement of Kafka controller, 
which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a 
few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the 
code
> base more complicated without a clear path of solving all problems. Onur 
is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed 
multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show 
that
> Onur's work reduced the controlled shutdown time by a factor of 100 
times
> and the controller failover time by a factor of 3 times.
>
> Congratulations, Onur!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>






[GitHub] kafka pull request #4148: KAFKA-6120: RecordCollector should not retry sendi...

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4148


---


[GitHub] kafka pull request #4129: KAFKA-6115: TaskManager should be type aware

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4129


---


Re: kafka-pr-jdk9-scala2.12 keeps failing

2017-11-06 Thread Ismael Juma
Thanks!

Ismael

On Mon, Nov 6, 2017 at 3:48 AM, Ted Yu  wrote:

> Logged https://issues.apache.org/jira/browse/INFRA-15448
>
> On Thu, Nov 2, 2017 at 11:39 PM, Ismael Juma  wrote:
>
> > This looks to be an issue in Jenkins, not in Kafka. Apache Infra updated
> > Java 9 to 9.0.1 and it seems to have broken some of the Jenkins code.
> >
> > Ismael
> >
> > On 3 Nov 2017 1:53 am, "Ted Yu"  wrote:
> >
> > > Looking at earlier runs, e.g. :
> > > https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/2384/console
> > >
> > > FAILURE: Build failed with an exception.
> > >
> > > * What went wrong:
> > > Could not determine java version from '9.0.1'.
> > >
> > >
> > > This was the first build with 'out of range of int' exception:
> > >
> > >
> > > https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/2389/console
> > >
> > >
> > > However, I haven't found the commit which was at the tip of repo at
> that
> > > time.
> > >
> > >
> > > On Thu, Nov 2, 2017 at 6:40 PM, Guozhang Wang 
> > wrote:
> > >
> > > > Noticed that as well, could we track down to which git commit /
> version
> > > > upgrade caused the issue?
> > > >
> > > >
> > > > Guozhang
> > > >
> > > > On Thu, Nov 2, 2017 at 6:25 PM, Ted Yu  wrote:
> > > >
> > > > > Hi,
> > > > > I took a look at recent runs under https://builds.apache.
> > > > > org/job/kafka-pr-jdk9-scala2.12
> > > > >
> > > > > All the recent runs failed with:
> > > > >
> > > > > Could not update commit status of the Pull Request on GitHub.
> > > > > org.kohsuke.github.HttpException: Server returned HTTP response
> > code:
> > > > > 201, message: 'Created' for URL:
> > > > > https://api.github.com/repos/apache/kafka/statuses/
> > > > > 3d96c6f5b2edd3c1dbea11dab003c4ac78ee141a
> > > > > at org.kohsuke.github.Requester.parse(Requester.java:633)
> > > > > at org.kohsuke.github.Requester.parse(Requester.java:594)
> > > > > at org.kohsuke.github.Requester._to(Requester.java:272)
> > > > > at org.kohsuke.github.Requester.to(Requester.java:234)
> > > > > at org.kohsuke.github.GHRepository.createCommitStatus(
> > > > > GHRepository.java:1071)
> > > > >
> > > > > ...
> > > > >
> > > > > Caused by: com.fasterxml.jackson.databind.JsonMappingException:
> > > > > Numeric value (4298492118) out of range of int
> > > > >  at [Source: {"url":"https://api.github.com/repos/apache/kafka/
> > > statuses/
> > > > > 3d96c6f5b2edd3c1dbea11dab003c4ac78ee141a","id":4298492118,"
> > > > > state":"pending","description":"Build
> > > > > started sha1 is
> > > > > merged.","target_url":"https://builds.apache.org/job/kafka-
> > > > > pr-jdk9-scala2.12/2397/","context":"JDK
> > > > > 9 and Scala 2.12",
> > > > >
> > > > >
> > > > > Should we upgrade the version of Jackson?
> > > > >
> > > > >
> > > > > Cheers
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -- Guozhang
> > > >
> > >
> >
>
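
For context on the failure: the overflow is not in Jackson itself but in the int field it is asked to bind to -- GitHub's status id 4298492118 exceeds Integer.MAX_VALUE (2147483647). A minimal reproduction sketch (the class and field names below are illustrative, not taken from the github-api library; widening the field to long, or upgrading the library that declares it, avoids the error):

    import com.fasterxml.jackson.databind.JsonMappingException;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Reproduction sketch: 4298492118 > Integer.MAX_VALUE, so int binding fails.
    public class IntOverflowRepro {
        public static class IntStatus { public int id; }
        public static class LongStatus { public long id; }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();
            String json = "{\"id\":4298492118}";

            try {
                mapper.readValue(json, IntStatus.class);
            } catch (JsonMappingException e) {
                // "Numeric value (4298492118) out of range of int"
                System.out.println("int binding failed: " + e.getOriginalMessage());
            }

            LongStatus ok = mapper.readValue(json, LongStatus.class); // succeeds
            System.out.println("long binding ok, id=" + ok.id);
        }
    }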


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Ismael Juma
Congratulations Onur!

Ismael

On Mon, Nov 6, 2017 at 5:24 PM, Jun Rao  wrote:

> Hi, everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> Karaman.
>
> Onur's most significant work is the improvement of Kafka controller, which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path of solving all problems. Onur is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show that
> Onur's work reduced the controlled shutdown time by a factor of 100 times
> and the controller failover time by a factor of 3 times.
>
> Congratulations, Onur!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>


Build failed in Jenkins: kafka-trunk-jdk9 #171

2017-11-06 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] KAFKA-6156; Metric tag values with colons must be sanitized

[becket.qin] KAFKA-6172; Cache lastEntry in TimeIndex to avoid unnecessary disk

--
Started by an SCM change
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H14 (couchdbtest ubuntu xenial) in workspace 

Cloning the remote Git repository
Cloning repository https://github.com/apache/kafka.git
 > git init  # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # 
 > timeout=10
 > git config remote.origin.url https://github.com/apache/kafka.git # timeout=10
Fetching upstream changes from https://github.com/apache/kafka.git
 > git fetch --tags --progress https://github.com/apache/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 0c895706e8ab511efe352a824a0c9e2dab62499e 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0c895706e8ab511efe352a824a0c9e2dab62499e
Commit message: "KAFKA-6172; Cache lastEntry in TimeIndex to avoid unnecessary 
disk access"
 > git rev-list 86062e9a78dccad74e012f11755025512ad5cf63 # timeout=10
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
[kafka-trunk-jdk9] $ /bin/bash -xe /tmp/jenkins8119844706765058166.sh
+ rm -rf 
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2/bin/gradle

FAILURE: Build failed with an exception.

* What went wrong:
Could not determine java version from '9.0.1'.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_3_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_3.4-rc-2
Not sending mail to unregistered user rajinisiva...@googlemail.com
Not sending mail to unregistered user wangg...@gmail.com
Not sending mail to unregistered user becket@gmail.com


Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Guozhang Wang
A quick question: I think we do not yet have the `list consumer groups`
functionality as in the old AdminClient. Without it, `describe group` given a
group id would not be very useful. Could you include this as well in your
KIP? More specifically, you can look at kafka.admin.AdminClient for more
details on the APIs.


Guozhang

On Mon, Nov 6, 2017 at 7:22 AM, Ted Yu  wrote:

> Please fill out Discussion thread and JIRA fields.
>
> Thanks
>
> On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley  wrote:
>
> > Hi Jorge,
> >
> > Thanks for the KIP. A few initial comments:
> >
> > 1. The AdminClient doesn't have any API like `listConsumerGroups()`
> > currently, so in general how does a client know the group ids it is
> > interested in?
> > 2. Could you fill in the API of DescribeConsumerGroupResult, just so
> > everyone knows exactly what being proposed.
> > 3. Can you describe the ConsumerGroupDescription class?
> > 4. Probably worth mentioning that this will use
> > DescribeGroupsRequest/Response, and also enumerating the error codes
> that
> > can return (or, equivalently, enumerate the exceptions throw from the
> > futures obtained from the DescribeConsumerGroupResult).
> >
> > Cheers,
> >
> > Tom
> >
> > On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi everyone,
> > >
> > > I would like to start a discussion on KIP-222 [1] based on issue [2].
> > >
> > > Looking forward to feedback.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=74686265
> > > [2] https://issues.apache.org/jira/browse/KAFKA-6058
> > >
> > > Cheers,
> > > Jorge.
> > >
> >
>



-- 
-- Guozhang
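
To make the shape of the request concrete, a hypothetical sketch of the two operations under discussion -- the interface, method names, and result types below are illustrative only, not the final KIP-222 API:

    import java.util.Collection;
    import java.util.Map;
    import org.apache.kafka.common.KafkaFuture;

    // Hypothetical sketch only; none of these types exist in the Java AdminClient yet.
    public interface ConsumerGroupAdminSketch {

        /** Discover the group ids known to the cluster (Guozhang's point above). */
        KafkaFuture<Collection<String>> listConsumerGroups();

        /** Describe the given groups: members, assignments, coordinator, state. */
        KafkaFuture<Map<String, ConsumerGroupDescription>> describeConsumerGroups(
                Collection<String> groupIds);

        /** Placeholder for the description class requested in the review comments. */
        interface ConsumerGroupDescription { /* members, assignment, state, ... */ }
    }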


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Thomas Becker
Could you clarify exactly what you mean by keeping the current distinction?

Actually, re-reading the KIP and JIRA, it's not clear that being able to 
specify a custom name is actually a requirement. If the goal is to control 
repartitioning and tune parallelism, maybe we can just sidestep this issue 
altogether by removing the ability to set a different name.

On Mon, 2017-11-06 at 16:51 +0100, Matthias J. Sax wrote:

That's a good point. In the current design, we strictly distinguish both.
For example, the reset tool deletes internal topics (starting with the
`<application.id>-` prefix and ending with either `-repartition` or
`-changelog`).

Thus, from my point of view, it would make sense to keep the current
distinction.

-Matthias

On 11/6/17 4:45 PM, Thomas Becker wrote:


I think this sounds good as well. It's worth clarifying whether topics that are 
named by the user but created by streams are considered "internal" topics also.

On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:

My idea was to relax the requirement for through() that a topic must be
created manually before startup.

Thus, if no through() call is made, an (internal) topic is created the
same way we do it currently.

If one uses `through(String topicName)` we keep the current behavior and
require users to create the topic manually.

The reasoning is as follows: if a user creates a topic manually, they can
just use it for repartitioning. As the topic is already there, there
is no need to specify any topic configs.

We add a new `through()` overload (details TBD) that allows specifying
topic configs, and Streams creates the topic with those configs.

Reasoning: users don't want to manage the topic manually; thus, it's still
an internal topic and Streams creates the topic name automatically as for
all other internal topics. However, users get some more control over
topic parameters like the number of partitions (we should discuss what
other configs would be useful).


Does this make sense?


-Matthias


On 11/5/17 1:21 AM, Jan Filipiak wrote:


Hi.


I'm not 100% up to date on what the 1.0 DSL looks like ATM.
I would just argue that repartitioning should be its own API call, like
through() or something.
One can already use through() or to() to get this. I would argue one should
look there instead of at overloads.

Best Jan

On 04.11.2017 16:01, Jeyhun Karimov wrote:


Dear community,

I would like to initiate discussion on KIP-221 [1] based on issue [2].
Please feel free to comment.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams

[2] https://issues.apache.org/jira/browse/KAFKA-6037



Cheers,
Jeyhun



















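
To make the two modes in the proposal concrete, a sketch: the `Produced`-based `through()` overload below exists in 1.0, while the config-accepting variant is hypothetical and its name is illustrative only.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    public class RepartitionSketch {
        static KStream<String, Long> repartition(KStream<String, Long> stream) {
            // Current behavior: "my-app-repartition" must be created manually
            // (e.g. kafka-topics.sh --create --partitions 10 ...) before startup.
            return stream.through("my-app-repartition",
                    Produced.with(Serdes.String(), Serdes.Long()));

            // Hypothetical overload from the discussion (not a real API): Streams
            // would name and create the internal topic itself, with user-supplied
            // configs such as the partition count:
            //   stream.through(RepartitionHint.withNumberOfPartitions(10));
        }
    }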


[jira] [Resolved] (KAFKA-1326) New consumer checklist

2017-11-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-1326.
--
Resolution: Fixed

> New consumer checklist
> --
>
> Key: KAFKA-1326
> URL: https://issues.apache.org/jira/browse/KAFKA-1326
> Project: Kafka
>  Issue Type: New Feature
>  Components: consumer
>Affects Versions: 0.8.2.1
>Reporter: Neha Narkhede
>Assignee: Neha Narkhede
>  Labels: feature
> Fix For: 0.9.0.1
>
>
> We will use this JIRA to track the list of issues to resolve to get a working 
> new consumer client. The consumer client can work in phases -
> 1. Add new consumer APIs and configs
> 2. Refactor Sender. We will need to use some common APIs from Sender.java 
> (https://issues.apache.org/jira/browse/KAFKA-1316)
> 3. Add metadata fetch and refresh functionality to the consumer (This will 
> require https://issues.apache.org/jira/browse/KAFKA-1316)
> 4. Add functionality to support subscribe(TopicPartition...partitions). This 
> will add SimpleConsumer functionality to the new consumer. This does not 
> include any group management related work.
> 5. Add ability to commit offsets to Kafka. This will include adding 
> functionality to the commit()/commitAsync()/committed() APIs. This still does 
> not include any group management related work.
> 6. Add functionality to the offsetsBeforeTime() API.
> 7. Add consumer co-ordinator election to the server. This will only add a new 
> module for the consumer co-ordinator, but not necessarily all the logic to do 
> group management. 
> At this point, we will have a fully functional standalone consumer and a 
> server side co-ordinator module. This will be a good time to start adding 
> group management functionality to the server and consumer.
> 8. Add failure detection capability to the consumer when group management is 
> used. This will not include any rebalancing logic, just the ability to detect 
> failures using session.timeout.ms.
> 9. Add rebalancing logic to the server and consumer. This will be a tricky 
> and potentially large change since it will involve implementing the group 
> management protocol.
> 10. Add system tests for the new consumer
> 11. Add metrics 
> 12. Convert mirror maker to use the new consumer.
> 13. Convert perf test to use the new consumer
> 14. Performance testing and analysis.
> 15. Review and fine tune log4j logging



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] 0.11.0.2 bug fix release

2017-11-06 Thread Ted Yu
After clicking the outstanding issues link, I only saw KAFKA-6007.

Here is the filter in case anyone encounters the same issue:

status in (Open, Reopened, "In Progress", "Patch Available") AND Project =
kafka AND fixVersion=0.11.0.2

On Mon, Nov 6, 2017 at 2:47 AM, Rajini Sivaram 
wrote:

> Hi all,
>
>
> Since we have fixed some critical issues since 0.11.0.1, it must be time
> for a 0.11.0.2 release.
>
>
> Since the 0.11.0.1 release we've fixed 14 JIRAs that are targeted for
> 0.11.0.2:
>
> https://issues.apache.org/jira/browse/KAFKA-6134?jql=
> project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%
> 20Closed)%20AND%20fixVersion%20%3D%200.11.0.2
>
>
>
> We have 11 outstanding issues that are targeted for 0.11.0.2:
>
> https://issues.apache.org/jira/browse/KAFKA-6007?jql=
> project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%
> 22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%
> 20fixVersion%20%3D%200.11.0.2
>
>
> Can the owners of these issues please resolve them soon or move them to a
> future release?
>
>
> I will aim to create the first RC for 0.11.0.2 this Friday.
>
> Thank you!
>
> Regards,
>
> Rajini
>


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Guozhang Wang
I'd like to add that Onur also contributed a lot to the new
consumer development in the past, making it happen in the 0.9 release.

Very well deserved. Congrats, Onur!


Guozhang

On Mon, Nov 6, 2017 at 9:28 AM, Ted Yu  wrote:

> Congratulations, Onur!
>
> On Mon, Nov 6, 2017 at 9:24 AM, Jun Rao  wrote:
>
> > Hi, everyone,
> >
> > The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> > Karaman.
> >
> > Onur's most significant work is the improvement of Kafka controller,
> which
> > is the brain of a Kafka cluster. Over time, we have accumulated quite a
> few
> > correctness and performance issues in the controller. There have been
> > attempts to fix controller issues in isolation, which would make the code
> > base more complicated without a clear path of solving all problems. Onur
> is
> > the one who took a holistic approach, by first documenting all known
> > issues, writing down a new design, coming up with a plan to deliver the
> > changes in phases and executing on it. At this point, Onur has completed
> > the two most important phases: making the controller single threaded and
> > changing the controller to use the async ZK api. The former fixed
> multiple
> > deadlocks and race conditions. The latter significantly improved the
> > performance when there are many partitions. Experimental results show
> that
> > Onur's work reduced the controlled shutdown time by a factor of 100 times
> > and the controller failover time by a factor of 3 times.
> >
> > Congratulations, Onur!
> >
> > Thanks,
> >
> > Jun (on behalf of the Apache Kafka PMC)
> >
>



-- 
-- Guozhang


Re: [DISCUSS] 0.11.0.2 bug fix release

2017-11-06 Thread Guozhang Wang
Thanks Rajini, +1 on releasing a 0.11.0.2.

I made a pass over the outstanding issues, and added another issue related
to KAFKA-5936 (https://issues.apache.org/jira/browse/KAFKA-4767) to the list.

For the other outstanding ones that are not yet assigned to anyone,
we'd need to start working on them asap if we want to include their fixes
by Friday. Among them, only KAFKA-1923 seems not straightforward, so my
suggestion is to move that issue out of scope while driving to finish the
others sooner.

Guozhang


On Mon, Nov 6, 2017 at 2:47 AM, Rajini Sivaram 
wrote:

> Hi all,
>
>
> Since we have fixed some critical issues since 0.11.0.1, it must be time
> for a 0.11.0.2 release.
>
>
> Since the 0.11.0.1 release we've fixed 14 JIRAs that are targeted for
> 0.11.0.2:
>
> https://issues.apache.org/jira/browse/KAFKA-6134?jql=
> project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%
> 20Closed)%20AND%20fixVersion%20%3D%200.11.0.2
>
>
>
> We have 11 outstanding issues that are targeted for 0.11.0.2:
>
> https://issues.apache.org/jira/browse/KAFKA-6007?jql=
> project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%
> 22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%
> 20fixVersion%20%3D%200.11.0.2
>
>
> Can the owners of these issues please resolve them soon or move them to a
> future release?
>
>
> I will aim to create the first RC for 0.11.0.2 this Friday.
>
> Thank you!
>
> Regards,
>
> Rajini
>



-- 
-- Guozhang


[GitHub] kafka pull request #4177: KAFKA-6172; Cache lastEntry in TimeIndex to avoid ...

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4177


---


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Ted Yu
Congratulations, Onur!

On Mon, Nov 6, 2017 at 9:24 AM, Jun Rao  wrote:

> Hi, everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> Karaman.
>
> Onur's most significant work is the improvement of Kafka controller, which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path of solving all problems. Onur is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show that
> Onur's work reduced the controlled shutdown time by a factor of 100 times
> and the controller failover time by a factor of 3 times.
>
> Congratulations, Onur!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>


Re: kafka producer : number of record pushed to topic limited

2017-11-06 Thread Ted Yu
Can you give us some more information?

Were you using a single-node setup (127.0.0.1)?
Which release of Kafka are you using?

Anything interesting in the broker log?

On Mon, Nov 6, 2017 at 8:52 AM, Dhia Beji  wrote:

> Hello,
>
> Would you please help me? I'm trying to create a producer that pushes a
> large number of JSON records to a topic.
>
> Topic creation :  kafka-topics --create --zookeeper 127.0.0.1:2181
> --replication-factor 1 --partitions 100  --topic  testkpa
> Topic producer :
>
> props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://"
> );
> props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
> "org.apache.kafka.common.serialization.StringSerializer");
> props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
> "org.apache.kafka.common.serialization.StringSerializer");
>
> Producer<String, String> producer = new KafkaProducer<String, String>(props);
> ProducerRecord<String, String> rec = new ProducerRecord<String,
> String>(topicName, "" + TestStreamCsv.incr, value);
>
> producer.send(rec)
> //
>
> the problem is that the number of records viewed via Landoop is limited to
> 87.
>
> Thank you.
>
>
>


[jira] [Created] (KAFKA-6177) kafka-mirror-maker.sh RecordTooLargeException

2017-11-06 Thread JIRA
Rémi REY created KAFKA-6177:
---

 Summary: kafka-mirror-maker.sh RecordTooLargeException
 Key: KAFKA-6177
 URL: https://issues.apache.org/jira/browse/KAFKA-6177
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.10.1.1
 Environment: centos 7
Reporter: Rémi REY
Priority: Minor
 Attachments: consumer.config, producer.config

Hi all,

I am facing an issue with kafka-mirror-maker.sh.
We have 2 kafka clusters with the same configuration and mirror maker instances 
in charge of the mirroring between the clusters.

We haven't changed the default configuration for the message size, so the
default limit of roughly 1 MB (message.max.bytes = 1000012 bytes) is expected
on both clusters.

We are facing the following error on the mirroring side:

Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,431] ERROR 
Error when sending message to topic my_topic_name with key: 81 bytes, value: 
1000272 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
org.apache.kafka.common.errors.RecordTooLargeException: The request included a 
message larger than the max message size the server will accept.
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,511] ERROR 
Error when sending message to topic my_topic_name with key: 81 bytes, value: 
13846 bytes with error: 
(org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
java.lang.IllegalStateException: Producer is closed forcefully.
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortBatches(RecordAccumulator.java:513)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.abortIncompleteBatches(RecordAccumulator.java:493)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:156)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
java.lang.Thread.run(Thread.java:745)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: [2017-09-21 14:30:49,511] FATAL 
[mirrormaker-thread-0] Mirror maker thread failure due to  
(kafka.tools.MirrorMaker$MirrorMakerThread)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: 
java.lang.IllegalStateException: Cannot send after the producer is closed.
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:185)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:474)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:436)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
kafka.tools.MirrorMaker$MirrorMakerProducer.send(MirrorMaker.scala:657)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
kafka.tools.MirrorMaker$MirrorMakerThread$$anonfun$run$6.apply(MirrorMaker.scala:434)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
kafka.tools.MirrorMaker$MirrorMakerThread$$anonfun$run$6.apply(MirrorMaker.scala:434)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
scala.collection.Iterator$class.foreach(Iterator.scala:893)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
scala.collection.AbstractIterable.foreach(Iterable.scala:54)
Sep 21 14:30:49 lpa2e194 kafka-mirror-maker.sh: at 
kafka.tools.MirrorMaker$MirrorMakerThread.run(MirrorMaker.scala:434)

Why am I getting this error?
I expected that messages accepted by the first cluster could be replicated to
the second cluster without raising any error about the message size.
Is there any configuration adjustment required on the mirror maker side to
have it support the default message size on the brokers?

Find the mirrormaker consumer and producer config files attached.

Thanks for your inputs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
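
For reference, a sketch of the settings usually involved in this error -- the values below are illustrative, not taken from the attached configs. The point is that the mirror maker producer's request limit and the target brokers' message limit must both accommodate the largest replicated message, including batching/recompression overhead:

    # Target cluster brokers (server.properties) -- illustrative values:
    message.max.bytes=1100000
    replica.fetch.max.bytes=1100000

    # Mirror maker producer.config -- must cover the largest source message
    # plus batch/recompression overhead:
    max.request.size=1100000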


Re: [ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Abhishek Mendhekar
Congrats Onur!

On Mon, Nov 6, 2017 at 9:24 AM, Jun Rao  wrote:

> Hi, everyone,
>
> The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
> Karaman.
>
> Onur's most significant work is the improvement of Kafka controller, which
> is the brain of a Kafka cluster. Over time, we have accumulated quite a few
> correctness and performance issues in the controller. There have been
> attempts to fix controller issues in isolation, which would make the code
> base more complicated without a clear path of solving all problems. Onur is
> the one who took a holistic approach, by first documenting all known
> issues, writing down a new design, coming up with a plan to deliver the
> changes in phases and executing on it. At this point, Onur has completed
> the two most important phases: making the controller single threaded and
> changing the controller to use the async ZK api. The former fixed multiple
> deadlocks and race conditions. The latter significantly improved the
> performance when there are many partitions. Experimental results show that
> Onur's work reduced the controlled shutdown time by a factor of 100 times
> and the controller failover time by a factor of 3 times.
>
> Congratulations, Onur!
>
> Thanks,
>
> Jun (on behalf of the Apache Kafka PMC)
>



-- 
Abhishek Mendhekar
abhishek.mendhe...@gmail.com | 818.263.7030


[ANNOUNCE] New committer: Onur Karaman

2017-11-06 Thread Jun Rao
Hi, everyone,

The PMC of Apache Kafka is pleased to announce a new Kafka committer Onur
Karaman.

Onur's most significant work is the improvement of Kafka controller, which
is the brain of a Kafka cluster. Over time, we have accumulated quite a few
correctness and performance issues in the controller. There have been
attempts to fix controller issues in isolation, which would make the code
base more complicated without a clear path of solving all problems. Onur is
the one who took a holistic approach, by first documenting all known
issues, writing down a new design, coming up with a plan to deliver the
changes in phases and executing on it. At this point, Onur has completed
the two most important phases: making the controller single threaded and
changing the controller to use the async ZK api. The former fixed multiple
deadlocks and race conditions. The latter significantly improved the
performance when there are many partitions. Experimental results show that
Onur's work reduced the controlled shutdown time by a factor of 100 times
and the controller failover time by a factor of 3 times.

Congratulations, Onur!

Thanks,

Jun (on behalf of the Apache Kafka PMC)


[jira] [Resolved] (KAFKA-6157) Fix repeated words words in JavaDoc and comments.

2017-11-06 Thread Jiangjie Qin (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiangjie Qin resolved KAFKA-6157.
-
Resolution: Fixed

> Fix repeated words words in JavaDoc and comments.
> -
>
> Key: KAFKA-6157
> URL: https://issues.apache.org/jira/browse/KAFKA-6157
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 1.0.0
>Reporter: Adem Efe Gencer
>Assignee: Adem Efe Gencer
>Priority: Trivial
>  Labels: easyfix, newbie
> Fix For: 1.0.1
>
>
> There are repeated words in JavaDoc and comments within the code. This fix 
> removes the repeated words.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


kafka producer : number of record pushed to topic limited

2017-11-06 Thread Dhia Beji
Hello,

Would you please help me? I'm trying to create a producer that pushes a large
number of JSON records to a topic.

Topic creation :  kafka-topics --create --zookeeper 127.0.0.1:2181  
--replication-factor 1 --partitions 100  --topic  testkpa
Topic producer :

props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "http://" );
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, 
"org.apache.kafka.common.serialization.StringSerializer");

props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringSerializer");

Producer<String, String> producer = new KafkaProducer<String, String>(props);
ProducerRecord<String, String> rec = new ProducerRecord<String, String>(topicName, "" + TestStreamCsv.incr, value);

producer.send(rec)
//

The problem is that the number of records viewed via Landoop is limited to 87.

Thank you.


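
Two things worth checking in the snippet above: bootstrap.servers takes a plain host:port list (not an http:// URL), and records buffered in the producer are silently dropped if the process exits before flush()/close(), while send() failures are invisible without a callback. A minimal self-contained sketch with those pieces added (broker address, topic name, and record contents are placeholders):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.Producer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class JsonProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Plain host:port, not an http:// URL:
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 0; i < 1000; i++) {
                    ProducerRecord<String, String> rec = new ProducerRecord<>(
                            "testkpa", Integer.toString(i), "{\"n\":" + i + "}");
                    // The callback surfaces per-record failures that are otherwise silent:
                    producer.send(rec, (metadata, exception) -> {
                        if (exception != null) {
                            exception.printStackTrace();
                        }
                    });
                }
                producer.flush(); // block until all buffered records are sent
            } // try-with-resources calls close(), which also flushes
        }
    }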


Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Matthias J. Sax
Jan,

I understand what you are saying. However, having a RecordContext is
super useful for operations that are applied to an input topic. Many users
requested this feature -- it's much more convenient than falling back to
transform() to implement, for example, a filter() that wants to access
some metadata.

Because we cannot distinguish different "origins" of a KStream/KTable, I
am not sure if there would be a better way to do this. The only
"workaround" I see is to have two KStream/KTable interfaces each, and we
would use the first one for KStream/KTable with a "proper" RecordContext.
But this does not seem to be a good solution either.

Note that a KTable can also be read directly from a topic. I agree that
using RecordContext on a KTable that is the result of an aggregation is
questionable, but I don't see that as a reason to down-vote the KIP.

WDYT about this?


-Matthias

On 11/1/17 10:19 PM, Jan Filipiak wrote:
> -1 non binding
> 
> I don't get the motivation.
> In 80% of my DSL processors there is no such thing as a reasonable
> RecordContext.
> After a join, the record I am processing belongs to at least 2 topics.
> After a group-by, the record I am processing was created from multiple
> offsets.
> 
> The API design doesn't look appealing.
> 
> Best Jan
> 
> 
> 
> On 01.11.2017 22:02, Jeyhun Karimov wrote:
>> Dear community,
>>
>> It seems the discussion for KIP-159 [1] converged finally. I would
>> like to
>> initiate voting for the particular KIP.
>>
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-159%3A+Introducing+Rich+functions+to+Streams
>>
>>
>> Cheers,
>> Jeyhun
>>
> 



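
To make the metadata-access use case in this thread concrete, a hypothetical sketch of a "rich" predicate -- the interface and method names are illustrative, not the actual KIP-159 API:

    // Hypothetical sketch; not the actual KIP-159 interfaces.
    public interface RichFilterSketch {

        /** Record metadata a rich function could read (a subset, for illustration). */
        interface RecordContext {
            String topic();
            int partition();
            long offset();
            long timestamp();
        }

        /** A filter that can see metadata, e.g. drop records older than one hour. */
        interface RichPredicate<K, V> {
            boolean test(K key, V value, RecordContext context);
        }

        RichPredicate<String, String> RECENT_ONLY =
                (key, value, ctx) -> ctx.timestamp() > System.currentTimeMillis() - 3_600_000L;
    }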


[GitHub] kafka pull request #4182: KAFKA-6163 fail fast on startup

2017-11-06 Thread xvrl
GitHub user xvrl opened a pull request:

https://github.com/apache/kafka/pull/4182

KAFKA-6163 fail fast on startup

skip loading remaining logs if we encounter an unrecoverable error on 
startup

@hachikuji @ijuma 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xvrl/kafka kafka-6163

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4182.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4182


commit c32202343fde01ef6d16c184d5b587ea946442ac
Author: Xavier Léauté 
Date:   2017-11-06T16:45:34Z

KAFKA-6163 fail fast on startup

skip loading remaining logs if we encounter an unrecoverable error on 
startup




---


Re: [VOTE] KIP-171 - Extend Consumer Group Reset Offset for Stream Application

2017-11-06 Thread Rajini Sivaram
+1 (binding)
Thanks for the KIP,  Jorge.

Regards,

Rajini

On Tue, Oct 31, 2017 at 9:58 AM, Damian Guy  wrote:

> Thanks for the KIP - +1 (binding)
>
> On Mon, 23 Oct 2017 at 18:39 Guozhang Wang  wrote:
>
> > Thanks Jorge for driving this KIP! +1 (binding).
> >
> >
> > Guozhang
> >
> > On Mon, Oct 16, 2017 at 2:11 PM, Bill Bejeck  wrote:
> >
> > > +1
> > >
> > > Thanks,
> > > Bill
> > >
> > > On Fri, Oct 13, 2017 at 6:36 PM, Ted Yu  wrote:
> > >
> > > > +1
> > > >
> > > > On Fri, Oct 13, 2017 at 3:32 PM, Matthias J. Sax <
> > matth...@confluent.io>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > >
> > > > >
> > > > > On 9/11/17 3:04 PM, Jorge Esteban Quilcate Otoya wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > It seems that there is no further concern with the KIP-171.
> > > > > > At this point we would like to start the voting process.
> > > > > >
> > > > > > The KIP can be found here:
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application
> > > > > >
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS]: KIP-159: Introducing Rich functions to Streams

2017-11-06 Thread Matthias J. Sax
I just re-read the KIP.

One minor comment: we don't need to introduce any deprecated methods.
Thus, RichValueTransformer#punctuate can be removed completely instead
of introducing it as deprecated.

Otherwise looks good to me.

Thanks for being so patient!


-Matthias

On 11/1/17 9:16 PM, Guozhang Wang wrote:
> Jeyhun,
> 
> I think I'm convinced to not do KAFKA-3907 in this KIP. We should think
> carefully if we should add this functionality to the DSL layer moving
> forward since from what we discovered working on it the conclusion is that
> it would require revamping the public APIs quite a lot, and it's not clear
> if it is a better trade-off than asking users to call process() instead.
> 
> 
> Guozhang
> 
> 
> On Wed, Nov 1, 2017 at 4:50 AM, Damian Guy  wrote:
> 
>> Hi Jeyhun, thanks, looks good.
>> Do we need to remove the line that says:
>>
>>- on-demand commit() feature
>>
>> Cheers,
>> Damian
>>
>> On Tue, 31 Oct 2017 at 23:07 Jeyhun Karimov  wrote:
>>
>>> Hi,
>>>
>>> I removed the 'commit()' feature, as we discussed. It simplified  the
>>> overall design of KIP a lot.
>>> If it is ok, I would like to start a VOTE thread.
>>>
>>> Cheers,
>>> Jeyhun
>>>
>>> On Fri, Oct 27, 2017 at 5:28 PM Matthias J. Sax 
>>> wrote:
>>>
 Thanks. I understand what you are saying, but I don't agree that

> but also we need a commit() method

 I would just not provide `commit()` at DSL level and close the
 corresponding Jira as "not a problem" or similar.


 -Matthias

 On 10/27/17 3:42 PM, Jeyhun Karimov wrote:
> Hi Matthias,
>
> Thanks for your comments. I agree that this is not the best way to do.
> A bit of history behind this design.
>
> Prior to doing this, I tried to provide ProcessorContext itself as an
> argument in Rich interfaces. However, we don't want to give users that
> flexibility and “power”. Moreover, ProcessorContext contains processor-level
> information and not record-level info. The only thing we need in
> ProcessorContext is the commit() method.
>
> So, as far as I understood, we need record context (offset, timestamp,
> etc.) but also we need a commit() method (we don't want to provide
> ProcessorContext as a parameter so users can use
> ProcessorContext.commit()).
>
> As a result, I thought to “propagate” the commit() call from RecordContext
> to ProcessorContext.
>
>
> If there is a misunderstanding in the motivation/discussion of the KIP or
> the included jiras, please let me know.
>
>
> Cheers,
> Jeyhun
>
>
> On Fri 27. Oct 2017 at 12:39, Matthias J. Sax >>
 wrote:
>
>> I am personally still not convinced, that we should add `commit()`
>> at
 all.
>>
>> @Guozhang: you created the original Jira. Can you elaborate a little
>> bit? Isn't requesting commits a low level API that should not be
>>> exposed
>> in the DSL? Just want to understand the motivation better. Why would
>> anybody that uses the DSL ever want to request a commit? To me,
>> requesting commits is useful if you manipulated state explicitly,
>> ie,
>> via Processor API.
>>
>> Also, for the solution: it seem rather unnatural to me, that we add
>> `commit()` to `RecordContext` -- from my understanding,
>>> `RecordContext`
>> is an helper object that provide access to record meta data.
>>> Requesting
>> a commit is something quite different. Additionally, a commit does
>> not
>> commit a specific record but a `RecrodContext` is for a specific
>>> record.
>>
>> To me, this does not seem to be a sound API design if we follow this
 path.
>>
>>
>> -Matthias
>>
>>
>>
>> On 10/26/17 10:41 PM, Jeyhun Karimov wrote:
>>> Hi,
>>>
>>> Thanks for your suggestions.
>>>
>>> I have some comments, to make sure that there is no
>> misunderstanding.
>>>
>>>
>>> 1. Maybe we can deprecate the `commit()` in ProcessorContext, to
 enforce
 user to consolidate this call as
 "processorContext.recordContext().commit()". And internal
 implementation
 of
 `ProcessorContext.commit()` in `ProcessorContextImpl` is also
>>> changed
 to
 this call.
>>>
>>>
>>> - I think we should not deprecate `ProcessorContext.commit()`. The
>>> main
>>> intuition that we introduce `commit()` in `RecordContext` is that,
>>> `RecordContext` is the one which is provided in Rich interfaces. So
>>> if
>> user
>>> wants to commit, then there should be some method inside
 `RecordContext`
>> to
>>> do so. Internally, `RecordContext.commit()` calls
>>> `ProcessorContext.commit()`  (see the last code snippet in
>> KIP-159):
>>>
>>> @Override
>>> public void process(final K1 key, 

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-06 Thread Matthias J. Sax
Jan,

thanks a lot for this KIP. I did an initial pass over it, but feel a
little lost. Maybe I need to read it more carefully, but atm it's not
clear to me at all what algorithm you propose.

I think it would be super helpful, to do an example with concrete data
that show how records are stored, what the different value mappers
extract, and what is written into repartitioning topics.



-Matthias


On 11/5/17 2:09 AM, Jan Filipiak wrote:
> Hi Guozhang,
> 
> I hope the wiki page looks better now. I made a little more effort on the
> diagram. Still not ideal, but I think it serves its purpose.
> 
> 
> 
> On 02.11.2017 01:17, Guozhang Wang wrote:
>> Thanks for the KIP writeup Jan. I made a first pass and here are some
>> quick
>> comments:
>>
>>
>> 1. Could we use K0 / V0 and K1 / V1 etc since K0 and KO are a bit
>> harder to
>> differentiate when reading.
>>
>> 2. I think you missed the key type in the intrusive approach example code
>> snippet regarding "KTable  oneToManyJoin"? Should that be
>>
>> KTable, V0> oneToManyJoin
>>
>> 3. Some of the arrows in your algorithm section's diagrams seem
>> reversed.
>>
>> 4. In the first step of the algorithm, "Materialize B first", that
>> happens
>> in the "Repartition by A's key" block right? If yes, could you clarify it
>> in the block?
>>
>> 5. "skip old if A's key didn't change": hmm, not sure if we can skip it.
>> What if other fields (neither A's key nor B's key) change? Suppose you
>> have
>> an aggregation after the join, we still need to subtract the old value
>> from
>> the aggregation right?
>>
>> 6. In the block of "Materialize B", I think from your description we are
>> actually materializing both A and B right? If yes could you update the
>> diagram?
>>
>> 7. This is a meta question: "in the sink, only use A's key to determine
>> partition" I think we had the discussion long time ago, that if we are
>> sending the old and new entries of the pair to different partitions,
>> their
>> ordering may get reversed later when reading from the join operator (i.e.
>> the "Materialize B" block in your diagram). How did you address that with
>> this proposal?
>>
>> 8. "B records with a 'null' A-key value would be silently dropped" Where
>> are we dropping it, do we drop it at the first sub-topology (i.e the
>> "Repartition by A's key" block)?
>>
>> Guozhang
>>
>>
>>
>>
>>
>> On Wed, Nov 1, 2017 at 12:18 PM, Jan Filipiak 
>> wrote:
>>
>>> Hi thanks for the feedback
>>>
>>> On 01.11.2017 12:58, Damian Guy wrote:
>>>
 Hi Jan, Thanks for the KIP!

 In both alternatives the API will need to use the `Joined` class rather
 than than passing in `Serde`s. Also, as with all other joins etc, there
 probably should be an overload that doesn't require any `Serdes`.

>>> Will check again how the current API looks. I remember losing the argument
>>> with these IQ overload things.
>>> Didn't expect something to have happened already, so I just copied from
>>> the
>>> PR. Will update.
>>> Will also add the overload.
>>>
 It isn't clear to me what `joinPrefixFaker` is doing? In the comment it
 says "returning an outputKey that when serialized only produces a
 prefix
 of
 the output key which is the same serializing K" So why not just use
 "K" ?

>>> The faker in fact returns K, which can be serialized by the key Serde in
>>> the rocks. But it needs to only contain A's key, and it needs to be a
>>> strict prefix byte[] of all K with this A's key. We are going to seek
>>> there with a RocksIterator and continue to read as long as the "faked
>>> key" serialized form is a prefix.
>>> This is easy to do for Avro + Protobuf + custom Serdes and Hadoop
>>> Writables. It's a nightmare for JSON serdes.
>>>
>>>
>>>
 Thanks,
 Damian


 On Fri, 27 Oct 2017 at 10:27 Ted Yu  wrote:

 I think if you explain what A and B are in the beginning, it makes
 sense
> to
> use them since readers would know who they reference.
>
> Cheers
>
> On Thu, Oct 26, 2017 at 11:04 PM, Jan Filipiak
>  wrote:
>
>
>> Thanks for the remarks. hope I didn't miss any.
>> Not even sure if it makes sense to introduce A and B or just stick
>> with
>> "this ktable", "other ktable"
>>
>> Thank you
>> Jan
>>
>>
>> On 27.10.2017 06:58, Ted Yu wrote:
>>
>> Do you mind addressing my previous comments ?
>>> http://search-hadoop.com/m/Kafka/uyzND1hzF8SRzUqb?subj=Re+
>>> DISCUSS+KIP+213+Support+non+key+joining+in+KTable
>>>
>>> On Thu, Oct 26, 2017 at 9:38 PM, Jan Filipiak <
>>> jan.filip...@trivago.com
>>> wrote:
>>>
>>> Hello everyone,
>>>
 this is the new discussion thread after the ID-clash.

 Best
 Jan

 __


 Hello Kafka-users,

 I want to continue with 
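
A minimal sketch of the prefix-scan idea Jan describes above, using RocksDB's Java API -- the serde, key layout, and processing callback are placeholders, not the KIP-213 implementation:

    import org.rocksdb.RocksDB;
    import org.rocksdb.RocksIterator;

    // Seek to the serialized "fake key" (a strict byte prefix of every composite
    // key sharing A's key) and read while the prefix still matches.
    public class PrefixScanSketch {

        static void scanPrefix(RocksDB db, byte[] prefix) {
            try (RocksIterator it = db.newIterator()) {
                for (it.seek(prefix); it.isValid() && hasPrefix(it.key(), prefix); it.next()) {
                    handle(it.key(), it.value());
                }
            }
        }

        static boolean hasPrefix(byte[] key, byte[] prefix) {
            if (key.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if (key[i] != prefix[i]) {
                    return false;
                }
            }
            return true;
        }

        static void handle(byte[] key, byte[] value) {
            // Placeholder for the join logic applied to each matching entry.
        }
    }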

[GitHub] kafka pull request #4173: KAFKA-6156: Metric tag name should not contain col...

2017-11-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4173


---


[jira] [Resolved] (KAFKA-6156) JmxReporter can't handle windows style directory paths

2017-11-06 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-6156.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 4173
[https://github.com/apache/kafka/pull/4173]

> JmxReporter can't handle windows style directory paths
> --
>
> Key: KAFKA-6156
> URL: https://issues.apache.org/jira/browse/KAFKA-6156
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 1.0.0
> Environment: JDK 8u152 (64 Bit), Windows 10 (64 Bit)
>Reporter: Kedar Joshi
>Assignee: huxihx
>Priority: Minor
> Fix For: 1.1.0
>
> Attachments: Kafka.log
>
>
> *com.yammer.metrics.reporting.JmxReporter* complains about special characters 
> in windows log directory path. Following warning is printed during startup -
> {noformat}
> [2017-11-01 20:39:24,567] INFO Loading logs. (kafka.log.LogManager)
> [2017-11-01 20:39:24,583] INFO Logs loading complete in 16 ms. 
> (kafka.log.LogManager)
> [2017-11-01 20:39:24,630] WARN Error processing 
> kafka.log:type=LogManager,name=LogDirectoryOffline,logDirectory=D:\tmp\kafka-logs
>  (com.yammer.metrics.reporting.JmxReporter)
> javax.management.MalformedObjectNameException: Invalid character ':' in value 
> part of property
> at javax.management.ObjectName.construct(ObjectName.java:618)
> at javax.management.ObjectName.<init>(ObjectName.java:1382)
> at 
> com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
> at 
> com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
> at 
> com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
> at 
> com.yammer.metrics.core.MetricsRegistry.newGauge(MetricsRegistry.java:79)
> at 
> kafka.metrics.KafkaMetricsGroup.newGauge(KafkaMetricsGroup.scala:74)
> at 
> kafka.metrics.KafkaMetricsGroup.newGauge$(KafkaMetricsGroup.scala:73)
> at kafka.log.LogManager.newGauge(LogManager.scala:50)
> at kafka.log.LogManager.$anonfun$new$1(LogManager.scala:122)
> at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:59)
> at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:52)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.log.LogManager.<init>(LogManager.scala:116)
> at kafka.log.LogManager$.apply(LogManager.scala:814)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:222)
> at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
> at kafka.Kafka$.main(Kafka.scala:92)
> at kafka.Kafka.main(Kafka.scala)
> {noformat}
> Complete log is attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
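
The underlying rule is generic JMX behavior: a ':' in an ObjectName property value is invalid unless the value is quoted. A small sketch -- quoting via ObjectName.quote() is one standard remedy; the actual Kafka fix (pull request 4173) sanitizes the tag value instead:

    import javax.management.MalformedObjectNameException;
    import javax.management.ObjectName;

    public class ObjectNameSketch {
        public static void main(String[] args) throws Exception {
            String dir = "D:\\tmp\\kafka-logs";
            try {
                // Fails: ':' is not allowed in an unquoted property value.
                new ObjectName("kafka.log:type=LogManager,logDirectory=" + dir);
            } catch (MalformedObjectNameException e) {
                System.out.println("unquoted value rejected: " + e.getMessage());
            }
            // ObjectName.quote() escapes the value so the name is accepted:
            ObjectName ok = new ObjectName(
                    "kafka.log:type=LogManager,logDirectory=" + ObjectName.quote(dir));
            System.out.println("quoted value accepted: " + ok);
        }
    }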


Re: [DISCUSS] KIP-220: Add AdminClient into Kafka Streams' ClientSupplier

2017-11-06 Thread Matthias J. Sax
Thanks for the update and clarification.

Sounds good to me :)


-Matthias



On 11/6/17 12:16 AM, Guozhang Wang wrote:
> Thanks Matthias,
> 
> 1) Updated the KIP page to include KAFKA-6126.
> 2) For passing configs, I agree, will make a pass over the existing configs
> passed to StreamsKafkaClient and update the wiki page accordingly, to
> capture all changes that would happen for the replacement in this single
> KIP.
> 3) For internal topic purging, I'm not sure if we need to include this as a
> public change since internal topics are meant to be abstracted away from the
> Streams users; they should not leverage such internal topics elsewhere
> themselves. The only thing I can think of is that for Kafka operators this
> would mean such internal topics would be largely reduced in their footprint,
> but that does not need to be in the KIP either.
> 
> 
> Guozhang
> 
> 
> On Sat, Nov 4, 2017 at 9:00 AM, Matthias J. Sax 
> wrote:
> 
>> I like this KIP. Can you also link to
>> https://issues.apache.org/jira/browse/KAFKA-6126 in the KIP?
>>
>> What I am wondering though: if we start to partially (ie, step by step)
>> replace the existing StreamsKafkaClient with Java AdminClient, don't we
>> need more KIPs? For example, if we use purge-api for internal topics, it
>> seems like a change that requires a KIP. Similar for passing configs --
>> the new client might have different configs than the old client? Can we
>> double check this?
>>
>> Thus, it might make sense to replace the old client with the new one in
>> one shot.
>>
>>
>> -Matthias
>>
>> On 11/4/17 4:01 AM, Ted Yu wrote:
>>> Looks good overall.
>>>
>>> bq. the creation within StreamsPartitionAssignor
>>>
>>> Typo above: should be StreamPartitionAssignor
>>>
>>> On Fri, Nov 3, 2017 at 4:49 PM, Guozhang Wang 
>> wrote:
>>>
 Hello folks,

 I have filed a new KIP on adding AdminClient into Streams for internal
 topic management.

 Looking for feedback on

 *https://cwiki.apache.org/confluence/display/KAFKA/KIP-
 220%3A+Add+AdminClient+into+Kafka+Streams%27+ClientSupplier
 *

 --
 -- Guozhang

>>>
>>
>>
> 
> 



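
For the purging point, a sketch of what internal-topic purging could look like with the Java AdminClient's deleteRecords API (added around this time via KIP-204; the topic name, partition, and offset below are illustrative):

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.RecordsToDelete;
    import org.apache.kafka.common.TopicPartition;

    public class PurgeSketch {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Purge consumed data from a repartition topic instead of relying
                // on retention; partition and offset values are illustrative.
                Map<TopicPartition, RecordsToDelete> purge = Collections.singletonMap(
                        new TopicPartition("my-app-KSTREAM-AGG-0000000001-repartition", 0),
                        RecordsToDelete.beforeOffset(12345L));
                admin.deleteRecords(purge).all().get();
            }
        }
    }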


Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Matthias J. Sax
That's a good point. In the current design, we strictly distinguish both.
For example, the reset tool deletes internal topics (starting with the
`<application.id>-` prefix and ending with either `-repartition` or
`-changelog`).

Thus, from my point of view, it would make sense to keep the current
distinction.

-Matthias

On 11/6/17 4:45 PM, Thomas Becker wrote:
> I think this sounds good as well. It's worth clarifying whether topics that 
> are named by the user but created by streams are considered "internal" topics 
> also.
> 
> On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:
> 
> My idea was to relax the requirement for through() that a topic must be
> created manually before startup.
> 
> Thus, if no through() call is made, an (internal) topic is created the
> same way we do it currently.
> 
> If one uses `through(String topicName)` we keep the current behavior and
> require users to create the topic manually.
> 
> The reasoning is as follows: if a user creates a topic manually, they can
> just use it for repartitioning. As the topic is already there, there
> is no need to specify any topic configs.
> 
> We add a new `through()` overload (details TBD) that allows specifying
> topic configs, and Streams creates the topic with those configs.
> 
> Reasoning: users don't want to manage the topic manually; thus, it's still
> an internal topic and Streams creates the topic name automatically as for
> all other internal topics. However, users get some more control over
> topic parameters like the number of partitions (we should discuss what
> other configs would be useful).
> 
> 
> Does this make sense?
> 
> 
> -Matthias
> 
> 
> On 11/5/17 1:21 AM, Jan Filipiak wrote:
> 
> 
> Hi.
> 
> 
> I'm not 100% up to date on what the 1.0 DSL looks like ATM.
> I would just argue that repartitioning should be its own API call, like
> through() or something.
> One can already use through() or to() to get this. I would argue one should
> look there instead of at overloads.
> 
> Best Jan
> 
> On 04.11.2017 16:01, Jeyhun Karimov wrote:
> 
> 
> Dear community,
> 
> I would like to initiate discussion on KIP-221 [1] based on issue [2].
> Please feel free to comment.
> 
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams
> 
> [2] https://issues.apache.org/jira/browse/KAFKA-6037
> 
> 
> 
> Cheers,
> Jeyhun
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 





Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-06 Thread Thomas Becker
I think this sounds good as well. It's worth clarifying whether topics that are 
named by the user but created by streams are considered "internal" topics also.

On Sun, 2017-11-05 at 23:02 +0100, Matthias J. Sax wrote:

My idea was to relax the requirement for through() that a topic must be
created manually before startup.

Thus, if no through() call is made, an (internal) topic is created the
same way we do it currently.

If one uses `through(String topicName)` we keep the current behavior and
require users to create the topic manually.

The reasoning is as follows: if a user creates a topic manually, they can
just use it for repartitioning. As the topic is already there, there
is no need to specify any topic configs.

We add a new `through()` overload (details TBD) that allows specifying
topic configs, and Streams creates the topic with those configs.

Reasoning: users don't want to manage the topic manually; thus, it's still
an internal topic and Streams creates the topic name automatically as for
all other internal topics. However, users get some more control over
topic parameters like the number of partitions (we should discuss what
other configs would be useful).


Does this make sense?


-Matthias


On 11/5/17 1:21 AM, Jan Filipiak wrote:


Hi.


I'm not 100% up to date on what the 1.0 DSL looks like ATM.
I would just argue that repartitioning should be its own API call, like
through() or something.
One can already use through() or to() to get this. I would argue one should
look there instead of at overloads.

Best Jan

On 04.11.2017 16:01, Jeyhun Karimov wrote:


Dear community,

I would like to initiate discussion on KIP-221 [1] based on issue [2].
Please feel free to comment.

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-221%3A+Repartition+Topic+Hints+in+Streams

[2] https://issues.apache.org/jira/browse/KAFKA-6037



Cheers,
Jeyhun













Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-06 Thread Rajini Sivaram
Hi Becket,

2. Perhaps it is fine, though it may be better to limit throttle time to
*connections.max.idle.ms*, which is
typically a rather large number.

3. The maximum CLOSE_WAIT time is currently bounded by
*connections.max.idle.ms*. It is a large value, but we will close client
connections after that even if the connection has been muted. The KIP
proposes to disable idle check for throttled clients and that could be an
issue.

4. I agree that it is hard to come up with a limit for throttle time that
works for all cases. And this is particularly hard since we need a set of
timeouts, sizes and quota limits which all make sense together. I think
Jun's suggestion of basing this limit on the total metric window size makes
sense because we currently don't have any way of enforcing quotas beyond
that (since you can always create a new connection to bypass quotas for
which we dont have any previous metrics).


4a) For request quotas, we already limit throttle time to quota window size.


4b) For fetch requests, limiting throttle time to total metric window size
and returning a smaller amount of data in each response could work well. This
reduces spikes in traffic and would allow clients to heartbeat, rebalance
etc. The minimum achievable quota would then be based on maximum message
size rather than maximum request size. Perhaps that would be sufficient?


4c) For produce requests, I agree that just a bound on throttle time is not
sufficient.


   - We need a solution to improve flow control for well-behaved clients
   which currently rely entirely on broker's throttling. The KIP addresses
   this using co-operative clients that sleep for an unbounded throttle time.
   I feel this is not ideal since the result is traffic with a lot of spikes.
   Feedback from brokers to enable flow control in the client is a good idea,
   but clients with excessive throttle times should really have been
   configured with smaller batch sizes.
   - We need a solution to enforce smaller quotas to protect the broker
   from misbehaving clients. The KIP addresses this by muting channels for an
   unbounded time. This introduces problems of channels in CLOSE_WAIT. And
   doesn't really solve all issues with misbehaving clients since new
   connections can be created to bypass quotas.

I feel that it is a bit difficult to get this right without addressing the
issue with new connections that bypass quotas (issue in the current
implementation rather than the KIP).

Regards,

Rajini


On Sat, Nov 4, 2017 at 3:11 AM, Becket Qin  wrote:

> Thanks Rajini.
>
> 1. Good point. We do need to bump up the protocol version so that the new
> clients do not wait for another throttle time when they are talking to old
> brokers. I'll update the KIP.
>
> 2. That is true. But the client was not supposed to send requests to the
> broker during that period anyways. So detecting the broker failure later
> seems fine?
>
> 3. Wouldn't the CLOSE_WAIT handler number be the same as the current state?
> Currently the broker will still mute the socket until it sends the response
> back. If the clients disconnect while they are being throttled, the closed
> socket will not be detected until the throttle time has passed.
>
> Jun also suggested to bound the time by metric.sample.window.ms in the
> ticket. I am not sure about the bound on throttle time. It seems a little
> difficult to come up with a good bound. If the bound is too large, it does
> not really help solve the various timeout issue we may face. If the bound
> is too low, the quota is essentially not honored. We may potentially treat
> different requests differently, but that seems too complicated and error
> prone.
>
> IMO, the key improvement we want to make is to tell the clients how long
> they will be throttled, so the clients know what happened and can act
> accordingly instead of waiting naively. Muting the socket on the broker
> side is just in case of non-cooperating clients. For the existing clients,
> it seems this does not have much impact compared with what we have now.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Fri, Nov 3, 2017 at 3:09 PM, Rajini Sivaram 
> wrote:
>
> > Hi Becket,
> >
> > Thank you for the KIP. A few comments:
> >
> > 1. KIP says: "*No public interface changes are needed. We only propose
> > behavior change on the broker side.*"
> >
> > But from the proposed changes, it sounds like clients will be updated to
> > wait for the throttle time before sending the next request, and also not
> > handle idle disconnections during that time. Doesn't that mean that clients
> > need to know that the broker has sent the response before throttling,
> > requiring a protocol/version change?
> >
> >
> > 2. At the moment, broker failures are detected by clients (and vice versa)
> > within connections.max.idle.ms. By removing this check for an unlimited
> > throttle time, 

Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Ted Yu
Please fill out the Discussion thread and JIRA fields.

Thanks

On Mon, Nov 6, 2017 at 2:02 AM, Tom Bentley  wrote:

> Hi Jorge,
>
> Thanks for the KIP. A few initial comments:
>
> 1. The AdminClient doesn't have any API like `listConsumerGroups()`
> currently, so in general how does a client know the group ids it is
> interested in?
> 2. Could you fill in the API of DescribeConsumerGroupResult, just so
> everyone knows exactly what is being proposed?
> 3. Can you describe the ConsumerGroupDescription class?
> 4. Probably worth mentioning that this will use
> DescribeGroupsRequest/Response, and also enumerating the error codes that
> can be returned (or, equivalently, the exceptions thrown from the
> futures obtained from the DescribeConsumerGroupResult).
>
> Cheers,
>
> Tom
>
> On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion on KIP-222 [1] based on issue [2].
> >
> > Looking forward to feedback.
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686265
> > [2] https://issues.apache.org/jira/browse/KAFKA-6058
> >
> > Cheers,
> > Jorge.
> >
>


Random partitioner for kafka-confluent-dotnet

2017-11-06 Thread Dipan Shah
Hello,

We are using confluent-kafka-dotnet 0.9.5. The topic has 50 partitions, but
when I produce messages with a NULL key, all of them go into the same
partition.

I looked at Apache's documentation for Kafka and it seems that 
org.apache.kafka.clients.producer.Partitioner is used to set the Random or 
Default partitioner. But I am not able to find anything related to that in 
confluent-kafka-dotnet.

Can you please help me with how I could set up random or round-robin
partitioning here?

Thanks,

Dipan Shah
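
For reference, org.apache.kafka.clients.producer.Partitioner is the Java
client's pluggable partitioner interface; confluent-kafka-dotnet wraps
librdkafka, which partitions natively, so that interface is not available
there (librdkafka's topic configuration is the place to look for its
partitioner options). A minimal round-robin sketch against the Java
interface, registered via the partitioner.class producer property, looks
like this:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.atomic.AtomicInteger;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import org.apache.kafka.common.PartitionInfo;

    // Round-robin partitioner for the Java client. Assumes metadata for the
    // topic is available (partitionsForTopic returns a non-empty list).
    public class RoundRobinPartitioner implements Partitioner {
        private final AtomicInteger counter = new AtomicInteger(0);

        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
            int next = counter.getAndIncrement() & Integer.MAX_VALUE; // stay non-negative on overflow
            return partitions.get(next % partitions.size()).partition();
        }

        @Override
        public void close() {}
    }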


[jira] [Created] (KAFKA-6176) numDroppedMessages metric should not be incremented when no data are lost

2017-11-06 Thread Daniel Wojda (JIRA)
Daniel Wojda created KAFKA-6176:
---

 Summary: numDroppedMessages metric should not be incremented when 
no data are lost
 Key: KAFKA-6176
 URL: https://issues.apache.org/jira/browse/KAFKA-6176
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Affects Versions: 1.0.0
Reporter: Daniel Wojda
Priority: Minor


Mirror Maker can be configured to not lose data when a produce fails.

However, when Mirror Maker is configured correctly and no messages are lost,
the *numDroppedMessages* metric is still incremented on producer failure. This
is misleading for people who monitor Mirror Maker.

Could we increment that metric only when messages are really dropped? Or at
least change the name to "numProducerFailures"?
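
To illustrate the distinction the ticket draws, a hypothetical sketch of a
MirrorMaker-style producer callback (the names and structure are illustrative,
not the actual MirrorMaker code):

    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.kafka.clients.producer.Callback;
    import org.apache.kafka.clients.producer.RecordMetadata;

    // Hypothetical sketch: always count a producer failure, but count a
    // dropped message only when MirrorMaker is configured to continue past
    // the failure (i.e. abort.on.send.failure=false), since only then is
    // the message actually lost.
    class MirrorMakerSendCallback implements Callback {
        private final boolean abortOnSendFailure;
        private final AtomicLong numProducerFailures = new AtomicLong();
        private final AtomicLong numDroppedMessages = new AtomicLong();

        MirrorMakerSendCallback(boolean abortOnSendFailure) {
            this.abortOnSendFailure = abortOnSendFailure;
        }

        @Override
        public void onCompletion(RecordMetadata metadata, Exception exception) {
            if (exception != null) {
                numProducerFailures.incrementAndGet();
                if (!abortOnSendFailure) {
                    numDroppedMessages.incrementAndGet();
                }
            }
        }
    }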






[GitHub] kafka pull request #4181: KAFKA-6164: Shutdown quota managers if other compo...

2017-11-06 Thread rajinisivaram
GitHub user rajinisivaram opened a pull request:

https://github.com/apache/kafka/pull/4181

KAFKA-6164: Shutdown quota managers if other components fail to start



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rajinisivaram/kafka KAFKA-6164

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4181.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4181


commit 9c11aa563fe320c60aa3d75b34d5d368bf0bad0b
Author: Rajini Sivaram 
Date:   2017-11-06T10:49:21Z

KAFKA-6164: Shutdown quota managers if other components fail to start




---


Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2017-11-06 Thread Jakub Scholz
Hi Ismael,

As described in the Rejected alternatives section of the KIP: while the string
representation is the same on all platforms, the binary representation
differs. It would be quite easy to add binary encoding and decoding
support to the Java client, but it might make it more complicated to add
such support in clients written in other languages.

Regards
Jakub
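
To make the size tradeoff concrete, here is a minimal sketch of both encodings
in Java (an illustration, not the proposed serializer code):

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    // Sketch of the two wire formats discussed: 36-byte string vs 16-byte binary.
    final class UuidCodecs {
        // String form: identical across platforms, 36 bytes on the wire.
        static byte[] toStringBytes(UUID uuid) {
            return uuid.toString().getBytes(StandardCharsets.UTF_8);
        }

        static UUID fromStringBytes(byte[] bytes) {
            return UUID.fromString(new String(bytes, StandardCharsets.UTF_8));
        }

        // Binary form: only 16 bytes, but byte-layout conventions differ
        // between platforms (e.g. .NET's Guid uses a mixed-endian layout),
        // which is the portability concern behind the KIP's choice.
        static byte[] toBinary(UUID uuid) {
            return ByteBuffer.allocate(16)
                    .putLong(uuid.getMostSignificantBits())
                    .putLong(uuid.getLeastSignificantBits())
                    .array();
        }

        static UUID fromBinary(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            return new UUID(buf.getLong(), buf.getLong());
        }
    }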

On Mon, Nov 6, 2017 at 11:55 AM, Ismael Juma  wrote:

> Thanks for the KIP. It would be good to elaborate a little more on why the
> binary representation is not suitable. The difference between 16 and 36
> bytes is significant.
>
> Ismael
>
> On Mon, Nov 6, 2017 at 9:53 AM, Jakub Scholz  wrote:
>
> > Hi all,
> >
> > Just a friendly reminder that this is still up for vote. If you think this
> > is a good feature, please give it your vote.
> >
> > Regards
> > Jakub
> >
> > On Tue, Oct 3, 2017 at 11:24 PM, Jakub Scholz  wrote:
> >
> > > Hi,
> > >
> > > Since there were no further discussion points, I would like to start the
> > > voting for KIP-206.
> > >
> > > For more details about the KIP go to
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization
> > >
> > > Thanks & Regards
> > > Jakub
> > >
> >
>


Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2017-11-06 Thread Jan Filipiak

non-binding of course

On 06.11.2017 12:42, Jan Filipiak wrote:

I'd say this is trivial enough for users to implement on their own?

-1 I guess

On 06.11.2017 10:53, Jakub Scholz wrote:

Hi all,

Just a friendly reminder that this is still up for vote. If you think this
is a good feature, please give it your vote.

Regards
Jakub

On Tue, Oct 3, 2017 at 11:24 PM, Jakub Scholz  wrote:


Hi,

Since there were no further discussion points, I would like to start the
voting for KIP-206.

For more details about the KIP go to
https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization

Thanks & Regards
Jakub







Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2017-11-06 Thread Jan Filipiak

I'd say this is trivial enough for users to implement on their own?

-1 I guess
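
For scale, the do-it-yourself version is indeed short. A minimal sketch of a
string-based UUID serializer against the standard Serializer interface:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import java.util.UUID;
    import org.apache.kafka.common.serialization.Serializer;

    // Minimal string-based UUID serializer a user could write today.
    public class UUIDSerializer implements Serializer<UUID> {
        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {}

        @Override
        public byte[] serialize(String topic, UUID data) {
            return data == null ? null : data.toString().getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public void close() {}
    }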

On 06.11.2017 10:53, Jakub Scholz wrote:

Hi all,

Just a friendly reminder that this is still up for vote. If you think this
is a good feature, please give it your vote.

Regards
Jakub

On Tue, Oct 3, 2017 at 11:24 PM, Jakub Scholz  wrote:


Hi,

Since there were no further discussion points, I would like to start the
voting for KIP-206.

For more details about the KIP go to
https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization

Thanks & Regards
Jakub





Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2017-11-06 Thread Ismael Juma
Thanks for the KIP. It would be good to elaborate a little more on why the
binary representation is not suitable. The difference between 16 and 36
bytes is significant.

Ismael

On Mon, Nov 6, 2017 at 9:53 AM, Jakub Scholz  wrote:

> Hi all,
>
> Just a friendly reminder that this is still up for vote. If you think this
> is a good feature, please give it your vote.
>
> Regards
> Jakub
>
> On Tue, Oct 3, 2017 at 11:24 PM, Jakub Scholz  wrote:
>
> > Hi,
> >
> > Since there were no further discussion points, I would like to start the
> > voting for KIP-206.
> >
> > For more details about the KIP go to
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization
> >
> > Thanks & Regards
> > Jakub
> >
>


[DISCUSS] 0.11.0.2 bug fix release

2017-11-06 Thread Rajini Sivaram
Hi all,


We have fixed some critical issues since 0.11.0.1, so it is time
for a 0.11.0.2 release.


Since the 0.11.0.1 release we've fixed 14 JIRAs that are targeted for
0.11.0.2:

https://issues.apache.org/jira/browse/KAFKA-6134?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.11.0.2



We have 11 outstanding issues that are targeted for 0.11.0.2:

https://issues.apache.org/jira/browse/KAFKA-6007?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20fixVersion%20%3D%200.11.0.2


Can the owners of these issues please resolve them soon or move them to a
future release?


I will aim to create the first RC for 0.11.0.2 this Friday.

Thank you!

Regards,

Rajini


Re: [DISCUSS] KIP-222 - Add "describe consumer group" to KafkaAdminClient

2017-11-06 Thread Tom Bentley
Hi Jorge,

Thanks for the KIP. A few initial comments:

1. The AdminClient doesn't have any API like `listConsumerGroups()`
currently, so in general how does a client know the group ids it is
interested in?
2. Could you fill in the API of DescribeConsumerGroupResult, just so
everyone knows exactly what is being proposed? (One possible shape is
sketched after this list.)
3. Can you describe the ConsumerGroupDescription class?
4. Probably worth mentioning that this will use
DescribeGroupsRequest/Response, and also enumerating the error codes that
can be returned (or, equivalently, the exceptions thrown from the
futures obtained from the DescribeConsumerGroupResult).
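
For point 2, one possible shape, mirroring the convention of the existing
DescribeTopicsResult (a hypothetical sketch until the KIP pins the API down):

    import java.util.Map;
    import org.apache.kafka.common.KafkaFuture;

    // Hypothetical sketches only, following the per-entity-futures pattern
    // used by the other *Result classes in the AdminClient.
    class ConsumerGroupDescription {
        // Fields to be defined by the KIP (see point 3).
    }

    class DescribeConsumerGroupResult {
        private final Map<String, KafkaFuture<ConsumerGroupDescription>> futures;

        DescribeConsumerGroupResult(Map<String, KafkaFuture<ConsumerGroupDescription>> futures) {
            this.futures = futures;
        }

        // One future per requested group id.
        Map<String, KafkaFuture<ConsumerGroupDescription>> values() {
            return futures;
        }
    }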

Cheers,

Tom

On 6 November 2017 at 08:19, Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> I would like to start a discussion on KIP-222 [1] based on issue [2].
>
> Looking forward to feedback.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686265
> [2] https://issues.apache.org/jira/browse/KAFKA-6058
>
> Cheers,
> Jorge.
>


Re: [VOTE] KIP-206: Add support for UUID serialization and deserialization

2017-11-06 Thread Jakub Scholz
Hi all,

Just a friendly reminder that this is still up for vote. If you think this
is a good feature, please give it your vote.

Regards
Jakub

On Tue, Oct 3, 2017 at 11:24 PM, Jakub Scholz  wrote:

> Hi,
>
> Since there were no further discussion points, I would like to start the
> voting for KIP-206.
>
> For more details about the KIP go to
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-206%3A+Add+support+for+UUID+serialization+and+deserialization
>
> Thanks & Regards
> Jakub
>

