Re: [DISCUSS] KIP-225 - Use tags for consumer “records.lag” metrics

2017-11-16 Thread James Cheng
Ah, that's a great point. KIP-153 didn't *rename* the metric but changed its 
meaning, yet we didn't seem to discuss compatibility much when we made that 
change.

If the Kafka devs can comment on the backward compatibility of metrics 
and how we treat it, that would be helpful.

-James
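
For reference, KIP-225 moves the per-partition lag from attribute names like "{topic}-{partition}.records-lag" to a single metric distinguished by topic/partition tags. A minimal sketch of tag-based registration with the Kafka Metrics API follows; the group, description, and tag values are illustrative, not the final KIP-225 names.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.MetricName;
import org.apache.kafka.common.metrics.Metrics;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.common.metrics.stats.Value;

public class TaggedLagMetricSketch {
    public static void main(String[] args) {
        Metrics metrics = new Metrics();

        // The partition is identified via tags instead of being encoded
        // in the attribute name itself.
        Map<String, String> tags = new HashMap<>();
        tags.put("topic", "orders");
        tags.put("partition", "0");

        MetricName lag = metrics.metricName(
                "records-lag", "consumer-fetch-manager-metrics",
                "The latest lag of the partition", tags);

        Sensor sensor = metrics.sensor("orders-0.records-lag");
        sensor.add(lag, new Value()); // Value exposes the most recently recorded value
        sensor.record(42.0);
    }
}
```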

> On Nov 16, 2017, at 2:06 AM, charly molter  wrote:
> 
> Yes James you are right.
> I wasn't sure what to do about it and followed what happened with BytesOut
> in KIP-153 which completely changed meaning without any deprecation window.
> I'm happy to adapt my KIP if the community thinks we should duplicate the
> metric for a while.
> 
> Thanks!
> 
> On Thu, Nov 16, 2017 at 8:13 AM, James Cheng  wrote:
> 
>> This KIP will break backwards compatibility for anyone who is using the
>> existing attribute names.
>> 
>> Kafka devs, I believe that metrics are a supported interface, and so this
>> would be a breaking change. In order to do this, we would need a
>> deprecation timeframe for the old metric, and a transition plan to the new
>> name. Is that right? I'm not sure how we deprecate metrics...
>> 
>> During the deprecation timeframe, we could duplicate the metric to the new
>> name.
>> 
>> -James
>> 
>> On Nov 13, 2017, at 6:09 AM, charly molter 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> There doesn't seem to be much opposition to this KIP, I'll leave a couple
>>> more days before starting the vote.
>>> 
>>> Thanks!
>>> 
>>> On Thu, Nov 9, 2017 at 1:59 PM, charly molter 
>>> wrote:
>>> 
 Hi,
 
 I'd like to start the discussion on KIP-225.
 
 This KIP tries to correct the way the consumer lag metrics are reported
>> to
 use built in tags from MetricName.
 
 Here's the link:
 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686649
 
 Thanks!
 --
 Charly Molter
 
>>> 
>>> 
>>> 
>>> --
>>> Charly Molter
>> 
>> 
> 
> 
> -- 
> Charly Molter



Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-16 Thread Guozhang Wang
Matthias,

For this idea, are you proposing that for any many-to-one mapping
operations (for now only Join operators), we will strip off the record
context in the resulting records and claim "we cannot infer its traced
context anymore"?


Guozhang


On Thu, Nov 16, 2017 at 1:03 PM, Matthias J. Sax 
wrote:

> Any thoughts about my latest proposal?
>
> -Matthias
>
> On 11/10/17 10:02 PM, Jan Filipiak wrote:
> > Hi,
> >
> > I think this is the better way. Naming is always tricky; Source is kinda
> > taken.
> > I had TopicBackedK[Source|Table] in mind,
> > but for the user it's way better already IMHO.
> >
> > Thank you for reconsideration
> >
> > Best Jan
> >
> >
> > On 10.11.2017 22:48, Matthias J. Sax wrote:
> >> I was thinking about the source stream/table idea once more and it seems
> >> it would not be too hard to implement:
> >>
> >> We add two new classes
> >>
> >>    SourceKStream<K, V> extends KStream<K, V>
> >>
> >> and
> >>
> >>    SourceKTable<K, V> extends KTable<K, V>
> >>
> >> and return both from StreamsBuilder#stream and StreamsBuilder#table
> >>
> >> As both are sub-classes, this change is backward compatible. We change
> >> the return type for any single-record transform to these new types, too,
> >> and use KStream/KTable as return type for any multi-record operation.
> >>
> >> The new RecordContext API is added to both new classes. For old classes,
> >> we only implement KIP-149 to get access to the key.
> >>
> >>
> >> WDYT?
> >>
> >>
> >> -Matthias
> >>
> >> On 11/9/17 9:13 PM, Jan Filipiak wrote:
> >>> Okay,
> >>>
> >>> looks like it would _at least work_ for Cached KTableSources.
> >>> But we make it harder for the user to avoid mistakes by putting
> >>> features into places where they don't make sense and don't
> >>> help anyone.
> >>>
> >>> I once again think that my suggestion is easier to implement and
> >>> more correct. I will use this email to express my disagreement with the
> >>> proposed KIP (-1, non-binding of course) and state that I am open to any
> >>> questions regarding this. I will also do the usual thing and point out
> >>> that the friends over at Hive got it correct as well.
> >>> One cannot use their
> >>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
> >>>
> >>> in any place where it's not read from the Sources.
> >>>
> >>> With KSQl in mind it makes me sad how this is evolving here.
> >>>
> >>> Best Jan
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On 10.11.2017 01:06, Guozhang Wang wrote:
>  Hello Jan,
> 
>  Regarding your question about caching: today we keep the record
> context
>  with the cached entry already so when we flush the cache which may
>  generate
>  new records forwarding we will set the record context appropriately;
>  and
>  then after the flush is completed we will reset the context to the
>  record
>  before the flush happens. But I think when Jeyhun did the PR it is a
>  good
>  time to double check on such stages to make sure we are not
>  introducing any
>  regressions.
> 
> 
>  Guozhang
> 
> 
>  On Mon, Nov 6, 2017 at 8:54 PM, Jan Filipiak <
> jan.filip...@trivago.com>
>  wrote:
> 
> I agree completely.
> >
> > Exposing this information in a place where it has no _natural_
> > belonging
> > might really be a bad blocker in the long run.
> >
> > Concerning your first point. I would argue it's not too hard to have a
> > user
> > keep track of these. If we still don't want the user
> > to keep track of these I would argue that all > projection only <
> > transformations on a Source-backed KTable/KStream
> > could also return a Ktable/KStream instance of the type we return
> > from the
> > topology builder.
> > Only after any operation that exceeds projection or filter one would
> > return a KTable not granting access to this any longer.
> >
> > Even then its difficult already: I never ran a topology with caching
> > but I
> > am not even 100% sure what the record Context means behind
> > a materialized KTable with Caching? Topic and Partition are probably
> > with
> > some reasoning but offset is probably only the offset causing the
> > flush?
> > So one might as well think to drop offsets from this RecordContext.
> >
> > Best Jan
> >
> >
> >
> >
> >
> >
> >
> > On 07.11.2017 03:18, Guozhang Wang wrote:
> >
> >> Regarding the API design (the proposed set of overloads v.s. one
> >> overload
> >> on #map to enrich the record), I think what we have represents a
> good
> >> trade-off between API succinctness and user convenience: on one
> >> hand we
> >> definitely want to keep as fewer overloaded functions as possible.
> >> But on
> >> the other hand if we only do that in, say, the #map() function then
> >> this
> >> enrichment could be an 

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-16 Thread Guozhang Wang
Thanks for the explanation, Jan. Off the top of my head I'm leaning towards the
"more intrusive" approach to resolve the race condition issue we discussed
above. Matthias has some arguments for this approach already, so I would
not re-iterate them here. To me, the "ValueMapper
joinPrefixFaker" is actually leaking the same amount of internal
implementation detail as the more intrusive approach, but in a
less clear way. So I'd rather just clarify to users than trying to abstract
in an awkward way.

Also, I'm not clear what you mean by "CombinedKey would require an
additional mapping to what the less intrusive method has". If you meant
that users are forced to provide a new serde for this combo key, could
that be avoided by having the library automatically generate a serde for it
until the user changes this key later in the topology (e.g. via a map()
function), at which point they can "flatten" this combo key into a flat key?

*@Trevor: *for your case for concatenating multiple joins, I think a better
way is to call `oneToManyJoin().map().oneToManyJoin().map()...` than
specifying a sequence of joinPrefixFakers as they will also be chained up
together (remember we have to keep this object along with the rest of the
topology), which will make serdes even harder?

Similar to Matthias's question, the "XXX" markers are a bit confusing to me.

Guozhang
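
To make the two approaches concrete, here is a rough sketch of the "intrusive" combined key and of the serde invariance the less intrusive approach relies on (a hypothetical class, not the actual KIP-213 code):

```java
// Intrusive approach: the RocksDB key type is CombinedKey<A, B>. A prefix scan
// takes just the foreign key A, while a range scan takes full CombinedKey<A, B>
// bounds -- two different key shapes for the two query types.
public class CombinedKey<A, B> {
    final A foreignKey;  // scanned by prefix to find all matching right-hand records
    final B primaryKey;  // disambiguates the individual right-hand record

    CombinedKey(A foreignKey, B primaryKey) {
        this.foreignKey = foreignKey;
        this.primaryKey = primaryKey;
    }
}

// Less intrusive approach: both queries use the user's own key type and serde,
// relying on the invariance that serialize(a) is a byte prefix of
// serialize(combine(a, b)). This holds for protobuf-style encodings but can
// break for JSON, where the serialization of a value is not a prefix of the
// serialization of a structure containing it.
```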


On Thu, Nov 16, 2017 at 2:18 PM, Matthias J. Sax 
wrote:

> Hi,
>
> I am just catching up on this discussion and did re-read the KIP and
> discussion thread.
>
> In contrast to you, I prefer the second approach with CombinedKey as
> return type for the following reasons:
>
>  1) the oneToManyJoin() method has fewer parameters
>  2) those parameters are easy to understand
>  3) we hide implementation details (joinPrefixFaker, leftKeyExtractor,
> and the return type KO leaks internal implementation details from my
> point of view)
>  4) user can get their own KO type by extending CombinedKey interface
> (this would also address the nesting issue Trevor pointed out)
>
> What's unclear to me is why you care about JSON serdes. What is the
> problem with regard to the prefix? It seems I am missing something here.
>
> I also don't understand the argument about "the user can stick with his
> default serde or his standard way of serializing". If we have
> `CombinedKey` as output, the user just provides the serdes for both input
> key types individually, and we can reuse both internally to do
> the rest. This seems to be a way simpler API. With the KO output type
> approach, users need to write an entirely new serde for KO in contrast.
>
> Finally, @Jan, there are still some open comments you did not address
> and the KIP wiki page needs some updates. Would be great if you could do
> this.
>
> Can you also explicitly describe the data layout of the store that is
> used to do the range scans?
>
> Additionally:
>
> -> some arrows in the algorithm diagram are missing
> -> what are those XXX in the diagram
> -> can you finish the "Step by Step" example
> -> I think the relationships between the different used types, K0, K1, KO
> should be explained explicitly (all the information is there implicitly, but
> one needs to think hard to figure it out)
>
>
> Last but not least:
>
> > But no one is really interested.
>
> Don't understand this statement...
>
>
>
> -Matthias
>
>
> On 11/16/17 9:05 AM, Jan Filipiak wrote:
> > We are running this perfectly fine. For us the smaller table changes
> > rather infrequently, say only a few times per day. The performance cost of the
> > flush is way lower than the computing power you need to bring to the
> > table to account for all the records being emitted after the one single
> > update.
> >
> > On 16.11.2017 18:02, Trevor Huey wrote:
> >> Ah, I think I see the problem now. Thanks for the explanation. That is
> >> tricky. As you said, it seems the easiest solution would just be to
> >> flush the cache. I wonder how big of a performance hit that'd be...
> >>
> >> On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak wrote:
> >>
> >> Hi Trevor,
> >>
> >> I am leaning towards the less intrusive approach myself. In fact
> >> that is how we implemented our internal API for this and how we
> >> run it in production.
> >> Getting more voices towards this solution makes me really happy.
> >> The reason it's a problem for prefix and not for range is the
> >> following. Imagine the intrusive approach. The key of the RocksDB
> >> store would be CombinedKey<A, B>, the prefix scan would take an A, and
> >> the range scan would still take a CombinedKey<A, B>. As you can
> >> see, with the intrusive approach the keys are actually different
> >> types for different queries. With the less intrusive approach we
> >> use the same type and rely on serde invariances. For us this works
> >> nicely (protobuf); it might bite some JSON users.
> >>
> >> 

[jira] [Created] (KAFKA-6224) Can not build Kafka 1.0.0 with gradle 3.2.1

2017-11-16 Thread Chao Ren (JIRA)
Chao Ren created KAFKA-6224:
---

 Summary: Can not build Kafka 1.0.0 with gradle 3.2.1
 Key: KAFKA-6224
 URL: https://issues.apache.org/jira/browse/KAFKA-6224
 Project: Kafka
  Issue Type: Bug
  Components: build
Affects Versions: 1.0.0
 Environment: Ubuntu 17.10
Reporter: Chao Ren
Priority: Trivial


Trying to play around with Kafka on a brand new machine with Ubuntu 17.10 
installed.
When building, I got the following error. I do have Gradle 3.2.1 installed, 
so there is definitely no need to upgrade; should I downgrade?

```
chaoren@chaoren:~/code/kafka-1.0.0-src$ ./gradlew jar -Pscala_version=2.11.11
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.11.11

FAILURE: Build failed with an exception.

* Where:
Build file '/home/chaoren/code/kafka-1.0.0-src/build.gradle' line: 963

* What went wrong:
A problem occurred evaluating root project 'kafka-1.0.0-src'.
> Failed to apply plugin [class 
> 'com.github.jengelman.gradle.plugins.shadow.ShadowBasePlugin']
   > This version of Shadow supports Gradle 3.0+ only. Please upgrade.

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 6.186 secs
chaoren@chaoren:~/code/kafka-1.0.0-src$ gradle -version


Gradle 3.2.1


Build time:   2012-12-21 00:00:00 UTC
Revision: none

Groovy:   2.4.8
Ant:  Apache Ant(TM) version 1.9.9 compiled on June 29 2017
JVM:  1.8.0_151 (Oracle Corporation 25.151-b12)
OS:   Linux 4.13.0-16-generic amd64
```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4228: MINOR: improve StateStore JavaDocs

2017-11-16 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/4228

MINOR: improve StateStore JavaDocs

Clarify that state directory must use `storeName`

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka minor-state-store-javadoc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4228


commit 63515f91011f9a5019065605dc84f5105cd82e46
Author: Matthias J. Sax 
Date:   2017-11-17T01:37:51Z

MINOR: improve StateStore JavaDocs




---


Build failed in Jenkins: kafka-trunk-jdk8 #2217

2017-11-16 Thread Apache Jenkins Server
See 


Changes:

[me] KAFKA-6218: Optimize condition in if statement to reduce the number of

--
[...truncated 385.85 KB...]
kafka.utils.CoreUtilsTest > testCircularIterator PASSED

kafka.utils.CoreUtilsTest > testReadBytes STARTED

kafka.utils.CoreUtilsTest > testReadBytes PASSED

kafka.utils.CoreUtilsTest > testCsvList STARTED

kafka.utils.CoreUtilsTest > testCsvList PASSED

kafka.utils.CoreUtilsTest > testReadInt STARTED

kafka.utils.CoreUtilsTest > testReadInt PASSED

kafka.utils.CoreUtilsTest > testAtomicGetOrUpdate STARTED

kafka.utils.CoreUtilsTest > testAtomicGetOrUpdate PASSED

kafka.utils.CoreUtilsTest > testUrlSafeBase64EncodeUUID STARTED

kafka.utils.CoreUtilsTest > testUrlSafeBase64EncodeUUID PASSED

kafka.utils.CoreUtilsTest > testCsvMap STARTED

kafka.utils.CoreUtilsTest > testCsvMap PASSED

kafka.utils.CoreUtilsTest > testInLock STARTED

kafka.utils.CoreUtilsTest > testInLock PASSED

kafka.utils.CoreUtilsTest > testSwallow STARTED

kafka.utils.CoreUtilsTest > testSwallow PASSED

kafka.utils.IteratorTemplateTest > testIterator STARTED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.utils.json.JsonValueTest > testJsonObjectIterator STARTED

kafka.utils.json.JsonValueTest > testJsonObjectIterator PASSED

kafka.utils.json.JsonValueTest > testDecodeLong STARTED

kafka.utils.json.JsonValueTest > testDecodeLong PASSED

kafka.utils.json.JsonValueTest > testAsJsonObject STARTED

kafka.utils.json.JsonValueTest > testAsJsonObject PASSED

kafka.utils.json.JsonValueTest > testDecodeDouble STARTED

kafka.utils.json.JsonValueTest > testDecodeDouble PASSED

kafka.utils.json.JsonValueTest > testDecodeOption STARTED

kafka.utils.json.JsonValueTest > testDecodeOption PASSED

kafka.utils.json.JsonValueTest > testDecodeString STARTED

kafka.utils.json.JsonValueTest > testDecodeString PASSED

kafka.utils.json.JsonValueTest > testJsonValueToString STARTED

kafka.utils.json.JsonValueTest > testJsonValueToString PASSED

kafka.utils.json.JsonValueTest > testAsJsonObjectOption STARTED

kafka.utils.json.JsonValueTest > testAsJsonObjectOption PASSED

kafka.utils.json.JsonValueTest > testAsJsonArrayOption STARTED

kafka.utils.json.JsonValueTest > testAsJsonArrayOption PASSED

kafka.utils.json.JsonValueTest > testAsJsonArray STARTED

kafka.utils.json.JsonValueTest > testAsJsonArray PASSED

kafka.utils.json.JsonValueTest > testJsonValueHashCode STARTED

kafka.utils.json.JsonValueTest > testJsonValueHashCode PASSED

kafka.utils.json.JsonValueTest > testDecodeInt STARTED

kafka.utils.json.JsonValueTest > testDecodeInt PASSED

kafka.utils.json.JsonValueTest > testDecodeMap STARTED

kafka.utils.json.JsonValueTest > testDecodeMap PASSED

kafka.utils.json.JsonValueTest > testDecodeSeq STARTED

kafka.utils.json.JsonValueTest > testDecodeSeq PASSED

kafka.utils.json.JsonValueTest > testJsonObjectGet STARTED

kafka.utils.json.JsonValueTest > testJsonObjectGet PASSED

kafka.utils.json.JsonValueTest > testJsonValueEquals STARTED

kafka.utils.json.JsonValueTest > testJsonValueEquals PASSED

kafka.utils.json.JsonValueTest > testJsonArrayIterator STARTED

kafka.utils.json.JsonValueTest > testJsonArrayIterator PASSED

kafka.utils.json.JsonValueTest > testJsonObjectApply STARTED

kafka.utils.json.JsonValueTest > testJsonObjectApply PASSED

kafka.utils.json.JsonValueTest > testDecodeBoolean STARTED

kafka.utils.json.JsonValueTest > testDecodeBoolean PASSED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic STARTED

kafka.producer.AsyncProducerTest > testFailedSendRetryLogic PASSED

kafka.producer.AsyncProducerTest > testQueueTimeExpired STARTED

kafka.producer.AsyncProducerTest > testQueueTimeExpired PASSED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents STARTED

kafka.producer.AsyncProducerTest > testPartitionAndCollateEvents PASSED

kafka.producer.AsyncProducerTest > testBatchSize STARTED

kafka.producer.AsyncProducerTest > testBatchSize PASSED

kafka.producer.AsyncProducerTest > testSerializeEvents STARTED

kafka.producer.AsyncProducerTest > testSerializeEvents PASSED

kafka.producer.AsyncProducerTest > testProducerQueueSize STARTED

kafka.producer.AsyncProducerTest > testProducerQueueSize PASSED

kafka.producer.AsyncProducerTest > testRandomPartitioner STARTED

kafka.producer.AsyncProducerTest > testRandomPartitioner PASSED

kafka.producer.AsyncProducerTest > testInvalidConfiguration STARTED

kafka.producer.AsyncProducerTest > testInvalidConfiguration PASSED

kafka.producer.AsyncProducerTest > testInvalidPartition STARTED

kafka.producer.AsyncProducerTest > testInvalidPartition PASSED

kafka.producer.AsyncProducerTest > testNoBroker STARTED

kafka.producer.AsyncProducerTest > testNoBroker PASSED

kafka.producer.AsyncProducerTest > testProduceAfterClosed STARTED

kafka.producer.AsyncProducerTest > testProduceAfterClosed PASSED

kafka.producer.AsyncProducerTest > 

[GitHub] kafka pull request #4227: MINOR: Log unexpected exceptions in Connect REST c...

2017-11-16 Thread ewencp
GitHub user ewencp opened a pull request:

https://github.com/apache/kafka/pull/4227

MINOR: Log unexpected exceptions in Connect REST calls that generate 500s 
at a higher log level

The ConnectExceptionMapper was originally intended to handle 
ConnectException errors for some expected cases where we just want to always 
convert them to a certain response and the ExceptionMapper was the easiest way 
to do that uniformly across the API. However, in the case that it's not an 
expected subclass, we should log the information at the error level so the user 
can track down the cause of the error.

This is only an initial improvement. We should probably also add a more 
general ExceptionMapper to handle other exceptions we may not have caught and 
converted to ConnectException.
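
As an illustration of the pattern described above, a generic JAX-RS mapper might look like this (a sketch, not the actual ConnectExceptionMapper code):

```java
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.ExceptionMapper;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Expected exception types map uniformly to fixed responses; anything else
// becomes a 500 and is logged at error level so the cause can be tracked down.
public class IllustrativeExceptionMapper implements ExceptionMapper<RuntimeException> {
    private static final Logger log = LoggerFactory.getLogger(IllustrativeExceptionMapper.class);

    @Override
    public Response toResponse(RuntimeException e) {
        if (e instanceof IllegalArgumentException) { // stand-in for an "expected" case
            return Response.status(Response.Status.BAD_REQUEST).build();
        }
        log.error("Uncaught exception in REST call", e); // unexpected: log loudly
        return Response.status(Response.Status.INTERNAL_SERVER_ERROR).build();
    }
}
```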

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ewencp/kafka better-connect-error-logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4227


commit a2d750a87efc2ac49d6f0b7954488a7deb9f47cc
Author: Ewen Cheslack-Postava 
Date:   2017-11-16T23:45:36Z

MINOR: Log unexpected exceptions in Connect REST calls that generate 500s 
at a higher log level




---


Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-16 Thread Becket Qin
Thanks Rajini,

I updated the KIP wiki to clarify the scope of the KIP. To summarize, the
current quota has a few caveats:
1. The brokers are only throttling the NEXT request even if the current
request is already violating quota. So the broker will always process at
least one request from the client.
2. The brokers are not able to know the client id until they fully read the
request out of the sockets even if that client is being throttled.
3. The brokers are not communicating to the clients promptly, so the
clients have to blindly wait and sometimes time out unnecessarily.

This KIP only tries to address 3 but not 1 and 2 because A) those two
issues are sort of orthogonal to 3 and B) the solutions to 1 and 2 are much
more complicated and worth a separate discussion.

I'll wait till tomorrow and start a voting thread if no further
concerns are raised about the KIP.

Thanks,

Jiangjie (Becket) Qin
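
For caveat 3, the cooperative client behavior the KIP assumes is roughly the following; this is a hypothetical sketch, not the actual Kafka client internals, and `Response.throttleTimeMs()` stands in for the throttle-time field carried by the new response versions.

```java
import java.util.concurrent.TimeUnit;

public class ThrottleAwareSender {
    interface Response {
        long throttleTimeMs(); // throttle time reported by the broker, in milliseconds
    }

    void onResponse(Response response) throws InterruptedException {
        long throttleMs = response.throttleTimeMs();
        if (throttleMs > 0) {
            // A well-behaved client backs off before its next request; the broker
            // additionally mutes the channel as protection against clients that don't.
            TimeUnit.MILLISECONDS.sleep(throttleMs);
        }
        sendNextRequest();
    }

    void sendNextRequest() {
        // enqueue the next produce/fetch request
    }
}
```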

On Thu, Nov 16, 2017 at 11:47 AM, Rajini Sivaram 
wrote:

> Hi Becket,
>
> The current user quota doesn't solve the problem. But I was thinking that
> if we could ensure we don't read more from the network than the quota
> allows, we may be able to fix the issue in a different way (throttling all
> connections, each for a limited time prior to reading large requests). But
> it would be more complex (and even more messy for client-id quotas), so I
> can understand why the solution you proposed in the KIP makes sense for the
> scenario that you described.
>
> Regards,
>
> Rajini
>
> On Tue, Nov 14, 2017 at 11:30 PM, Becket Qin  wrote:
>
> > Hi Rajini,
> >
> > We are using SSL so we could use user quota. But I am not sure if that
> > would solve the problem. The key issue in our case is that each broker
> can
> > only handle ~300 MB/s of incoming bytes, but the MapReduce job is trying
> to
> > push 1-2 GB/s, unless we can throttle the clients to 300 MB/s, the broker
> > cannot sustain. In order to do that, we need to be able to throttle
> > requests for more than request timeout, potentially a few minutes. It
> seems
> > neither user quota nor limited throttle time can achieve this.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > > On Tue, Nov 14, 2017 at 7:44 AM, Rajini Sivaram
> > wrote:
> >
> > > Hi Becket,
> > >
> > > For the specific scenario that you described, would it be possible to
> use
> > > user quotas rather than client-id quotas? With user quotas, perhaps we
> > can
> > > throttle more easily before reading requests as well (as you mentioned,
> > the
> > > difficulty with client-id quota is that we have to read partial
> requests
> > > and handle client-ids at network layer making that a much bigger
> change).
> > > If your clients are using SASL/SSL, I was wondering whether a solution
> > that
> > > improves user quotas and limits throttle time would work for you.
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> > >
> > >
> > > On Thu, Nov 9, 2017 at 12:59 AM, Becket Qin 
> > wrote:
> > >
> > > > Since we will bump up the wire request version, another option is for
> > > > clients that are sending old request versions the broker can just
> keep
> > > the
> > > > current behavior. For clients sending the new request versions, the
> > > broker
> > > > can respond then mute the channel as described in the KIP wiki. In
> this
> > > > case, muting the channel is mostly for protection purpose. A
> correctly
> > > > implemented client should back off for throttle time before sending
> the
> > > > next request. The downside is that the broker needs to keep both
> logic
> > > and
> > > > it seems not gaining much benefit. So personally I prefer to just
> mute
> > > the
> > > > channel. But I am open to different opinions.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Mon, Nov 6, 2017 at 7:28 PM, Becket Qin 
> > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Hmm, even if a connection is closed by the client when the channel
> is
> > > > > muted. After the channel is unmuted, it seems Selector.select()
> will
> > > > detect
> > > > > this and close the socket.
> > > > > It is true that before the channel is unmuted the socket will be
> in a
> > > > > CLOSE_WAIT state though. So having an arbitrarily long muted
> duration
> > > may
> > > > > indeed cause problem.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > > On Mon, Nov 6, 2017 at 7:22 PM, Becket Qin 
> > > wrote:
> > > > >
> > > > >> Hi Rajini,
> > > > >>
> > > > >> Thanks for the detail explanation. Please see the reply below:
> > > > >>
> > > > >> 2. Limiting the throttle time to connection.max.idle.ms on the
> > broker
> > > > >> side is probably fine. However, clients may have a different
> > > > configuration
> > > > >> of connection.max.idle.ms and still reconnect before the throttle
> > > time
> > > > >> (which is the server 

[jira] [Resolved] (KAFKA-6218) Optimize condition in if statement to reduce the number of comparisons

2017-11-16 Thread Ewen Cheslack-Postava (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ewen Cheslack-Postava resolved KAFKA-6218.
--
Resolution: Fixed
  Reviewer: Ewen Cheslack-Postava

> Optimize condition in if statement to reduce the number of comparisons
> --
>
> Key: KAFKA-6218
> URL: https://issues.apache.org/jira/browse/KAFKA-6218
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.1.0, 
> 0.10.1.1, 0.10.2.0, 0.10.2.1, 0.11.0.0, 0.11.0.1, 1.0.0
>Reporter: sachin bhalekar
>Assignee: sachin bhalekar
>Priority: Trivial
>  Labels: newbie
> Fix For: 1.1.0
>
> Attachments: kafka_optimize_if.JPG
>
>
> Optimize the condition in the *if* statement:
> *(schema.name() == null || !(schema.name().equals(LOGICAL_NAME)))*, which
> requires two comparisons in the worst case, can be replaced with
> *(!LOGICAL_NAME.equals(schema.name()))*, which requires a single comparison
> in all cases and _avoids a null pointer exception_.
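
The two forms side by side (the LOGICAL_NAME value below is illustrative; the real constant lives in the Connect code):

```java
public class IfConditionDemo {
    static final String LOGICAL_NAME = "some.logical.Name"; // illustrative value

    // Before: explicit null guard plus equals() -- two comparisons in the worst case.
    static boolean isNotLogicalBefore(String name) {
        return name == null || !(name.equals(LOGICAL_NAME));
    }

    // After: a single equals() call. LOGICAL_NAME.equals(null) is simply false,
    // so the null case is handled without a separate check.
    static boolean isNotLogicalAfter(String name) {
        return !LOGICAL_NAME.equals(name);
    }
}
```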



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4225: KAFKA-6218 : Optimize condition in if statement to...

2017-11-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4225


---


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-16 Thread Matthias J. Sax
Hi,

I am just catching up on this discussion and did re-read the KIP and
discussion thread.

In contrast to you, I prefer the second approach with CombinedKey as
return type for the following reasons:

 1) the oneToManyJoin() method has fewer parameters
 2) those parameters are easy to understand
 3) we hide implementation details (joinPrefixFaker, leftKeyExtractor,
and the return type KO leaks internal implementation details from my
point of view)
 4) user can get their own KO type by extending CombinedKey interface
(this would also address the nesting issue Trevor pointed out)

What's unclear to me is why you care about JSON serdes. What is the
problem with regard to the prefix? It seems I am missing something here.

I also don't understand the argument about "the user can stick with his
default serde or his standard way of serializing". If we have
`CombinedKey` as output, the user just provides the serdes for both input
key types individually, and we can reuse both internally to do
the rest. This seems to be a way simpler API. With the KO output type
approach, users need to write an entirely new serde for KO in contrast.

Finally, @Jan, there are still some open comments you did not address
and the KIP wiki page needs some updates. Would be great if you could do
this.

Can you also explicitly describe the data layout of the store that is
used to do the range scans?

Additionally:

-> some arrows in the algorithm diagram are missing
-> what are those XXX in the diagram
-> can you finish the "Step by Step" example
-> I think the relationships between the different used types, K0, K1, KO
should be explained explicitly (all the information is there implicitly, but
one needs to think hard to figure it out)


Last but not least:

> But no one is really interested.

Don't understand this statement...



-Matthias


On 11/16/17 9:05 AM, Jan Filipiak wrote:
> We are running this perfectly fine. For us the smaller table changes
> rather infrequently, say only a few times per day. The performance cost of the
> flush is way lower than the computing power you need to bring to the
> table to account for all the records being emitted after the one single
> update.
> 
> On 16.11.2017 18:02, Trevor Huey wrote:
>> Ah, I think I see the problem now. Thanks for the explanation. That is
>> tricky. As you said, it seems the easiest solution would just be to
>> flush the cache. I wonder how big of a performance hit that'd be...
>>
>> On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak wrote:
>>
>>     Hi Trevor,
>>
>>     I am leaning towards the less intrusive approach myself. In fact
>>     that is how we implemented our internal API for this and how we
>>     run it in production.
>>     Getting more voices towards this solution makes me really happy.
>>     The reason it's a problem for prefix and not for range is the
>>     following. Imagine the intrusive approach. The key of the RocksDB
>>     store would be CombinedKey<A, B>, the prefix scan would take an A, and
>>     the range scan would still take a CombinedKey<A, B>. As you can
>>     see, with the intrusive approach the keys are actually different
>>     types for different queries. With the less intrusive approach we
>>     use the same type and rely on serde invariances. For us this works
>>     nicely (protobuf); it might bite some JSON users.
>>
>>     Hope it makes it clear
>>
>>     Best Jan
>>
>>
>>     On 16.11.2017 16:39, Trevor Huey wrote:
>>>     1. Going over KIP-213, I am leaning toward the "less intrusive"
>>>     approach. In my use case, I am planning on performing a sequence
>>>     of several oneToMany joins. From my understanding, the more
>>>     intrusive approach would result in several nested levels of
>>>     CombinedKey's. For example, consider Tables A, B, C, D with
>>>     corresponding keys KA, KB, KC. Joining A and B would produce
>>>     CombinedKey<KA, KB>. Then joining that result on C would produce
>>>     CombinedKey<CombinedKey<KA, KB>, KC>. My "keyOtherSerde" in this
>>>     case would need to be capable of deserializing
>>>     CombinedKey<KA, KB>. This would just get worse the more tables I join. I realize
>>>     that it's easier to shoot yourself in the foot with the less
>>>     intrusive approach, but as you said, " the user can stick with
>>>     his default serde or his standard way of serializing". In the
>>>     simplest case where the keys are just strings, they can do simple
>>>     string concatenation and Serdes.String(). It also allows the user
>>>     to create and use their own version of CombinedKey if they feel
>>>     so inclined.
>>>
>>>     2. Why is there a problem for prefix, but not for range?
>>>    
>>> https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1L162
>>>
>>>
>>>
>>>     On Thu, Nov 16, 2017 at 2:57 AM Jan Filipiak
>>>     wrote:
>>>
>>>     Hi Trevor,
>>>
>>> 

Jenkins build is back to normal : kafka-trunk-jdk8 #2216

2017-11-16 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-221: Repartition Topic Hints in Streams

2017-11-16 Thread Matthias J. Sax
@Jan:

The `Produced` class was introduced in 1.0 to specify key and value
Serdes (and the partitioner) when data is written into a topic.

Old API:

KStream#to("topic", keySerde, valueSerde);

New API:

KStream#to("topic", Produced.with(keySerde, valueSerde));


This allows us to reduce the number of overloads for `to()` (and
`through()`, which follows the same pattern) -- the second parameter
covers all the different variations of optional parameters users can
specify, while we only have 2 overloads for `to()` itself.

What is still unclear to me is what you mean by this topic prefix
thing. Either a user cares about the topic name and thus must create
and manage it manually, or the user does not care, and Streams creates
it. How would this prefix idea fit in here?



@Guozhang:

My idea was to extend `Produced` with the hint we want to give for
creating internal topic and pass a optional `Produced` parameter. There
are multiple things we can do here:

1) stream.through(null, Produced...).groupBy().aggregate()
-> just allow for `null` topic name indicating that Streams should
create an internal topic

2) stream.through(Produced...).groupBy().aggregate()
-> add one overload taking a mandatory `Produced`

We use `Serialized` to piggyback the information

3) stream.groupBy(Serialized...).aggregate()
and stream.groupByKey(Serialized...).aggregate()
-> we don't need new top level overloads


There are different trade-offs for those alternatives and maybe there
are other ways to change the API. It's just to push the discussion further.


-Matthias
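
For illustration, alternative 3 could look roughly like this in 1.0-era code; `Serialized.with()` exists today, while the partition hint itself is the hypothetical new piece:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Serialized;

class RepartitionHintSketch {
    KTable<String, Long> wordCounts(KStream<String, String> stream) {
        return stream
                // grouping by the value forces an internal repartition topic
                .groupBy((key, word) -> word,
                         Serialized.with(Serdes.String(), Serdes.String()))
                // a hint such as .withNumberOfPartitions(12) on Serialized would
                // be the new, hypothetical part carried to the internal topic
                .count();
    }
}
```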

On 11/12/17 1:22 PM, Jan Filipiak wrote:
> Hi Gouzhang,
> 
> this felt like these questions are supposed to be answered by me.
> I do not understand the first one. I don't understand why the user
> shouldn't be able to specify a suffix for the topic name.
> 
>  For the third question I am not 100% sure if the Produced class
> came into existence
> at all. I remember proposing it somewhere in our redo DSL discussion that
> I dropped out of later. Finally any call that does:
> 
> 1. create the internal topic
> 2. register sink
> 3. register source
> 
> will always get the work done. If we have a Produced-like class, putting
> all the parameters
> in there makes sense (partitioner, serde, PartitionHint, internal, name
> ...).
> 
> Hope this helps?
> 
> 
> On 10.11.2017 07:54, Guozhang Wang wrote:
>> A few clarification questions on the proposal details.
>>
>> 1. API: although the repartition only happens at the final stateful
>> operations like agg / join, the repartition flag info was actually passed
>> from an earlier operator like map / groupBy. So what should the new
>> API
>> look like? For example, if we do
>>
>> stream.groupBy().through("topic-name", Produced..).aggregate
>>
>> This would add a bunch of APIs to GroupedKStream/KTable
>>
>> 2. Semantics: as Matthias mentioned, today any topic defined in a
>> "through()" call is considered a user topic, and hence users are
>> responsible for managing them, including the topic name. For this KIP's
>> purpose, though, users would not care about the topic name. I.e. as a
>> user
>> I still want it to be an internal topic so that I do not need to
>> worry
>> about it at all, but only specify num.partitions.
>>
>> 3. Details: in Produced we do not have specs for specifying the
>> num.partitions or whether we should repartition or not. So it is still not
>> clear to
>> me how we would make use of that to achieve what's in the old
>> proposal's RepartitionHint class.
>>
>>
>>
>> Guozhang
>>
>>
>> On Mon, Nov 6, 2017 at 1:21 PM, Ted Yu  wrote:
>>
>>> bq. enlarge the score of through()
>>>
>>> I guess you meant scope.
>>>
>>> On Mon, Nov 6, 2017 at 1:15 PM, Jeyhun Karimov 
>>> wrote:
>>>
 Hi,

 Sorry for the late reply. I am convinced that we should enlarge the
 score
 of through() (add more overloads) instead of introducing a separate set
>>> of
 overloads to other methods.
 I will update the KIP soon based on the discussion and inform.


 Cheers,
 Jeyhun

 On Mon, Nov 6, 2017 at 9:18 PM Jan Filipiak 
 wrote:

> Sorry for not being 100% up to date.
> Back then we had the discussion that when an operation puts a >Sink<
> into the topology, a >Produced<
> parameter is added. This produced parameter could have internal or
> external. If internal I think the name would still make
> a great suffix for the topic name
>
> Is this plan still around? Otherwise having the name as a suffix is
> probably always good; it can help the user more quickly identify hot
> topics that need more
> partitions if he has many of these internal repartitions
>
> Best Jan
>
>
> On 06.11.2017 20:13, Matthias J. Sax wrote:
>> I absolutely agree with what you say. It's not a requirement to
>>> specify a
>> topic name -- and this was the idea -- if user does 

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-16 Thread Matthias J. Sax
Any thoughts about my latest proposal?

-Matthias

On 11/10/17 10:02 PM, Jan Filipiak wrote:
> Hi,
> 
> I think this is the better way. Naming is always tricky; Source is kinda
> taken.
> I had TopicBackedK[Source|Table] in mind,
> but for the user it's way better already IMHO.
> 
> Thank you for reconsideration
> 
> Best Jan
> 
> 
> On 10.11.2017 22:48, Matthias J. Sax wrote:
>> I was thinking about the source stream/table idea once more and it seems
>> it would not be too hard to implement:
>>
>> We add two new classes
>>
>>    SourceKStream<K, V> extends KStream<K, V>
>>
>> and
>>
>>    SourceKTable<K, V> extends KTable<K, V>
>>
>> and return both from StreamsBuilder#stream and StreamsBuilder#table
>>
>> As both are sub-classes, this change is backward compatible. We change
>> the return type for any single-record transform to these new types, too,
>> and use KStream/KTable as return type for any multi-record operation.
>>
>> The new RecordContext API is added to both new classes. For old classes,
>> we only implement KIP-149 to get access to the key.
>>
>>
>> WDYT?
>>
>>
>> -Matthias
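
A minimal sketch of the proposal quoted above (hypothetical interfaces; the method shapes are assumptions, not the final KIP-159 API):

```java
// Record metadata, exposed only where it is still well-defined.
interface RecordContext {
    String topic();
    int partition();
    long offset();
}

interface KStream<K, V> {
    <V1> KStream<K, V1> mapValues(java.util.function.Function<V, V1> mapper);
}

// Returned by StreamsBuilder#stream: every record still maps 1:1 to an input
// record, so single-record transforms keep the Source type and can offer
// context-aware ("rich") variants. Multi-record operations (joins,
// aggregations) would return plain KStream/KTable, dropping context access.
interface SourceKStream<K, V> extends KStream<K, V> {
    <V1> SourceKStream<K, V1> mapValues(
            java.util.function.BiFunction<V, RecordContext, V1> richMapper);
}
```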
>>
>> On 11/9/17 9:13 PM, Jan Filipiak wrote:
>>> Okay,
>>>
>>> looks like it would _at least work_ for Cached KTableSources.
>>> But we make it harder for the user to avoid mistakes by putting
>>> features into places where they don't make sense and don't
>>> help anyone.
>>>
>>> I once again think that my suggestion is easier to implement and
>>> more correct. I will use this email to express my disagreement with the
>>> proposed KIP (-1, non-binding of course) and state that I am open to any
>>> questions regarding this. I will also do the usual thing and point out
>>> that the friends over at Hive got it correct as well.
>>> One cannot use their
>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
>>>
>>>
>>> in any place where it's not read from the Sources.
>>>
>>> With KSQl in mind it makes me sad how this is evolving here.
>>>
>>> Best Jan
>>>
>>>
>>>
>>>
>>>
>>> On 10.11.2017 01:06, Guozhang Wang wrote:
 Hello Jan,

 Regarding your question about caching: today we keep the record context
 with the cached entry already so when we flush the cache which may
 generate
 new records forwarding we will set the record context appropriately;
 and
 then after the flush is completed we will reset the context to the
 record
 before the flush happens. But I think when Jeyhun did the PR it is a
 good
 time to double check on such stages to make sure we are not
 introducing any
 regressions.


 Guozhang


 On Mon, Nov 6, 2017 at 8:54 PM, Jan Filipiak 
 wrote:

> I agree completely.
>
> Exposing this information in a place where it has no _natural_
> belonging
> might really be a bad blocker in the long run.
>
> Concerning your first point. I would argue it's not too hard to have a
> user
> keep track of these. If we still don't want the user
> to keep track of these I would argue that all > projection only <
> transformations on a Source-backed KTable/KStream
> could also return a Ktable/KStream instance of the type we return
> from the
> topology builder.
> Only after any operation that exceeds projection or filter one would
> return a KTable not granting access to this any longer.
>
> Even then its difficult already: I never ran a topology with caching
> but I
> am not even 100% sure what the record Context means behind
> a materialized KTable with Caching? Topic and Partition are probably
> with
> some reasoning but offset is probably only the offset causing the
> flush?
> So one might as well think to drop offsets from this RecordContext.
>
> Best Jan
>
>
>
>
>
>
>
> On 07.11.2017 03:18, Guozhang Wang wrote:
>
>> Regarding the API design (the proposed set of overloads v.s. one
>> overload
>> on #map to enrich the record), I think what we have represents a good
>> trade-off between API succinctness and user convenience: on one
>> hand we
>> definitely want to keep as fewer overloaded functions as possible.
>> But on
>> the other hand if we only do that in, say, the #map() function then
>> this
>> enrichment could be an overkill: think of a topology that has 7
>> operators
>> in a chain, where users want to access the record context on
>> operator #2
>> and #6 only, with the "enrichment" manner they need to do the
>> enrichment
>> on
>> operator #2 and keep it that way until #6. In addition, the
>> RecordContext
>> fields (topic, offset, etc) are really orthogonal to the key-value
>> payloads
>> themselves, so I think separating them into this object is a cleaner
>> way.
>>
>> Regarding the RecordContext inheritance, this is actually a good
>> point
>> that
>> have 

[GitHub] kafka pull request #4226: MINOR: Arrays.toList replaced with Collections.sin...

2017-11-16 Thread KoenDG
GitHub user KoenDG opened a pull request:

https://github.com/apache/kafka/pull/4226

MINOR: Arrays.asList replaced with Collections.singletonList() where 
possible.

This is something I did after my working hours; I would ask people 
reviewing this to do the same, don't take time for this during your work hours. 
It's not actual functionality; it is an internal rewrite based on suggestions 
provided by the static code analysis built into the IntelliJ IDE.

I try to keep such a PR as limited as possible, for clarity of reading.

==

In places where Arrays.asList() is given exactly 1 argument, replace it 
with Collections.singletonList().

Internally, this is a smaller object, so it uses a bit less memory at 
runtime.
An important thing to note is that you cannot add anything to such a list. 
So if it is returned through a method into a random `List<>` object, performing 
`add()` on it will throw an exception.
I checked every usage of all replaced instances; they're only ever returned 
to be read, so this should not occur.

One might say this is a micro-optimization, I would say that's true, it is.
It's up to the maintainers whether or not they want this in. Personally I 
checked all code usages and did not find a point where an exception would be 
caused because of these changes. So it should not break anything and provide a 
tiny improvement in terms of memory footprint at runtime.
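
A quick demonstration of the caveat described above:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SingletonListDemo {
    public static void main(String[] args) {
        List<String> fixedSize = Arrays.asList("a");             // fixed-size view over an array
        List<String> singleton = Collections.singletonList("a"); // immutable single-element list

        System.out.println(fixedSize.equals(singleton)); // true: same contents

        try {
            singleton.add("b"); // structural modification is not supported
        } catch (UnsupportedOperationException e) {
            System.out.println("add() throws, as described above");
        }
    }
}
```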


### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/KoenDG/kafka singletonlist

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4226.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4226


commit 772d65f2411aac1f78b12cd4a52547a48fa8f9f4
Author: KoenDG 
Date:   2017-11-16T19:42:54Z

In places where Arrays.asList() is given exactly 1 argument, replace it 
with Collections.singletonList().

Internally, this is a smaller object, so it uses a bit less memory at 
runtime.
An important thing to note is that you cannot add anything to such a list. 
So if it is returned through a method into a random List<> object, performing 
add() on it will throw an exception.
I checked every usage of all replaced instances, they're only ever returned 
to be read, so this should not occur.

One might say this is a micro-optimization, I would say that's true, it is.
It's up to the maintainers whether or not they want this in. Personally I 
checked all code usages and did not find a point where an exception would be 
caused because of these changes. So it should not break anything and provide a 
tiny improvement in terms of memory footprint at runtime.

Import changes due to Intellij squashing imports into a * import. Had to 
update my settings so it won't happen again.




---


Re: [DISCUSS] KIP-219 - Improve Quota Communication

2017-11-16 Thread Rajini Sivaram
Hi Becket,

The current user quota doesn't solve the problem. But I was thinking that
if we could ensure we don't read more from the network than the quota
allows, we may be able to fix the issue in a different way (throttling all
connections, each for a limited time prior to reading large requests). But
it would be more complex (and even more messy for client-id quotas), so I
can understand why the solution you proposed in the KIP makes sense for the
scenario that you described.

Regards,

Rajini

On Tue, Nov 14, 2017 at 11:30 PM, Becket Qin  wrote:

> Hi Rajini,
>
> We are using SSL so we could use user quota. But I am not sure if that
> would solve the problem. The key issue in our case is that each broker can
> only handle ~300 MB/s of incoming bytes, but the MapReduce job is trying to
> push 1-2 GB/s, unless we can throttle the clients to 300 MB/s, the broker
> cannot sustain. In order to do that, we need to be able to throttle
> requests for more than request timeout, potentially a few minutes. It seems
> neither user quota nor limited throttle time can achieve this.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Nov 14, 2017 at 7:44 AM, Rajini Sivaram 
> wrote:
>
> > Hi Becket,
> >
> > For the specific scenario that you described, would it be possible to use
> > user quotas rather than client-id quotas? With user quotas, perhaps we
> can
> > throttle more easily before reading requests as well (as you mentioned,
> the
> > difficulty with client-id quota is that we have to read partial requests
> > and handle client-ids at network layer making that a much bigger change).
> > If your clients are using SASL/SSL, I was wondering whether a solution
> that
> > improves user quotas and limits throttle time would work for you.
> >
> > Regards,
> >
> > Rajini
> >
> >
> >
> > On Thu, Nov 9, 2017 at 12:59 AM, Becket Qin 
> wrote:
> >
> > > Since we will bump up the wire request version, another option is for
> > > clients that are sending old request versions the broker can just keep
> > the
> > > current behavior. For clients sending the new request versions, the
> > broker
> > > can respond then mute the channel as described in the KIP wiki. In this
> > > case, muting the channel is mostly for protection purpose. A correctly
> > > implemented client should back off for throttle time before sending the
> > > next request. The downside is that the broker needs to keep both logic
> > and
> > > it seems not gaining much benefit. So personally I prefer to just mute
> > the
> > > channel. But I am open to different opinions.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Nov 6, 2017 at 7:28 PM, Becket Qin 
> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Hmm, even if a connection is closed by the client when the channel is
> > > > muted. After the channel is unmuted, it seems Selector.select() will
> > > detect
> > > > this and close the socket.
> > > > It is true that before the channel is unmuted the socket will be in a
> > > > CLOSE_WAIT state though. So having an arbitrarily long muted duration
> > may
> > > > indeed cause problem.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Mon, Nov 6, 2017 at 7:22 PM, Becket Qin 
> > wrote:
> > > >
> > > >> Hi Rajini,
> > > >>
> > > >> Thanks for the detail explanation. Please see the reply below:
> > > >>
> > > >> 2. Limiting the throttle time to connection.max.idle.ms on the
> broker
> > > >> side is probably fine. However, clients may have a different
> > > configuration
> > > >> of connection.max.idle.ms and still reconnect before the throttle
> > time
> > > >> (which is the server side connection.max.idle.ms). It seems another
> > > back
> > > >> door for quota.
> > > >>
> > > >> 3. I agree we could just mute the server socket until
> > > >> connection.max.idle.ms if the massive CLOSE_WAIT is a big issue.
> This
> > > >> helps guarantee only connection_rate * connection.max.idle.ms
> sockets
> > > >> will be in CLOSE_WAIT state. For cooperative clients, unmuting the
> > > socket
> > > >> will not have negative impact.
> > > >>
> > > >> 4. My concern for capping the throttle time to metrics.window.ms is
> > > that
> > > >> we will not be able to enforce quota effectively. It might be useful
> > to
> > > >> explain this with a real example we are trying to solve. We have a
> > > >> MapReduce job pushing data to a Kafka cluster. The MapReduce job has
> > > >> hundreds of producers and each of them sends a normal sized
> > > ProduceRequest
> > > >> (~2 MB) to each of the brokers in the cluster. Apparently the client
> > id
> > > >> will run out of bytes quota pretty quickly, and the broker started
> to
> > > >> throttle the producers. The throttle time could actually be pretty
> > long
> > > >> (e.g. a few minutes). At that point, request queue time on the
> brokers
> > > was
> > > >> around 30 

[GitHub] kafka pull request #3891: [WIP input needed] MINOR: Further code cleanup inv...

2017-11-16 Thread KoenDG
Github user KoenDG closed the pull request at:

https://github.com/apache/kafka/pull/3891


---


Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-16 Thread Vahid S Hashemian
James, thanks for the feedback, and sharing the datapoint.

Just as a reference, here is how the key-value in an offsets topic record 
is formed:

Key
 ** group id: string
 ** topic, partition: string, int

Value
 ** offset, metadata: long, string
 ** commit timestamp: long
 ** expire timestamp: long

--Vahid
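
Sketched as plain classes for readability (field order as listed above; the actual versioned schemas live in the broker code):

```java
// Key of an offsets-topic record: identifies the group and partition.
class OffsetCommitKey {
    String group;     // group id
    String topic;     // topic name
    int partition;    // partition number
}

// Value of an offsets-topic record.
class OffsetCommitValue {
    long offset;
    String metadata;
    long commitTimestamp;
    long expireTimestamp; // the per-record expiration that KIP-211 revises
}
```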



From:   James Cheng 
To: dev@kafka.apache.org
Date:   11/16/2017 12:01 AM
Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of 
Consumer Group Offsets



How fast does the in-memory cache grow?

As a random datapoint...

10 months ago we set our offsets.retention.minutes to 1 year. So, for the 
past 10 months, we essentially have not expired any offsets.

Via JMX, one of our brokers says 
kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets
Value=153552

I don't know how that maps into memory usage. Are the keys dependent on topic 
names and group names?

And, of course, that number is highly dependent on cluster usage, so I'm 
not sure if we are able to generalize anything from it.

-James

> On Nov 15, 2017, at 5:05 PM, Vahid S Hashemian 
 wrote:
> 
> Thanks Jeff.
> 
> I believe the in-memory cache size is currently unbounded.
> As you mentioned the size of this cache on each broker is a factor of 
the 
> number of consumer groups (whose coordinator is on that broker) and the 
> number of partitions in each group.
> With compaction in mind, the cache size could be manageable even with 
the 
> current KIP.
> We could also consider implementing KAFKA-5664 to minimize the cache
> size:
> https://issues.apache.org/jira/browse/KAFKA-5664.
> 
> It would be great to hear feedback from others (and committers) on this.
> 
> --Vahid
> 
> 
> 
> 
> From:   Jeff Widman 
> To: dev@kafka.apache.org
> Date:   11/15/2017 01:04 PM
> Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer Group Offsets
> 
> 
> 
> I thought about this scenario as well.
> 
> However, my conclusion was that because __consumer_offsets is a 
compacted
> topic, this extra clutter from short-lived consumer groups is 
negligible.
> 
> The disk size is the product of the number of consumer groups and the
> number of partitions in the group's subscription. Typically I'd expect 
> that
> for short-lived consumer groups, that number < 100K.
> 
> The one area I wasn't sure of was how the group coordinator's in-memory
> cache of offsets works. Is it a pull-through cache of unbounded size or
> does it contain all offsets of all groups that use that broker as their
> coordinator? If the latter, possibly there's an OOM risk there. If so,
> might be worth investigating changing the cache design to a bounded 
size.
> 
> Also, switching to this design means that consumer groups no longer need 
> to
> commit all offsets, they only need to commit the ones that changed. I
> expect in certain cases there will be broker-side performance gains due 
to
> parsing smaller OffsetCommit requests. For example, due to some bad 
design
> decisions we have a couple of topics that have 1500 partitions of
> which ~10% are regularly used. So 90% of the OffsetCommit request
> processing is unnecessary.
> 
> 
> 
> On Wed, Nov 15, 2017 at 11:27 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
>> I'm forwarding this feedback from John to the mailing list, and 
> responding
>> at the same time:
>> 
>> John, thanks for the feedback. I agree that the scenario you described
>> could lead to unnecessarily long offset retention for other consumer 
> groups.
>> If we want to address that in this KIP we could either keep the
>> 'retention_time' field in the protocol, or propose a per group 
retention
>> configuration.
>> 
>> I'd like to ask for feedback from the community on whether we should
>> design and implement a per-group retention configuration as part of 
this
>> KIP; or keep it simple at this stage and go with one broker level 
> setting
>> only.
>> Thanks in advance for sharing your opinion.
>> 
>> --Vahid
>> 
>> 
>> 
>> 
>> From:   John Crowley 
>> To: vahidhashem...@us.ibm.com
>> Date:   11/15/2017 10:16 AM
>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer
>> Group Offsets
>> 
>> 
>> 
>> Sorry for the clutter, first found KAFKA-3806, then -4682, and finally
>> this KIP - they have more detail which I’ll avoid duplicating here.
>> 
>> I think that not starting the expiration until all consumers have ceased,
>> and clearing all offsets at the same time, does clean things up and 
> solves
>> 99% of the original issues - and 100% of my particular concern.
>> 
>> A valid use-case may still have a periodic application - say production
>> applications posting to Topics 

[GitHub] kafka pull request #3904: [MINOR] Added equals() method to Stamped

2017-11-16 Thread KoenDG
Github user KoenDG closed the pull request at:

https://github.com/apache/kafka/pull/3904


---


Build failed in Jenkins: kafka-trunk-jdk8 #2215

2017-11-16 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-5925: Adding records deletion operation to the new Admin Client

--
[...truncated 385.37 KB...]

kafka.security.auth tests (ResourceTypeTest, OperationTest, AclTest,
ZkAuthorizationTest, SimpleAclAuthorizerTest): STARTED and PASSED, except:

kafka.security.auth.SimpleAclAuthorizerTest >
testHighConcurrencyModificationOfResourceAcls FAILED
java.lang.AssertionError: expected acls Set(User:36 has Allow permission
for operations: Read from hosts: *, User:7 has Allow permission for operations:
Read from hosts: *, [...] User:48 has Allow permission for operations: Read from

[jira] [Resolved] (KAFKA-6213) Stream processor receives messages after close() is invoked

2017-11-16 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6213.

Resolution: Not A Problem

> Stream processor receives messages after close() is invoked
> ---
>
> Key: KAFKA-6213
> URL: https://issues.apache.org/jira/browse/KAFKA-6213
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.11.0.1
>Reporter: Bart De Vylder
>
> I think it is not expected or desirable that a processor receives messages 
> (through its {{process}} method) after {{close}} has been invoked.
> Scenario that triggered the behavior: 
> We have a topic with 2 partitions and a simple streaming app:
> {code}
> builder.stream(topic)
>.process(() -> new SomeProcessor());
> {code}
> Then we create one instance of this application; this triggers the 
> construction of 2 SomeProcessor instances in that application. Next we start 
> a second application, which triggers a rebalance of the partitions. It was 
> observed that both existing SomeProcessor instances in the first application 
> received a {{close}} call. However, after the {{close}} method was invoked, 
> no new SomeProcessor was constructed and the {{process}} method of one of the 
> existing (and closed) ones is still being invoked.
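
For illustration, a minimal Scala sketch of a processor that would surface the reported behavior; the class name comes from the report, while the guard flag is an assumption (targeting the 0.11 Processor API):

{code}
import org.apache.kafka.streams.processor.AbstractProcessor

class SomeProcessor extends AbstractProcessor[String, String] {
  @volatile private var closed = false

  override def process(key: String, value: String): Unit = {
    // under the reported behavior this would throw after the rebalance
    if (closed) throw new IllegalStateException("process() called after close()")
    // ... actual processing ...
  }

  override def close(): Unit = closed = true
}
{code}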



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4195: KAFKA-5811: Add Kibosh integration for Trogdor and...

2017-11-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4195


---


[jira] [Resolved] (KAFKA-5811) Trogdor should handle injecting disk faults

2017-11-16 Thread Rajini Sivaram (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram resolved KAFKA-5811.
---
   Resolution: Fixed
Fix Version/s: 1.1.0

Issue resolved by pull request 4195
[https://github.com/apache/kafka/pull/4195]

> Trogdor should handle injecting disk faults
> ---
>
> Key: KAFKA-5811
> URL: https://issues.apache.org/jira/browse/KAFKA-5811
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Fix For: 1.1.0
>
>
> Trogdor should handle injecting disk faults



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4132: KAFKA-5925: Adding records deletion operation to t...

2017-11-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4132


---


log retention policy issue

2017-11-16 Thread 张明富
Hi,
  From Kafka's documentation I found:
  "The Kafka cluster retains all published records—whether or not they have 
been consumed—using a configurable retention period. For example, if the 
retention policy is set to two days, then for the two days after a record is 
published, it is available for consumption, after which it will be discarded to 
free up space. Kafka's performance is effectively constant with respect to data 
size so storing data for a long time is not a problem."
  Does this mean that once the retention period is over, all messages will be 
discarded whether they have been consumed or not?
  If so, there is a risk of losing messages for consumers that do not consume 
them in time.
  The current log retention policy, which only offers "log.retention.hours" and 
"log.retention.bytes", is not enough.
  I hope Kafka can take consumer offsets into account when purging log 
segment files.
  Can this be improved in a future release?
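
For reference, the two broker-level settings mentioned above look like this in server.properties (the values shown are illustrative):

# time-based retention: log segments older than this become eligible for deletion
log.retention.hours=48
# size-based retention: per-partition size cap before old segments are deleted
log.retention.bytes=1073741824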


  Good Day, Thanks!





Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-16 Thread Trevor Huey
1. Going over KIP-213, I am leaning toward the "less intrusive" approach.
In my use case, I am planning on performing a sequence of several oneToMany
joins. From my understanding, the more intrusive approach would result in
several nested levels of CombinedKeys. For example, consider Tables A, B,
C with corresponding keys KA, KB, KC. Joining A and B would produce
CombinedKey<KA, KB>. Then joining that result on C would produce
CombinedKey<CombinedKey<KA, KB>, KC> (see the sketch after this message).
My "keyOtherSerde" in this case would need to be capable of deserializing
CombinedKey<KA, KB>. This would just get worse the more tables I join. I
realize that it's easier to shoot yourself in the foot with the less
intrusive approach, but as you said, "the user can stick with his default
serde or his standard way of serializing". In the simplest case where the
keys are just strings, they can do simple string concatenation and
Serdes.String(). It also allows the user to create and use their own
version of CombinedKey if they feel so inclined.

2. Why is there a problem for prefix, but not for range?
https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1L162
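
For illustration, a minimal Scala sketch of the nesting described in point 1 (CombinedKey here is a hypothetical stand-in for the class discussed in the KIP):

case class CombinedKey[KO, KI](outer: KO, inner: KI)

object NestingExample extends App {
  // suppose table A is keyed by Int, B by String, C by Long
  val ab: CombinedKey[Int, String] = CombinedKey(1, "b-key")
  // joining the A-B result with C wraps another layer around the key
  val abc: CombinedKey[CombinedKey[Int, String], Long] = CombinedKey(ab, 3L)
  // a serde for the joined table must understand every nested layer
  println(abc)
}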


On Thu, Nov 16, 2017 at 2:57 AM Jan Filipiak wrote:

> Hi Trevor,
>
> thank you very much for your interest. To keep the discussion focused on the
> mailing list rather than Jira or Confluence, I decided to reply here.
>
> 1. It's tricky; activity is indeed very low. In the KIP-213 there are 2
> proposals about the return type of the join. I would like to settle on one.
> Unfortunately it's controversial and I don't want to have the discussion
> after I have settled on one way and implemented it. But no one is really
> interested.
> So discussing with YOU what your preferred return type should look like
> would be very helpful already.
>
> 2.
> The most difficult part is implementing
> this
> https://github.com/apache/kafka/pull/3720/files#diff-ac41b4dfb9fc6bb707d966477317783cR68
> here
> https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1R244
> and here
> https://github.com/apache/kafka/pull/3720/files#diff-b1a1281dce5219fd0cb5afad380d9438R207
> One can get an easy shot by just flushing the underlying RocksDB and using
> RocksDB for the range scan.
> But as you can see, the implementation depends on the API; whichever way
> the API discussion goes,
> I would implement this differently.
>
> 3.
> I only have so much time to work on this. I filed the KIP because I
> want to pull it through, and I am pretty confident that I can do it.
> But I am still waiting for the full discussion to happen on this. To get
> the discussion moving it seems that I need to fill out the table in
> the KIP entirely (the one describing the events, change modifications and
> output). Feel free to continue the discussion w/o the table. I want
> to finish the table during next week.
>
> Best Jan thank you for your interest!
>
> _ Jira Quote __
>
> Jan Filipiak
> 
> Please bear with me while I try to get caught up. I'm not yet familiar with
> the Kafka code base. I have a few questions to try to figure out how I can
> get involved:
> 1. It seems like we need to get buy-in on your KIP-213? It doesn't seem
> like there's been much activity on it besides yourself in a while. What's
> your current plan of attack for getting that approved?
> 2. I know you said that the most difficult part is yet to be done. Is
> there some code you can point me toward so I can start digging in and
> better understand why this is so difficult?
> 3. This issue has been open since May '16. How far out do you think we are
> from getting this implemented?
>


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-16 Thread Jan Filipiak
We are running this perfectly fine. For us the smaller table changes rather 
infrequently, say only a few times per day. The cost of the flush is way 
lower than the computing power you would need to bring to the table to 
account for all the records being emitted after one single update.


On 16.11.2017 18:02, Trevor Huey wrote:
Ah, I think I see the problem now. Thanks for the explanation. That is 
tricky. As you said, it seems the easiest solution would just be to 
flush the cache. I wonder how big of a performance hit that'd be...


On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak wrote:


Hi Trevor,

I am leaning towards the less intrusive approach myself. In fact
that is how we implemented our internal API for this and how we
run it in production.
Getting more voices towards this solution makes me really happy.
The reason it's a problem for prefix and not for range is the
following. Imagine the intrusive approach. The key of the RocksDB
store would be CombinedKey<A, B>: the prefix scan would take an A, and
the range scan would take a CombinedKey<A, B> still. As you can
see, with the intrusive approach the keys are actually different
types for different queries. With the less intrusive approach we
use the same type and rely on serde invariances. For us this works
nicely (protobuf); it might bite some JSON users.

Hope it makes it clear

Best Jan
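
To make the type difference concrete, a minimal Scala sketch of what the intrusive variant's store interface would look like (hypothetical names, not the actual PR code):

case class CombinedKey[A, B](outer: A, inner: B)

trait JoinStore[A, B, V] {
  // prefix scan: "everything whose outer key equals this A"
  def prefixScan(prefix: A): Iterator[(CombinedKey[A, B], V)]
  // range scan: both bounds are full CombinedKeys
  def rangeScan(from: CombinedKey[A, B],
                to: CombinedKey[A, B]): Iterator[(CombinedKey[A, B], V)]
}

// the less intrusive approach keeps one user-chosen key type for both queries
// and relies on the serde producing prefix-compatible bytes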


On 16.11.2017 16:39, Trevor Huey wrote:

1. Going over KIP-213, I am leaning toward the "less intrusive"
approach. In my use case, I am planning on performing a sequence
of several oneToMany joins. From my understanding, the more
intrusive approach would result in several nested levels of
CombinedKeys. For example, consider Tables A, B, C with
corresponding keys KA, KB, KC. Joining A and B would produce
CombinedKey<KA, KB>. Then joining that result on C would produce
CombinedKey<CombinedKey<KA, KB>, KC>. My "keyOtherSerde" in this
case would need to be capable of deserializing CombinedKey<KA, KB>.
This would just get worse the more tables I join. I realize
that it's easier to shoot yourself in the foot with the less
intrusive approach, but as you said, "the user can stick with
his default serde or his standard way of serializing". In the
simplest case where the keys are just strings, they can do simple
string concatenation and Serdes.String(). It also allows the user
to create and use their own version of CombinedKey if they feel
so inclined.

2. Why is there a problem for prefix, but not for range?

https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1L162


On Thu, Nov 16, 2017 at 2:57 AM Jan Filipiak
> wrote:

Hi Trevor,

thank you very much for your interest. To keep the discussion
focused on the mailing list rather than Jira or Confluence, I
decided to reply here.

1. It's tricky; activity is indeed very low. In the KIP-213
there are 2 proposals about the return type of the join. I
would like to settle on one.
Unfortunately it's controversial and I don't want to have the
discussion after I have settled on one way and implemented it.
But no one is really interested.
So discussing with YOU what your preferred return type should
look like would be very helpful already.

2.
The most difficult part is implementing
this

https://github.com/apache/kafka/pull/3720/files#diff-ac41b4dfb9fc6bb707d966477317783cR68
here

https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1R244
and here

https://github.com/apache/kafka/pull/3720/files#diff-b1a1281dce5219fd0cb5afad380d9438R207
One can get an easy shot by just flushing the underlying
RocksDB and using RocksDB for the range scan.
But as you can see, the implementation depends on the API;
whichever way the API discussion goes,
I would implement this differently.

3.
I only have so much time to work on this. I filed the
KIP because I want to pull it through, and I am pretty
confident that I can do it.
But I am still waiting for the full discussion to happen on
this. To get the discussion moving it seems that I
need to fill out the table in
the KIP entirely (the one describing the events, change
modifications and output). Feel free to continue the
discussion w/o the table. I want
to finish the table during next week.

Best Jan thank you for your interest!

_ Jira Quote __

Jan Filipiak

Please bear with me while I try to get 

[jira] [Resolved] (KAFKA-414) Evaluate mmap-based writes for Log implementation

2017-11-16 Thread Jay Kreps (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Kreps resolved KAFKA-414.
-
Resolution: Won't Fix

> Evaluate mmap-based writes for Log implementation
> -
>
> Key: KAFKA-414
> URL: https://issues.apache.org/jira/browse/KAFKA-414
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Jay Kreps
>Priority: Minor
> Attachments: TestLinearWritePerformance.java, 
> linear_write_performance.txt
>
>
> Working on another project I noticed that small-write performance for 
> FileChannel is really very bad. This likely affects Kafka in the case where 
> messages are produced one at a time or in small batches. I wrote a quick 
> program to evaluate the following options:
> raf = RandomAccessFile
> mmap = MappedByteBuffer
> channel = FileChannel
> For both of the later two I tried both direct-allocated and non-direct 
> allocated buffers (direct allocation is supposed to be faster).
> Here are the results I saw:
> [jkreps@jkreps-ld valencia]$ java -XX:+UseConcMarkSweepGC -cp 
> target/test-classes -server -Xmx1G -Xms1G valencia.TestLinearWritePerformance 
> $((256*1024)) $((1*1024*1024*1024)) 2
> file_length   size (bytes)   raf (mb/sec)   channel_direct (mb/sec)   mmap_direct (mb/sec)   channel_heap (mb/sec)   mmap_heap (mb/sec)
>         100              1           0.60                      0.52                 28.66                  0.55             50.40
>         200              2           1.18                      1.16                 67.84                  1.13             74.17
>         400              4           2.33                      2.26                121.52                  2.23            122.14
>         800              8           4.72                      4.51                228.39                  4.41            175.20
>        1600             16           9.25                      8.96                393.24                  8.88            314.11
>        3200             32          18.43                     17.93                601.83                 17.28            482.25
>        6400             64          36.25                     35.21                799.98                 34.39            680.39
>       12800            128          69.80                     67.52                963.30                 66.21            870.82
>       25600            256         134.24                    129.25               1064.13                129.01           1014.00
>       51200            512         247.38                    238.24               1124.71                235.57           1091.81
>      102400           1024         420.42                    411.43               1170.94                406.57           1138.80
>  1073741824           2048         671.93                    658.96               1133.63                650.39           1151.81
>  1073741824           4096        1007.84                    989.88               1165.60                976.10           1158.49
>  1073741824           8192        1137.12                   1145.01               1189.38               1128.30           1174.66
>  1073741824          16384        1172.63                   1228.33               1192.19               1206.58           1156.37
>  1073741824          32768        1221.13                   1295.37               1170.96               1262.28           1156.65
>  1073741824          65536        1255.23                   1306.33               1160.22               1268.24           1142.52
>  1073741824         131072        1240.65                   1292.06               1101.90               1269.00           1119.14
> The size column gives 
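
For context, a minimal Scala sketch of the three write paths being compared (illustrative only, not the attached benchmark code):

import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.channels.FileChannel

object WritePaths extends App {
  val payload = Array.fill[Byte](1024)(1)

  // raf: write via RandomAccessFile
  val raf = new RandomAccessFile("raf.dat", "rw")
  raf.write(payload)
  raf.close()

  // channel: write via FileChannel
  val channelFile = new RandomAccessFile("channel.dat", "rw")
  channelFile.getChannel.write(ByteBuffer.wrap(payload))
  channelFile.close()

  // mmap: write via a MappedByteBuffer
  val mmapFile = new RandomAccessFile("mmap.dat", "rw")
  val mmap = mmapFile.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, payload.length)
  mmap.put(payload)
  mmapFile.close()
}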

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Sönke Liebau
Sounds good. I've added a few sentences to this effect to the KIP.

On Thu, Nov 16, 2017 at 5:02 PM, Randall Hauch  wrote:

> Nice job updating the KIP. The PR (
> https://github.com/apache/kafka/pull/2755/files) for the proposed
> implementation does prevent names from being empty, and it trims whitespace
> from the name only when creating a new connector. However, the KIP's
> "Proposed Change" section should probably be very clear about this, and the
> migration section should address how a connector that was created with
> leading and/or trailing whitespace characters will still be able to be
> updated and deleted. I think that decreases the likelihood of this change
> negatively impacting existing users. Basically, going forward, the names of
> new connectors will be trimmed.
>
> WDYT?
>
> On Thu, Nov 16, 2017 at 9:32 AM, Sönke Liebau <
> soenke.lie...@opencore.com.invalid> wrote:
>
> > I've added some more detail to the KIP [1] around current scenarios that
> > might break in the future. I actually came up with a second limitation
> that
> > we'd impose on users and also documented this.
> >
> > Let me know what you think.
> >
> > Kind regards,
> > Sönke
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 212%3A+Enforce+set+of+legal+characters+for+connector+names
> >
> >
> > On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau <
> soenke.lie...@opencore.com>
> > wrote:
> >
> > > Hi Randall,
> > >
> > > I had mentioned this edge case in the KIP, but will add some further
> > > detail to further clarify all changing scenarios post pull request.
> > >
> > > Kind regards,
> > > Sönke
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch 
> wrote:
> > >
> > >> No, we need to keep the KIP since we want to change/correct the
> existing
> > >> behavior. But we do need to clarify in the KIP these edge cases that
> > will
> > >> change.
> > >>
> > >> Thanks for the continued work on this, Sönke.
> > >>
> > >> Regards,
> > >>
> > >> Randall
> > >>
> >> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau
> >> > <soenke.lie...@opencore.com.INVALID> wrote:
> > >> >
> > >> > Hi Randall,
> > >> >
> > >> > zero length definitely works, that's what sent me down this hole in
> > the
> > >> > first place. I had a customer accidentally create a connector
> without
> > a
> > >> > name in his environment and then be unable to delete it. No
> connector
> > >> name
> > >> > doesn't work, as this throws a null pointer exception due to
> > KAFKA-4938
> > >> ,
> > >> > but once that is fixed would create a connector named "null" I
> think.
> > >> Have
> > >> > not retested this, but seen it in the past.
> > >> >
> > >> > Also, it is possible to create connectors with trailing and leading
> > >> > whitespaces, this errors out on the create request (which will be
> > fixed
> > >> > when KAFKA-4827 is merged), but correctly creates the connector and
> > you
> > >> can
> > >> > access it if you percent-escape the curl call. This for me is the
> main
> > >> > reason why a KIP might be needed, as we are changing public facing
> > >> behavior
> > >> > here. I agree with you, that this will probably not affect anyone or
> > >> hardly
> > >> > anyone, but in principle it is a change that should need a KIP I
> > think.
> > >> >
> > >> > I've retested and documented this for Confluent 3.3.0:
> > >> > https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14
> c4
> > >> >
> > >> > I am of course happy to withdraw the KIP if you think it is
> > unnecessary,
> > >> > I've also updated the pull request for KAFKA-4930 to reflect the
> > changes
> > >> > stated in the KIP and tested the code with Arjuns pull request for
> > >> > KAFKA-4827 to ensure they don't interfere with each other.
> > >> >
> > >> > Let me know what you think.
> > >> >
> > >> > Kind regards,
> > >> > Sönke
> > >> >
> > >> > ᐧ
> > >> >
> > >> >> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch 
> > >> wrote:
> > >> >>
> > >> >> Thanks for updating the KIP to reflect the current process.
> However,
> > I
> > >> >> still question whether it is necessary to have a KIP - it depends
> on
> > >> >> whether it was possible with prior versions to have connectors with
> > >> >> zero-length or blank names. Have you tried both of these cases?
> > >> >>
> > >> >> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
> > >> >> soenke.lie...@opencore.com.invalid> wrote:
> > >> >>
> > >> >>> Hi Randall,
> > >> >>>
> > >> >>> I have set aside some time to work on this next week. The fix
> itself
> > >> is
> > >> >>> quite simple, but I've yet to write tests to properly catch this,
> > >> which
> > >> >>> turns out to be a bit more complex, as it needs a running
> restserver
> > >> >> which
> > >> >>> is mocked in the tests I've looked at so far.
> > >> >>>
> > >> >>> Should I withdraw the KIP or update it to reflect the
> documentation
> > >> >> changes
> > >> >>> and enforced rules around 

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Randall Hauch
Nice job updating the KIP. The PR (
https://github.com/apache/kafka/pull/2755/files) for the proposed
implementation does prevent names from being empty, and it trims whitespace
from the name only when creating a new connector. However, the KIP's
"Proposed Change" section should probably be very clear about this, and the
migration section should address how a connector that was created with
leading and/or trailing whitespace characters will still be able to be
updated and deleted. I think that decreases the likelihood of this change
negatively impacting existing users. Basically, going forward, the names of
new connectors will be trimmed.

WDYT?
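
For illustration, a minimal Scala sketch of the trim-and-reject rule described above (a hypothetical helper, not the actual PR code):

object ConnectorNames {
  // trim the submitted name; reject names that are empty after trimming
  def validate(name: String): String = {
    require(name != null, "connector name must not be null")
    val trimmed = name.trim
    require(trimmed.nonEmpty, "connector name must not be empty or whitespace-only")
    trimmed // existing connectors keep their stored names; only new ones are trimmed
  }
}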

On Thu, Nov 16, 2017 at 9:32 AM, Sönke Liebau <
soenke.lie...@opencore.com.invalid> wrote:

> I've added some more detail to the KIP [1] around current scenarios that
> might break in the future. I actually came up with a second limitation that
> we'd impose on users and also documented this.
>
> Let me know what you think.
>
> Kind regards,
> Sönke
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 212%3A+Enforce+set+of+legal+characters+for+connector+names
>
>
> On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau wrote:
>
> > Hi Randall,
> >
> > I had mentioned this edge case in the KIP, but will add some further
> > detail to further clarify all changing scenarios post pull request.
> >
> > Kind regards,
> > Sönke
> >
> >
> >
> >
> >
> > On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch  wrote:
> >
> >> No, we need to keep the KIP since we want to change/correct the existing
> >> behavior. But we do need to clarify in the KIP these edge cases that
> will
> >> change.
> >>
> >> Thanks for the continued work on this, Sönke.
> >>
> >> Regards,
> >>
> >> Randall
> >>
>> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau <soenke.lie...@opencore.com.INVALID> wrote:
> >> >
> >> > Hi Randall,
> >> >
> >> > zero length definitely works, that's what sent me down this hole in
> the
> >> > first place. I had a customer accidentally create a connector without
> a
> >> > name in his environment and then be unable to delete it. No connector
> >> name
> >> > doesn't work, as this throws a null pointer exception due to
> KAFKA-4938
> >> ,
> >> > but once that is fixed would create a connector named "null" I think.
> >> Have
> >> > not retested this, but seen it in the past.
> >> >
> >> > Also, it is possible to create connectors with trailing and leading
> >> > whitespaces, this errors out on the create request (which will be
> fixed
> >> > when KAFKA-4827 is merged), but correctly creates the connector and
> you
> >> can
> >> > access it if you percent-escape the curl call. This for me is the main
> >> > reason why a KIP might be needed, as we are changing public facing
> >> behavior
> >> > here. I agree with you, that this will probably not affect anyone or
> >> hardly
> >> > anyone, but in principle it is a change that should need a KIP I
> think.
> >> >
> >> > I've retested and documented this for Confluent 3.3.0:
> >> > https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
> >> >
> >> > I am of course happy to withdraw the KIP if you think it is
> unnecessary,
> >> > I've also updated the pull request for KAFKA-4930 to reflect the
> changes
> >> > stated in the KIP and tested the code with Arjuns pull request for
> >> > KAFKA-4827 to ensure they don't interfere with each other.
> >> >
> >> > Let me know what you think.
> >> >
> >> > Kind regards,
> >> > Sönke
> >> >
> >> >
> >> >> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch 
> >> wrote:
> >> >>
> >> >> Thanks for updating the KIP to reflect the current process. However,
> I
> >> >> still question whether it is necessary to have a KIP - it depends on
> >> >> whether it was possible with prior versions to have connectors with
> >> >> zero-length or blank names. Have you tried both of these cases?
> >> >>
> >> >> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
> >> >> soenke.lie...@opencore.com.invalid> wrote:
> >> >>
> >> >>> Hi Randall,
> >> >>>
> >> >>> I have set aside some time to work on this next week. The fix itself
> >> is
> >> >>> quite simple, but I've yet to write tests to properly catch this,
> >> which
> >> >>> turns out to be a bit more complex, as it needs a running restserver
> >> >> which
> >> >>> is mocked in the tests I've looked at so far.
> >> >>>
> >> >>> Should I withdraw the KIP or update it to reflect the documentation
> >> >> changes
> >> >>> and enforced rules around trimming and zero length connector names?
> >> This
> >> >> is
> >> >>> a change to existing behavior, even if it is quite small and
> probably
> >> >> won't
> >> >>> even be noticed by many people..
> >> >>>
> >> >>> best regards,
> >> >>> Sönke
> >> >>>
> >>  On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch 
> >> wrote:
> >> 
> >>  Any progress on updating the PR and withdrawing KIP-212?
> >> 
> >> 

Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Sönke Liebau
I've added some more detail to the KIP [1] around current scenarios that
might break in the future. I actually came up with a second limitation that
we'd impose on users and also documented this.

Let me know what you think.

Kind regards,
Sönke

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-212%3A+Enforce+set+of+legal+characters+for+connector+names


On Thu, Nov 16, 2017 at 2:59 PM, Sönke Liebau wrote:

> Hi Randall,
>
> I had mentioned this edge case in the KIP, but will add some further
> detail to further clarify all changing scenarios post pull request.
>
> Kind regards,
> Sönke
>
>
>
>
>
> On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch  wrote:
>
>> No, we need to keep the KIP since we want to change/correct the existing
>> behavior. But we do need to clarify in the KIP these edge cases that will
>> change.
>>
>> Thanks for the continued work on this, Sönke.
>>
>> Regards,
>>
>> Randall
>>
>> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau <soenke.lie...@opencore.com.INVALID> wrote:
>> >
>> > Hi Randall,
>> >
>> > zero length definitely works, that's what sent me down this hole in the
>> > first place. I had a customer accidentally create a connector without a
>> > name in his environment and then be unable to delete it. No connector
>> name
>> > doesn't work, as this throws a null pointer exception due to KAFKA-4938
>> ,
>> > but once that is fixed would create a connector named "null" I think.
>> Have
>> > not retested this, but seen it in the past.
>> >
>> > Also, it is possible to create connectors with trailing and leading
>> > whitespaces, this errors out on the create request (which will be fixed
>> > when KAFKA-4827 is merged), but correctly creates the connector and you
>> can
>> > access it if you percent-escape the curl call. This for me is the main
>> > reason why a KIP might be needed, as we are changing public facing
>> behavior
>> > here. I agree with you, that this will probably not affect anyone or
>> hardly
>> > anyone, but in principle it is a change that should need a KIP I think.
>> >
>> > I've retested and documented this for Confluent 3.3.0:
>> > https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
>> >
>> > I am of course happy to withdraw the KIP if you think it is unnecessary,
>> > I've also updated the pull request for KAFKA-4930 to reflect the changes
>> > stated in the KIP and tested the code with Arjuns pull request for
>> > KAFKA-4827 to ensure they don't interfere with each other.
>> >
>> > Let me know what you think.
>> >
>> > Kind regards,
>> > Sönke
>> >
>> >
>> >> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch 
>> wrote:
>> >>
>> >> Thanks for updating the KIP to reflect the current process. However, I
>> >> still question whether it is necessary to have a KIP - it depends on
>> >> whether it was possible with prior versions to have connectors with
>> >> zero-length or blank names. Have you tried both of these cases?
>> >>
>> >> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
>> >> soenke.lie...@opencore.com.invalid> wrote:
>> >>
>> >>> Hi Randall,
>> >>>
>> >>> I have set aside some time to work on this next week. The fix itself
>> is
>> >>> quite simple, but I've yet to write tests to properly catch this,
>> which
>> >>> turns out to be a bit more complex, as it needs a running restserver
>> >> which
>> >>> is mocked in the tests I've looked at so far.
>> >>>
>> >>> Should I withdraw the KIP or update it to reflect the documentation
>> >> changes
>> >>> and enforced rules around trimming and zero length connector names?
>> This
>> >> is
>> >>> a change to existing behavior, even if it is quite small and probably
>> >> won't
>> >>> even be noticed by many people..
>> >>>
>> >>> best regards,
>> >>> Sönke
>> >>>
>>  On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch 
>> wrote:
>> 
>>  Any progress on updating the PR and withdrawing KIP-212?
>> 
>>  On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch 
>> >> wrote:
>> 
>> > Yes, connector names should not be blank or contain just whitespace.
>> >> In
>> > fact, I might recommend that we trim whitespace at the front and
>> rear
>> >>> of
>> > new connector names and then disallowing any zero-length name.
>> >> Existing
>> > connectors would remain valid, and this would not break backward
>> > compatibility. That might require a small kip simply to update the
>> > documentation and specify what names are valid.
>> >
>> > WDYT?
>> >
>> > Randall
>> >
>> > On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe 
>>  wrote:
>> >
>> >>> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
>> >>> I've spent some time looking at this and testing various
>> >> characters
>>  and
>> >>> it
>> >>> would appear that Randall's suspicion was spot on. I think we can
>> >> support
>> >>> a
>> 

[jira] [Resolved] (KAFKA-1408) Kafka broker cannot stop itself normally after problems with connection to ZK

2017-11-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-1408.

Resolution: Duplicate

Closing as duplicate of KAFKA-1317 based on other comments and the fact that 
there has been no activity on this issue for years.

> Kafka broker cannot stop itself normally after problems with connection to ZK
> 
>
> Key: KAFKA-1408
> URL: https://issues.apache.org/jira/browse/KAFKA-1408
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 0.8.1
>Reporter: Dmitry Bugaychenko
>Assignee: Neha Narkhede
>
> After getting into an inconsistent state due to a short network failure, the 
> broker cannot stop itself. The last message in the log is:
> {code}
> INFO   | jvm 1| 2014/04/21 08:53:07 | [2014-04-21 09:53:06,999] INFO 
> [kafka-log-cleaner-thread-0], Stopped  (kafka.log.LogCleaner)
> INFO   | jvm 1| 2014/04/21 08:53:07 | [2014-04-21 09:53:06,999] INFO 
> [kafka-log-cleaner-thread-0], Shutdown completed (kafka.log.LogCleaner)
> {code}
> There is also a preceding error:
> {code}
> INFO   | jvm 1| 2014/04/21 08:52:55 | [2014-04-21 09:52:55,015] WARN 
> Controller doesn't exist (kafka.utils.Utils$)
> INFO   | jvm 1| 2014/04/21 08:52:55 | kafka.common.KafkaException: 
> Controller doesn't exist
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.utils.ZkUtils$.getController(ZkUtils.scala:70)
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.server.KafkaServer.kafka$server$KafkaServer$$controlledShutdown(KafkaServer.scala:148)
> INFO   | jvm 1| 2014/04/21 08:52:55 |   at 
> kafka.server.KafkaServer$$anonfun$shutdown$1.apply$mcV$sp(KafkaServer.scala:220)
> {code}
> Here is a part of jstack (it looks like there is a deadlock between 
> delete-topics-thread  and ZkClient-EventThread):
> {code}
> IWrapper-Connection id=10 state=WAITING
> - waiting on <0x15d6aa44> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> - locked <0x15d6aa44> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
>  owned by ZkClient-EventThread-37-devlnx2:2181 id=37
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at kafka.utils.Utils$.inLock(Utils.scala:536)
> at kafka.controller.KafkaController.shutdown(KafkaController.scala:641)
> at 
> kafka.server.KafkaServer$$anonfun$shutdown$8.apply$mcV$sp(KafkaServer.scala:233)
> at kafka.utils.Utils$.swallow(Utils.scala:167)
> at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> at kafka.utils.Utils$.swallowWarn(Utils.scala:46)
> at kafka.utils.Logging$class.swallow(Logging.scala:94)
> at kafka.utils.Utils$.swallow(Utils.scala:46)
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:233)
> at odkl.databus.server.Main.stop(Main.java:184)
> at 
> org.tanukisoftware.wrapper.WrapperManager.stopInner(WrapperManager.java:1982)
> at 
> org.tanukisoftware.wrapper.WrapperManager.handleSocket(WrapperManager.java:2391)
> at org.tanukisoftware.wrapper.WrapperManager.run(WrapperManager.java:2696)
> at java.lang.Thread.run(Thread.java:744)
> ZkClient-EventThread-37-devlnx2:2181 id=37 state=WAITING
> - waiting on <0x3d5f9878> (a java.util.concurrent.CountDownLatch$Sync)
> - locked <0x3d5f9878> (a java.util.concurrent.CountDownLatch$Sync)
> at sun.misc.Unsafe.park(Native Method)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:236)
> at kafka.utils.ShutdownableThread.shutdown(ShutdownableThread.scala:36)
> at 
> kafka.controller.TopicDeletionManager.shutdown(TopicDeletionManager.scala:93)
> at 
> 

[jira] [Resolved] (KAFKA-381) Changes made by a request do not affect following requests in the same packet.

2017-11-16 Thread Sönke Liebau (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau resolved KAFKA-381.

Resolution: Not A Bug

I think we can safely close this issue; the behavior was sufficiently 
investigated and explained.
The behavior today would still be the same, and it is expected.

> Changes made by a request do not affect following requests in the same packet.
> --
>
> Key: KAFKA-381
> URL: https://issues.apache.org/jira/browse/KAFKA-381
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.7
>Reporter: Samir Jindel
>Priority: Minor
>
> If a packet contains a produce request followed immediately by a fetch 
> request, the fetch request will not have access to the data produced by the 
> prior request.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [ANNOUNCE] Apache Kafka 1.0.0 Released

2017-11-16 Thread Ismael Juma
Thanks Sebb. For reference, Sebb filed KAFKA-6222 and KAFKA-6223, so let's
keep the discussion in the relevant JIRAs.

Ismael

On Thu, Nov 16, 2017 at 1:49 PM, sebb  wrote:

> On 1 November 2017 at 15:38, Guozhang Wang  wrote:
> > The Apache Kafka community is pleased to announce the release for Apache
> > Kafka 1.0.0.
>
> ...
>
> >
> > All of the changes in this release can be found in the release notes:
> >
> > https://dist.apache.org/repos/dist/release/kafka/1.0.0/RELEA
> SE_NOTES.html
>
> Please don't publish the dist.apache.org URL.
> It is only intended as a staging host and does not have the bandwidth
> for general downloads.
>
> In future, please only use the ASF mirror system for advertising downloads.
>
> For KEYS, sigs and hashes, use the ASF mirror source at:
>
> https://www.apache.org/dist/kafka/...
>
> This would also be OK for small files such as release notes, i.e. you can
> use:
>
> https://www.apache.org/dist/kafka/1.0.0/RELEASE_NOTES.html
>
> For files such as binaries and source archives, use the 3rd party
> mirrors (as you have done below).
>
> Thanks.
>
> >
> > You can download the source release from:
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafk
> a-1.0.0-src.tgz
> >
> > and binary releases from:
> >
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafk
> a_2.11-1.0.0.tgz
> > (Scala 2.11)
> > https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafk
> a_2.12-1.0.0.tgz
> > (Scala 2.12)
> >
>
> These are fine.
>


[jira] [Created] (KAFKA-6223) Please delete old releases from mirroring system

2017-11-16 Thread Sebb (JIRA)
Sebb created KAFKA-6223:
---

 Summary: Please delete old releases from mirroring system
 Key: KAFKA-6223
 URL: https://issues.apache.org/jira/browse/KAFKA-6223
 Project: Kafka
  Issue Type: Bug
 Environment: https://dist.apache.org/repos/dist/release/kafka/
Reporter: Sebb


To reduce the load on the ASF mirrors, projects are required to delete old 
releases [1]

Please can you remove all non-current releases?
It's unfair to expect the 3rd party mirrors to carry old releases.

Note that older releases can still be linked from the download page, but such 
links should use the archive server at:

https://archive.apache.org/dist/kafka/

A suggested process is:
+ Change the download page to use archive.a.o for old releases
+ Delete the corresponding directories from 
{{https://dist.apache.org/repos/dist/release/kafka/}}

e.g. {{svn delete https://dist.apache.org/repos/dist/release/kafka/0.8.0}}


Thanks!

[1] http://www.apache.org/dev/release.html#when-to-archive





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6222) Download page must not link to dist.apache.org

2017-11-16 Thread Sebb (JIRA)
Sebb created KAFKA-6222:
---

 Summary: Download page must not link to dist.apache.org
 Key: KAFKA-6222
 URL: https://issues.apache.org/jira/browse/KAFKA-6222
 Project: Kafka
  Issue Type: Bug
Reporter: Sebb


The download page currently links to dist.apache.org for sigs and hashes.
However that host is only intended as a development staging host; it is not 
intended for general downloads.

Links to hashes and sigs must use https://www.apache.org/dist/kafka
Note that the KEYS file should really use https: as well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Sönke Liebau
Hi Randall,

I had mentioned this edge case in the KIP, but will add some further detail
to further clarify all changing scenarios post pull request.

Kind regards,
Sönke





On Thu, Nov 16, 2017 at 2:06 PM, Randall Hauch  wrote:

> No, we need to keep the KIP since we want to change/correct the existing
> behavior. But we do need to clarify in the KIP these edge cases that will
> change.
>
> Thanks for the continued work on this, Sönke.
>
> Regards,
>
> Randall
>
> > On Nov 16, 2017, at 1:56 AM, Sönke Liebau wrote:
> >
> > Hi Randall,
> >
> > zero length definitely works, that's what sent me down this hole in the
> > first place. I had a customer accidentally create a connector without a
> > name in his environment and then be unable to delete it. No connector
> name
> > doesn't work, as this throws a null pointer exception due to KAFKA-4938 ,
> > but once that is fixed would create a connector named "null" I think.
> Have
> > not retested this, but seen it in the past.
> >
> > Also, it is possible to create connectors with trailing and leading
> > whitespaces, this errors out on the create request (which will be fixed
> > when KAFKA-4827 is merged), but correctly creates the connector and you
> can
> > access it if you percent-escape the curl call. This for me is the main
> > reason why a KIP might be needed, as we are changing public facing
> behavior
> > here. I agree with you, that this will probably not affect anyone or
> hardly
> > anyone, but in principle it is a change that should need a KIP I think.
> >
> > I've retested and documented this for Confluent 3.3.0:
> > https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
> >
> > I am of course happy to withdraw the KIP if you think it is unnecessary,
> > I've also updated the pull request for KAFKA-4930 to reflect the changes
> > stated in the KIP and tested the code with Arjuns pull request for
> > KAFKA-4827 to ensure they don't interfere with each other.
> >
> > Let me know what you think.
> >
> > Kind regards,
> > Sönke
> >
> >
> >> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch 
> wrote:
> >>
> >> Thanks for updating the KIP to reflect the current process. However, I
> >> still question whether it is necessary to have a KIP - it depends on
> >> whether it was possible with prior versions to have connectors with
> >> zero-length or blank names. Have you tried both of these cases?
> >>
> >> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
> >> soenke.lie...@opencore.com.invalid> wrote:
> >>
> >>> Hi Randall,
> >>>
> >>> I have set aside some time to work on this next week. The fix itself is
> >>> quite simple, but I've yet to write tests to properly catch this, which
> >>> turns out to be a bit more complex, as it needs a running restserver
> >> which
> >>> is mocked in the tests I've looked at so far.
> >>>
> >>> Should I withdraw the KIP or update it to reflect the documentation
> >> changes
> >>> and enforced rules around trimming and zero length connector names?
> This
> >> is
> >>> a change to existing behavior, even if it is quite small and probably
> >> won't
> >>> even be noticed by many people..
> >>>
> >>> best regards,
> >>> Sönke
> >>>
>  On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch 
> wrote:
> 
>  Any progress on updating the PR and withdrawing KIP-212?
> 
>  On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch 
> >> wrote:
> 
> > Yes, connector names should not be blank or contain just whitespace.
> >> In
> > fact, I might recommend that we trim whitespace at the front and rear
> >>> of
> > new connector names and then disallowing any zero-length name.
> >> Existing
> > connectors would remain valid, and this would not break backward
> > compatibility. That might require a small kip simply to update the
> > documentation and specify what names are valid.
> >
> > WDYT?
> >
> > Randall
> >
> > On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe 
>  wrote:
> >
> >>> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
> >>> I've spent some time looking at this and testing various
> >> characters
>  and
> >>> it
> >>> would appear that Randall's suspicion was spot on. I think we can
> >> support
> >>> a
> >>> fairly large set of characters with very minor changes.
> >>>
> >>> I was put off by the exceptions that were thrown when creating
>  connectors
> >>> with certain characters and suspected a larger underlying problem
> >>> when
> >> in
> >>> fact the only issue is, that the URL in the rest request used to
> >> retrieve
> >>> the response for the create connector request needs to be percent
> >> encoded
> >>> [1].
> >>>
> >>> I've fixed this and done some local testing which worked out quite
> >>> nicely,
> >>> apart from 

Re: [ANNOUNCE] Apache Kafka 1.0.0 Released

2017-11-16 Thread sebb
On 1 November 2017 at 15:38, Guozhang Wang  wrote:
> The Apache Kafka community is pleased to announce the release for Apache
> Kafka 1.0.0.

...

>
> All of the changes in this release can be found in the release notes:
>
> https://dist.apache.org/repos/dist/release/kafka/1.0.0/RELEASE_NOTES.html

Please don't publish the dist.apache.org URL.
It is only intended as a staging host and does not have the bandwidth
for general downloads.

In future, please only use the ASF mirror system for advertising downloads.

For KEYS, sigs and hashes, use the ASF mirror source at:

https://www.apache.org/dist/kafka/...

This would also be OK for small files such as release notes, i.e. you can use:

https://www.apache.org/dist/kafka/1.0.0/RELEASE_NOTES.html

For files such as binaries and source archives, use the 3rd party
mirrors (as you have done below).

Thanks.

>
> You can download the source release from:
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafka-1.0.0-src.tgz
>
> and binary releases from:
>
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafka_2.11-1.0.0.tgz
> (Scala 2.11)
> https://www.apache.org/dyn/closer.cgi?path=/kafka/1.0.0/kafka_2.12-1.0.0.tgz
> (Scala 2.12)
>

These are fine.


[jira] [Reopened] (KAFKA-1993) Enable topic deletion as default

2017-11-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-1993:


> Enable topic deletion as default
> 
>
> Key: KAFKA-1993
> URL: https://issues.apache.org/jira/browse/KAFKA-1993
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-1993.patch
>
>
> Since topic deletion is now thoroughly tested and works as well as most Kafka 
> features, we should enable it by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-1993) Enable topic deletion as default

2017-11-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-1993.

Resolution: Duplicate

> Enable topic deletion as default
> 
>
> Key: KAFKA-1993
> URL: https://issues.apache.org/jira/browse/KAFKA-1993
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.8.2.0
>Reporter: Gwen Shapira
>Assignee: Gwen Shapira
> Attachments: KAFKA-1993.patch
>
>
> Since topic deletion is now thoroughly tested and works as well as most Kafka 
> features, we should enable it by default.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-4675) Subsequent CreateTopic command could be lost after a DeleteTopic command

2017-11-16 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4675.

Resolution: Duplicate

KAFKA-6098 is the same issue and has more information.

> Subsequent CreateTopic command could be lost after a DeleteTopic command
> 
>
> Key: KAFKA-4675
> URL: https://issues.apache.org/jira/browse/KAFKA-4675
> Project: Kafka
>  Issue Type: Bug
>Reporter: Guozhang Wang
>  Labels: admin
>
> This is discovered while investigating KAFKA-3896: If an admin client sends a 
> delete topic command and a create topic command consecutively, even if it 
> wait for the response of the previous command before issuing the second, 
> there is still a race condition that the create topic command could be "lost".
> This is because currently these commands are all asynchronous as defined in 
> KIP-4, and controller will return the response once it has written the 
> corresponding data to ZK path, which can be handled by different listener 
> threads at different paces, and if the thread handling create is faster than 
> the other, the executions could be effectively re-ordered.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4150: MINOR: Add valid values for message.timestamp.type

2017-11-16 Thread makearl
Github user makearl closed the pull request at:

https://github.com/apache/kafka/pull/4150


---


[GitHub] kafka pull request #4150: MINOR: Add valid values for message.timestamp.type

2017-11-16 Thread makearl
GitHub user makearl reopened a pull request:

https://github.com/apache/kafka/pull/4150

MINOR: Add valid values for message.timestamp.type

The documentation for `message.timestamp.type` is missing its valid values 
(https://kafka.apache.org/documentation/#topicconfigs). This change adds the 
valid values for that config.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/makearl/kafka 
add-topic-message-timestamp-defaults

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4150.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4150


commit 5f314b10bf9eb464a2897fa97e9d96490c83
Author: makearl 
Date:   2017-10-28T00:47:47Z

Add valid values for message.timestamp.type




---


Re: [DISCUSS] KIP-212: Enforce set of legal characters for connector names

2017-11-16 Thread Randall Hauch
No, we need to keep the KIP since we want to change/correct the existing 
behavior. But we do need to clarify in the KIP these edge cases that will 
change.

Thanks for the continued work on this, Sönke. 

Regards, 

Randall

> On Nov 16, 2017, at 1:56 AM, Sönke Liebau wrote:
> 
> Hi Randall,
> 
> zero length definitely works, that's what sent me down this hole in the
> first place. I had a customer accidentally create a connector without a
> name in his environment and then be unable to delete it. No connector name
> doesn't work, as this throws a null pointer exception due to KAFKA-4938 ,
> but once that is fixed would create a connector named "null" I think. Have
> not retested this, but seen it in the past.
> 
> Also, it is possible to create connectors with trailing and leading
> whitespaces, this errors out on the create request (which will be fixed
> when KAFKA-4827 is merged), but correctly creates the connector and you can
> access it if you percent-escape the curl call. This for me is the main
> reason why a KIP might be needed, as we are changing public facing behavior
> here. I agree with you, that this will probably not affect anyone or hardly
> anyone, but in principle it is a change that should need a KIP I think.
> 
> I've retested and documented this for Confluent 3.3.0:
> https://gist.github.com/soenkeliebau/9363745cff23560fcc234d9b64ac14c4
> 
> I am of course happy to withdraw the KIP if you think it is unnecessary,
> I've also updated the pull request for KAFKA-4930 to reflect the changes
> stated in the KIP and tested the code with Arjuns pull request for
> KAFKA-4827 to ensure they don't interfere with each other.
> 
> Let me know what you think.
> 
> Kind regards,
> Sönke
> 
> 
>> On Tue, Nov 14, 2017 at 7:03 PM, Randall Hauch  wrote:
>> 
>> Thanks for updating the KIP to reflect the current process. However, I
>> still question whether it is necessary to have a KIP - it depends on
>> whether it was possible with prior versions to have connectors with
>> zero-length or blank names. Have you tried both of these cases?
>> 
>> On Fri, Nov 10, 2017 at 3:52 AM, Sönke Liebau <
>> soenke.lie...@opencore.com.invalid> wrote:
>> 
>>> Hi Randall,
>>> 
>>> I have set aside some time to work on this next week. The fix itself is
>>> quite simple, but I've yet to write tests to properly catch this, which
>>> turns out to be a bit more complex, as it needs a running restserver
>> which
>>> is mocked in the tests I've looked at so far.
>>> 
>>> Should I withdraw the KIP or update it to reflect the documentation
>> changes
>>> and enforced rules around trimming and zero length connector names? This
>> is
>>> a change to existing behavior, even if it is quite small and probably
>> won't
>>> even be noticed by many people..
>>> 
>>> best regards,
>>> Sönke
>>> 
 On Thu, Nov 9, 2017 at 9:10 PM, Randall Hauch  wrote:
 
 Any progress on updating the PR and withdrawing KIP-212?
 
 On Fri, Oct 27, 2017 at 5:19 PM, Randall Hauch 
>> wrote:
 
> Yes, connector names should not be blank or contain just whitespace.
>> In
> fact, I might recommend that we trim whitespace at the front and rear
>>> of
> new connector names and then disallowing any zero-length name.
>> Existing
> connectors would remain valid, and this would not break backward
> compatibility. That might require a small kip simply to update the
> documentation and specify what names are valid.
> 
> WDYT?
> 
> Randall
> 
> On Fri, Oct 27, 2017 at 1:08 PM, Colin McCabe 
 wrote:
> 
>>> On Wed, Oct 25, 2017, at 01:07, Sönke Liebau wrote:
>>> I've spent some time looking at this and testing various
>> characters
 and
>>> it
>>> would appear that Randall's suspicion was spot on. I think we can
>> support
>>> a
>>> fairly large set of characters with very minor changes.
>>> 
>>> I was put off by the exceptions that were thrown when creating
 connectors
>>> with certain characters and suspected a larger underlying problem
>>> when
>> in
>>> fact the only issue is, that the URL in the rest request used to
>> retrieve
>>> the response for the create connector request needs to be percent
>> encoded
>>> [1].
>>> 
>>> I've fixed this and done some local testing which worked out quite
>>> nicely,
>>> apart from two special cases, I've not been able to find
>> characters
 that
>>> created issues, even space and slash work.
>>> The mentioned special cases are:
>>>  \  - if the name contains a backslash that is not the beginning
>>> of a
>>> valid escape sequence the request fails before we ever get it in
>>> ConnectorsResource, so a backslash would need to be escaped: \\
>>>  "  - Quotation marks need to be escaped as well to keep the json
 body
>>>  
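
As an illustration of the two ideas discussed above (trimming and rejecting
blank names, and percent-escaping a connector name before it is embedded in
a REST URL), here is a minimal Scala sketch. The helper names and the sample
connector name are made up for illustration; note that URLEncoder produces
form encoding, so the "+" it emits for spaces is rewritten to "%20" before
use in a URL path segment.

import java.net.URLEncoder
import java.nio.charset.StandardCharsets

object ConnectorNames {

  // Hypothetical validation, assuming trim-then-reject-blank semantics.
  def validate(name: String): String = {
    val trimmed = name.trim
    require(trimmed.nonEmpty, "Connector name must not be blank")
    trimmed
  }

  // Percent-escape a name for use as a URL path segment, e.g. in a curl call.
  def escape(name: String): String =
    URLEncoder.encode(name, StandardCharsets.UTF_8.name).replace("+", "%20")

  def main(args: Array[String]): Unit = {
    val name = validate("  my connector/v1  ") // hypothetical connector name
    println("/connectors/" + escape(name))     // /connectors/my%20connector%2Fv1
  }
}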

Fwd: The KafkaConsumer reads randomly from the offset 0

2017-11-16 Thread dali dali
-- Forwarded message --
From: dali dali 
Date: 2017-11-16 10:54 GMT+01:00
Subject: The KafkaConsumer reads randomly from the offset 0
To: us...@kafka.apache.org


Hi,
I want to test a Kafka example. I am using Kafka 0.10.0.1.
The producer:

import java.util.Properties

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerApp extends App {

  val topic = "topicTest"
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer",
    "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)
  for (i <- 0 to 20) {
    val record = new ProducerRecord(topic, "key " + i, " value " + i)
    producer.send(record)
    Thread.sleep(100)
  }
}

The consumer (the topic "topicTest" is created with 1 partition):

import java.util.Properties

import scala.collection.JavaConverters._

import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecords, KafkaConsumer}

object ConsumerApp extends App {
  val topic = "topicTest"
  val properties = new Properties
  properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  properties.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer")
  properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
  properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  properties.put("key.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")
  properties.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer")
  val consumer = new KafkaConsumer[String, String](properties)
  consumer.subscribe(List(topic).asJava)
  while (true) {
    consumer.seekToBeginning(consumer.assignment())
    val records: ConsumerRecords[String, String] = consumer.poll(2)
    println("records size " + records.count())
    records.asScala.foreach(rec => println("offset " + rec.offset()))
  }
}

The problem is that the consumer does not read from offset 0 on the first
iteration, but on the other iterations it does. I want to know the reason,
and how I can make the consumer read from offset 0 on every iteration.
The expected result is:

records size 6
offset 0
offset 1
offset 2
offset 3
offset 4
offset 5
records size 6
offset 0
offset 1
offset 2
offset 3
offset 4
offset 5
...

but the obtained result is:

records size 4
offset 2
offset 3
offset 4
offset 5
records size 6
offset 0
offset 1
offset 2
offset 3
offset 4
offset 5
...

I want the consumer to read all the records on every iteration (from
offset 0 to offset 5).
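
A likely explanation, though it depends on timing: with subscribe(), partitions
are only assigned during poll(), so on the first loop iteration
consumer.assignment() is still empty and seekToBeginning() is a no-op; the 2 ms
poll timeout also gives the first fetch very little time to complete. One
possible workaround is to seek once partitions have actually been assigned, via
a ConsumerRebalanceListener. A self-contained sketch, reusing the settings from
ConsumerApp above:

import java.util
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRebalanceListener, KafkaConsumer}
import org.apache.kafka.common.TopicPartition

object ConsumerFromZero extends App {
  val properties = new Properties
  properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
  properties.put(ConsumerConfig.GROUP_ID_CONFIG, "consumer")
  properties.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
  properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
  properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
  properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

  val consumer = new KafkaConsumer[String, String](properties)
  consumer.subscribe(List("topicTest").asJava, new ConsumerRebalanceListener {
    // Rewind as soon as the coordinator hands this consumer its partitions.
    override def onPartitionsAssigned(partitions: util.Collection[TopicPartition]): Unit =
      consumer.seekToBeginning(partitions)
    override def onPartitionsRevoked(partitions: util.Collection[TopicPartition]): Unit = ()
  })

  while (true) {
    val records = consumer.poll(1000) // a longer timeout gives the fetch time to complete
    println("records size " + records.count())
    records.asScala.foreach(rec => println("offset " + rec.offset()))
    consumer.seekToBeginning(consumer.assignment()) // rewind again for the next iteration
  }
}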


Build failed in Jenkins: kafka-0.11.0-jdk7 #336

2017-11-16 Thread Apache Jenkins Server
See 


Changes:

[rajinisivaram] Bump version to 0.11.0.2

[rajinisivaram] MINOR: Update version numbers to 0.11.0.3-SNAPSHOT

--
[...truncated 2.44 MB...]

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToCommitMultiplePartitionOffsets PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithTwoSubtopologies STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithTwoSubtopologies PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskGetsFencedUsingIsolatedAppInstances STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskGetsFencedUsingIsolatedAppInstances PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFailsWithState STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldNotViolateEosIfOneTaskFailsWithState PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithTwoSubtopologiesAndMultiplePartitions STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRunWithTwoSubtopologiesAndMultiplePartitions PASSED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRestartAfterClose STARTED

org.apache.kafka.streams.integration.EosIntegrationTest > 
shouldBeAbleToRestartAfterClose PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldRestoreTransactionalMessages STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldRestoreTransactionalMessages PASSED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin STARTED

org.apache.kafka.streams.integration.GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldCompactTopicsForStateChangelogs PASSED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs STARTED

org.apache.kafka.streams.integration.InternalTopicIntegrationTest > 
shouldUseCompactAndDeleteForWindowStoreChangelogs PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldUseNewConfigsWhenPresent 
PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce STARTED

org.apache.kafka.streams.StreamsConfigTest > shouldAcceptAtLeastOnce PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldUseCorrectDefaultsWhenNoneSpecified PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerEnableIdempotenceIfEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
STARTED

org.apache.kafka.streams.StreamsConfigTest > defaultSerdeShouldBeConfigured 
PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSetDifferentDefaultsIfEosEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldNotOverrideUserConfigRetriesIfExactlyOnceEnabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldSupportPrefixedConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldOverrideStreamsDefaultConsumerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs STARTED

org.apache.kafka.streams.StreamsConfigTest > testGetProducerConfigs PASSED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled 
STARTED

org.apache.kafka.streams.StreamsConfigTest > 
shouldAllowSettingProducerMaxInFlightRequestPerConnectionsWhenEosDisabled PASSED

org.apache.kafka.streams.StreamsConfigTest > 

[RESULTS] [VOTE] Release Kafka version 0.11.0.2

2017-11-16 Thread Rajini Sivaram
This vote passes with 7 +1 votes (3 binding) and no 0 or -1 votes.

+1 votes
PMC Members:
* Gwen Shapira
* Guozhang Wang
* Ismael Juma

Committers:
* Rajini Sivaram

Community:

* Ted Yu
* Satish Duggana
* Tim Carey-Smith

0 votes
* No votes

-1 votes
* No votes

Vote thread:

http://mail-archives.apache.org/mod_mbox/kafka-dev/201711.mbox/%3ccaojcb38nqfu__ctcchgw3wquh+kc5kk+uxb1lct626kzrfa...@mail.gmail.com%3e


I'll continue with the release process and the release announcement
will follow in the next few days.

Rajini


Re: [VOTE] 0.11.0.2 RC0

2017-11-16 Thread Rajini Sivaram
Correction from previous note:

Vote closed with 3 binding PMC votes (Gwen, Guozhang, Ismael ) and 4
non-binding votes.

On Thu, Nov 16, 2017 at 10:03 AM, Rajini Sivaram 
wrote:

> +1 from me
>
> The vote has passed with 4 binding votes (Gwen, Guozhang, Ismael and
> Rajini) and 3 non-binding votes (Ted, Satish and Tim). I will close the
> voting thread and complete the release process.
>
> Many thanks to everyone for voting.
>
> Regards,
>
> Rajini
>
> On Thu, Nov 16, 2017 at 3:01 AM, Ismael Juma  wrote:
>
>> +1 (binding). Tested the quickstart with the source and binary (Scala
>> 2.12)
>> artifacts, ran the tests on the source artifact and verified some
>> signatures and hashes on source and binary (Scala 2.12) artifacts.
>>
>> Thanks for managing this release Rajini!
>>
>> On Sat, Nov 11, 2017 at 12:37 AM, Rajini Sivaram > >
>> wrote:
>>
>> > Hello Kafka users, developers and client-developers,
>> >
>> >
>> > This is the first candidate for release of Apache Kafka 0.11.0.2.
>> >
>> >
>> > This is a bug fix release and it includes fixes and improvements from 16
>> > JIRAs,
>> > including a few critical bugs.
>> >
>> >
>> > Release notes for the 0.11.0.2 release:
>> >
>> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/RELEASE_NOTES.html
>> >
>> >
>> > *** Please download, test and vote by Wednesday the 15th of November,
>> 8PM
>> > PT
>> >
>> >
>> > Kafka's KEYS file containing PGP keys we use to sign the release:
>> >
>> > http://kafka.apache.org/KEYS
>> >
>> >
>> > * Release artifacts to be voted upon (source and binary):
>> >
>> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/
>> >
>> >
>> > * Maven artifacts to be voted upon:
>> >
>> > https://repository.apache.org/content/groups/staging/
>> >
>> >
>> > * Javadoc:
>> >
>> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/javadoc/
>> >
>> >
>> > * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.2 tag:
>> >
>> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
>> > 25639822d6e23803c599cba35ad3dc1a2817b404
>> >
>> >
>> >
>> > * Documentation:
>> >
>> > Note the documentation can't be pushed live due to changes that will
>> > not go live
>> > until the release. You can manually verify by downloading
>> >
>> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/kafka_2.
>> > 11-0.11.0.2-site-docs.tgz
>> >
>> >
>> >
>> > * Protocol:
>> >
>> > http://kafka.apache.org/0110/protocol.html
>> >
>> >
>> > * Successful Jenkins builds for the 0.11.0 branch:
>> >
>> > Unit/integration tests: https://builds.apache.org/job/
>> > kafka-0.11.0-jdk7/333/
>> >
>> >
>> >
>> >
>> > Thanks,
>> >
>> >
>> > Rajini
>> >
>>
>
>


Re: [DISCUSS] KIP-225 - Use tags for consumer “records.lag” metrics

2017-11-16 Thread charly molter
Yes James you are right.
I wasn't sure what to do about it and followed what happened with BytesOut
in KIP-153 which completely changed meaning without any deprecation window.
I'm happy to adapt my KIP if the community thinks we should duplicate the
metric for a while.

Thanks!

On Thu, Nov 16, 2017 at 8:13 AM, James Cheng  wrote:

> This KIP will break backwards compatibility for anyone who is using the
> existing attribute names.
>
> Kafka devs, I believe that metrics are a supported interface, and so this
> would be a breaking change. In order to do this, we would need a
> deprecation timeframe for the old metric, and a transition plan to the new
> name. Is that right? I'm not sure how we deprecate metrics...
>
> During the deprecation timeframe, we could duplicate the metric to the new
> name.
>
> -James
>
> On Nov 13, 2017, at 6:09 AM, charly molter 
> wrote:
> >
> > Hi,
> >
> > There doesn't seem to be much opposition to this KIP, I'll leave a couple
> > more days before starting the vote.
> >
> > Thanks!
> >
> > On Thu, Nov 9, 2017 at 1:59 PM, charly molter 
> > wrote:
> >
> >> Hi,
> >>
> >> I'd like to start the discussion on KIP-225.
> >>
> >> This KIP tries to correct the way the consumer lag metrics are reported
> to
> >> use built in tags from MetricName.
> >>
> >> Here's the link:
> >> https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=74686649
> >>
> >> Thanks!
> >> --
> >> Charly Molter
> >>
> >
> >
> >
> > --
> > Charly Molter
>
>


-- 
Charly Molter
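
For readers following along, a rough sketch of what "duplicating the metric"
during a deprecation window could look like with the client Metrics API. The
metric and group names, the tag values, and the lag computation below are
illustrative only, not the KIP's actual code:

import scala.collection.JavaConverters._
import org.apache.kafka.common.metrics.{Measurable, MetricConfig, Metrics}

object LagMetricSketch extends App {
  val metrics = new Metrics()

  def currentLag(): Double = 42.0 // stand-in for the real lag computation

  val lag = new Measurable {
    override def measure(config: MetricConfig, now: Long): Double = currentLag()
  }

  // Old style: the topic-partition is baked into the attribute name.
  val oldName = metrics.metricName("topicTest-0.records-lag", "consumer-fetch-manager-metrics")
  // New style (KIP-225): a fixed attribute name, with topic/partition as tags.
  val newName = metrics.metricName("records-lag", "consumer-fetch-manager-metrics",
    Map("topic" -> "topicTest", "partition" -> "0").asJava)

  metrics.addMetric(oldName, lag) // registered only during the deprecation window
  metrics.addMetric(newName, lag)
}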


Re: [VOTE] 0.11.0.2 RC0

2017-11-16 Thread Rajini Sivaram
+1 from me

The vote has passed with 4 binding votes (Gwen, Guozhang, Ismael and
Rajini) and 3 non-binding votes (Ted, Satish and Tim). I will close the
voting thread and complete the release process.

Many thanks to everyone for voting.

Regards,

Rajini

On Thu, Nov 16, 2017 at 3:01 AM, Ismael Juma  wrote:

> +1 (binding). Tested the quickstart with the source and binary (Scala 2.12)
> artifacts, ran the tests on the source artifact and verified some
> signatures and hashes on source and binary (Scala 2.12) artifacts.
>
> Thanks for managing this release Rajini!
>
> On Sat, Nov 11, 2017 at 12:37 AM, Rajini Sivaram 
> wrote:
>
> > Hello Kafka users, developers and client-developers,
> >
> >
> > This is the first candidate for release of Apache Kafka 0.11.0.2.
> >
> >
> > This is a bug fix release and it includes fixes and improvements from 16
> > JIRAs,
> > including a few critical bugs.
> >
> >
> > Release notes for the 0.11.0.2 release:
> >
> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/RELEASE_NOTES.html
> >
> >
> > *** Please download, test and vote by Wednesday the 15th of November, 8PM
> > PT
> >
> >
> > Kafka's KEYS file containing PGP keys we use to sign the release:
> >
> > http://kafka.apache.org/KEYS
> >
> >
> > * Release artifacts to be voted upon (source and binary):
> >
> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/
> >
> >
> > * Maven artifacts to be voted upon:
> >
> > https://repository.apache.org/content/groups/staging/
> >
> >
> > * Javadoc:
> >
> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/javadoc/
> >
> >
> > * Tag to be voted upon (off 0.11.0 branch) is the 0.11.0.2 tag:
> >
> > https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=tag;h=
> > 25639822d6e23803c599cba35ad3dc1a2817b404
> >
> >
> >
> > * Documentation:
> >
> > Note the documentation can't be pushed live due to changes that will
> > not go live
> > until the release. You can manually verify by downloading
> >
> > http://home.apache.org/~rsivaram/kafka-0.11.0.2-rc0/kafka_2.
> > 11-0.11.0.2-site-docs.tgz
> >
> >
> >
> > * Protocol:
> >
> > http://kafka.apache.org/0110/protocol.html
> >
> >
> > * Successful Jenkins builds for the 0.11.0 branch:
> >
> > Unit/integration tests: https://builds.apache.org/job/
> > kafka-0.11.0-jdk7/333/
> >
> >
> >
> >
> > Thanks,
> >
> >
> > Rajini
> >
>


[jira] [Resolved] (KAFKA-6220) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-16 Thread Mickael Maison (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-6220.
---
Resolution: Duplicate

DUP of https://issues.apache.org/jira/browse/KAFKA-6221

> ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation 
> --
>
> Key: KAFKA-6220
> URL: https://issues.apache.org/jira/browse/KAFKA-6220
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.10.2.0, 0.10.2.1
> Environment: RHEL 7
>Reporter: Alex
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This issue appeared to happen frequently on 0.10.2.0. On 0.10.2.1 it's way 
> harder to reproduce. We'll focus on reproducing it on 0.10.2.1.
> *Topology:* 3 brokers, 1 zk.
> *Reproducing strategy:* create a few dozens topics (say, 40) one by one, each 
> with replication factor 2. Number of partitions, generally, does not matter 
> but, for easier reproduction, should not be very small (around 30 or so). 
> *CREATE 40 TOPICS:*
> for i in {1..40}; do bin/kafka-topics.sh --create --topic "topic${i}_p28_r2" 
> --partitions 28 --replication-factor 2 --zookeeper :2165; done
> {code:java}
> *BROKER 1*
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,27] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,9] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,3] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,15] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for 
> partition [topic1_p28_r2,21] to broker 
> 2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> *BROKER 2*
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,12] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for 
> partition [topic20_p28_r2,0] to broker 
> 3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
> server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
> [2017-11-15 16:46:36,410] ERROR 

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-16 Thread Jan Filipiak

Hi Trevor,

thank you very much for your interest. To keep the discussion focused on the
mailing list rather than on Jira or Confluence, I decided to reply here.


1. It's tricky; activity is indeed very low. In KIP-213 there are two
proposals for the return type of the join, and I would like to settle on one.
Unfortunately it's controversial, and I don't want to have the discussion
after I have settled on one way and implemented it. But no one is really
engaging.
So discussing with YOU what your preferred return type would look like would
already be very helpful.


2.
The most difficult part is implementing
this 
https://github.com/apache/kafka/pull/3720/files#diff-ac41b4dfb9fc6bb707d966477317783cR68
here 
https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1R244
and here 
https://github.com/apache/kafka/pull/3720/files#diff-b1a1281dce5219fd0cb5afad380d9438R207
One can take an easy shot by just flushing the underlying RocksDB store and
using RocksDB for the range scan.
But as you can see, the implementation depends on the API. Depending on which
way the API discussion goes, I would implement this differently.
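
For context, a rough sketch (not code from the PR) of what a prefixed range
scan over a flushed RocksDB store can look like with the RocksDB Java API.
The store path and key layout are made up for illustration:

import org.rocksdb.{Options, RocksDB}

object RangeScanSketch extends App {
  RocksDB.loadLibrary()
  val options = new Options().setCreateIfMissing(true)
  val db = RocksDB.open(options, "/tmp/join-store") // hypothetical store path

  // Visit every entry whose key starts with the given prefix, e.g. a join key.
  def scanPrefix(prefix: Array[Byte])(f: (Array[Byte], Array[Byte]) => Unit): Unit = {
    val it = db.newIterator()
    try {
      it.seek(prefix)
      while (it.isValid && it.key().startsWith(prefix)) {
        f(it.key(), it.value())
        it.next()
      }
    } finally it.close()
  }
}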

3.
I only have so much time to work on this. I filed the KIP because
I want to pull it through, and I am pretty confident that I can do it.
But I am still waiting for the full discussion to happen on this. To move
the discussion forward, it seems I need to fill out the table in the KIP
entirely (the one describing the events, change modifications and output).
Feel free to continue the discussion without the table; I want
to finish the table during next week.

Best, Jan. Thank you for your interest!

__________ Jira Quote __________

Jan Filipiak 
 
Please bear with me while I try to get caught up. I'm not yet familiar 
with the Kafka code base. I have a few questions to try to figure out 
how I can get involved:
1. It seems like we need to get buy-in on your KIP-213? It doesn't seem 
like there's been much activity on it besides yourself in a while. 
What's your current plan of attack for getting that approved?
2. I know you said that the most difficult part is yet to be done. Is 
there some code you can point me toward so I can start digging in and 
better understand why this is so difficult?
3. This issue has been open since May '16. How far out do you think we 
are from getting this implemented?


[jira] [Created] (KAFKA-6221) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-16 Thread Alex (JIRA)
Alex created KAFKA-6221:
---

 Summary: ReplicaFetcherThread throws 
UnknownTopicOrPartitionException on topic creation 
 Key: KAFKA-6221
 URL: https://issues.apache.org/jira/browse/KAFKA-6221
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.1, 0.10.2.0
 Environment: RHEL 7
Reporter: Alex


This issue appeared to happen frequently on 0.10.2.0. On 0.10.2.1 it's way 
harder to reproduce. We'll focus on reproducing it on 0.10.2.1.

*Topology:* 3 brokers, 1 zk.

*Reproducing strategy:* create a few dozens topics (say, 40) one by one, each 
with replication factor 2. Number of partitions, generally, does not matter 
but, for easier reproduction, should not be very small (around 30 or so). 

*CREATE 40 TOPICS:*
for i in {1..40}; do bin/kafka-topics.sh --create --topic "topic${i}_p28_r2" 
--partitions 28 --replication-factor 2 --zookeeper :2165; done

{code:java}
*BROKER 1*
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

*BROKER 2*
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,6] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 

[jira] [Created] (KAFKA-6220) ReplicaFetcherThread throws UnknownTopicOrPartitionException on topic creation

2017-11-16 Thread Alex (JIRA)
Alex created KAFKA-6220:
---

 Summary: ReplicaFetcherThread throws 
UnknownTopicOrPartitionException on topic creation 
 Key: KAFKA-6220
 URL: https://issues.apache.org/jira/browse/KAFKA-6220
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.10.2.1, 0.10.2.0
 Environment: RHEL 7
Reporter: Alex


This issue appeared to happen frequently on 0.10.2.0. On 0.10.2.1 it's way 
harder to reproduce. We'll focus on reproducing it on 0.10.2.1.

*Topology:* 3 brokers, 1 zk.

*Reproducing strategy:* create a few dozens topics (say, 40) one by one, each 
with replication factor 2. Number of partitions, generally, does not matter 
but, for easier reproduction, should not be very small (around 30 or so). 

*CREATE 40 TOPICS:*
for i in {1..40}; do bin/kafka-topics.sh --create --topic "topic${i}_p28_r2" 
--partitions 28 --replication-factor 2 --zookeeper :2165; done

{code:java}
*BROKER 1*
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,853] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,27] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,9] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,3] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,15] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:00,854] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[topic1_p28_r2,21] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

*BROKER 2*
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,408] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,12] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,0] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 16:46:36,410] ERROR [ReplicaFetcherThread-0-3], Error for partition 
[topic20_p28_r2,6] to broker 
3:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
[2017-11-15 

[jira] [Resolved] (KAFKA-4) Confusing error message from producer when no Kafka brokers are available

2017-11-16 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/KAFKA-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sönke Liebau resolved KAFKA-4.
--
Resolution: Fixed  (was: Unresolved)

> Confusing error message from producer when no Kafka brokers are available
> 
>
> Key: KAFKA-4
> URL: https://issues.apache.org/jira/browse/KAFKA-4
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.6
>Priority: Minor
> Fix For: 0.11.0.0
>
>
> If no Kafka brokers are available, the producer gives the following error: 
> Exception in thread "main" kafka.common.InvalidPartitionException: Invalid 
> number of partitions: 0 
> Valid values are > 0 
> at 
> kafka.producer.Producer.kafka$producer$Producer$$getPartition(Producer.scala:144)
>  
> at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:112) 
> at kafka.producer.Producer$$anonfun$3.apply(Producer.scala:102) 
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>  
> at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
>  
> at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
>  
> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:32) 
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:206) 
> at scala.collection.mutable.WrappedArray.map(WrappedArray.scala:32) 
> at kafka.producer.Producer.send(Producer.scala:102) 
> at kafka.javaapi.producer.Producer.send(Producer.scala:101) 
> at com.linkedin.nusviewer.PublishTestMessage.main(PublishTestMessage.java:45) 
> This is confusing. The problem is that no brokers are available; we should
> make this clearer.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [DISCUSS] KIP-225 - Use tags for consumer “records.lag” metrics

2017-11-16 Thread James Cheng
This KIP will break backwards compatibility for anyone who is using the 
existing attribute names.

Kafka devs, I believe that metrics are a supported interface, and so this would 
be a breaking change. In order to do this, we would need a deprecation 
timeframe for the old metric, and a transition plan to the new name. Is that 
right? I'm not sure how we deprecate metrics...

During the deprecation timeframe, we could duplicate the metric to the new name.

-James

On Nov 13, 2017, at 6:09 AM, charly molter  wrote:
> 
> Hi,
> 
> There doesn't seem to be much opposition to this KIP, I'll leave a couple
> more days before starting the vote.
> 
> Thanks!
> 
> On Thu, Nov 9, 2017 at 1:59 PM, charly molter 
> wrote:
> 
>> Hi,
>> 
>> I'd like to start the discussion on KIP-225.
>> 
>> This KIP tries to correct the way the consumer lag metrics are reported to
>> use built in tags from MetricName.
>> 
>> Here's the link:
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=74686649
>> 
>> Thanks!
>> --
>> Charly Molter
>> 
> 
> 
> 
> -- 
> Charly Molter



Re: [DISCUSS] KIP-211: Revise Expiration Semantics of Consumer Group Offsets

2017-11-16 Thread James Cheng
How fast does the in-memory cache grow?

As a random datapoint...

10 months ago we set our offsets.retention.minutes to 1 year. So, for the past 
10 months, we essentially have not expired any offsets.

Via JMX, one of our brokers says 
kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets
Value=153552

I don't know how that maps into memory usage. Are the keys dependent on topic
names and group names?

And, of course, that number is highly dependent on cluster usage, so I'm not 
sure if we are able to generalize anything from it.

-James
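
For reference, a minimal sketch of reading that MBean programmatically over
JMX. The endpoint below is hypothetical; adjust the host and port to the
broker's actual JMX settings:

import javax.management.ObjectName
import javax.management.remote.{JMXConnectorFactory, JMXServiceURL}

object NumOffsetsProbe extends App {
  val url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi")
  val connector = JMXConnectorFactory.connect(url)
  try {
    val mbsc = connector.getMBeanServerConnection
    val name = new ObjectName(
      "kafka.coordinator.group:type=GroupMetadataManager,name=NumOffsets")
    println("NumOffsets = " + mbsc.getAttribute(name, "Value"))
  } finally connector.close()
}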

> On Nov 15, 2017, at 5:05 PM, Vahid S Hashemian  
> wrote:
> 
> Thanks Jeff.
> 
> I believe the in-memory cache size is currently unbounded.
> As you mentioned the size of this cache on each broker is a factor of the 
> number of consumer groups (whose coordinator is on that broker) and the 
> number of partitions in each group.
> With compaction in mind, the cache size could be manageable even with the 
> current KIP.
> We could also consider implementing KAFKA-5664 to minimize the cache size: 
> https://issues.apache.org/jira/browse/KAFKA-5664.
> 
> It would be great to hear feedback from others (and committers) on this.
> 
> --Vahid
> 
> 
> 
> 
> From:   Jeff Widman 
> To: dev@kafka.apache.org
> Date:   11/15/2017 01:04 PM
> Subject:Re: [DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer Group Offsets
> 
> 
> 
> I thought about this scenario as well.
> 
> However, my conclusion was that because __consumer_offsets is a compacted
> topic, this extra clutter from short-lived consumer groups is negligible.
> 
> The disk size is the product of the number of consumer groups and the
> number of partitions in the group's subscription. Typically I'd expect that
> for short-lived consumer groups, that number < 100K.
> 
> The one area I wasn't sure of was how the group coordinator's in-memory
> cache of offsets works. Is it a pull-through cache of unbounded size or
> does it contain all offsets of all groups that use that broker as their
> coordinator? If the latter, possibly there's an OOM risk there. If so,
> might be worth investigating changing the cache design to a bounded size.
> 
> Also, switching to this design means that consumer groups no longer need 
> to
> commit all offsets, they only need to commit the ones that changed. I
> expect in certain cases there will be broker-side performance gains due to
> parsing smaller OffsetCommit requests. For example, due to some bad design
> decisions, we have a couple of topics that have 1500 partitions, of
> which ~10% are regularly used. So 90% of the OffsetCommit request
> processing is unnecessary.
> 
> 
> 
> On Wed, Nov 15, 2017 at 11:27 AM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
> 
>> I'm forwarding this feedback from John to the mailing list, and 
> responding
>> at the same time:
>> 
>> John, thanks for the feedback. I agree that the scenario you described
>> could lead to unnecessarily long offset retention for other consumer
>> groups.
>> If we want to address that in this KIP we could either keep the
>> 'retention_time' field in the protocol, or propose a per group retention
>> configuration.
>> 
>> I'd like to ask for feedback from the community on whether we should
>> design and implement a per-group retention configuration as part of this
>> KIP, or keep it simple at this stage and go with one broker-level setting
>> only.
>> Thanks in advance for sharing your opinion.
>> 
>> --Vahid
>> 
>> 
>> 
>> 
>> From:   John Crowley 
>> To: vahidhashem...@us.ibm.com
>> Date:   11/15/2017 10:16 AM
>> Subject:[DISCUSS] KIP-211: Revise Expiration Semantics of 
> Consumer
>> Group Offsets
>> 
>> 
>> 
>> Sorry for the clutter; I first found KAFKA-3806, then KAFKA-4682, and
>> finally this KIP. They have more detail, which I'll avoid duplicating here.
>> 
>> I think that not starting the expiration until all consumers have ceased,
>> and clearing all offsets at the same time, does clean things up and solves
>> 99% of the original issues, and 100% of my particular concern.
>> 
>> A valid use-case may still have a periodic application - say production
>> applications posting to Topics all week, and then a weekend batch job
>> which consumes all new messages.
>> 
>> Setting offsets.retention.minutes = 10 days does cover this but at the
>> cost of extra clutter if there are other consumer groups which are truly
>> created/used/abandoned on a frequent basis. Being able to set
>> offsets.retention.minutes on a per groupId basis allows this to also be
>> covered cleanly, and makes it visible that these groupIds are a special
>> case.
>> 
>> But relatively minor, and should not delay the original KIP.
>> 
>> Thanks,
>> 
>> John Crowley
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> -- 
> 
> *Jeff Widman*
> jeffwidman.com
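
As a footnote to Jeff's point above about committing only the offsets that
changed, a minimal sketch of what that could look like on the client side.
The helper is hypothetical and assumes an already-subscribed consumer:

import scala.collection.JavaConverters._
import scala.collection.mutable
import org.apache.kafka.clients.consumer.{KafkaConsumer, OffsetAndMetadata}
import org.apache.kafka.common.TopicPartition

object ChangedOffsetCommitter {
  // Commit only partitions whose position advanced since the last commit,
  // keeping OffsetCommit requests small when most partitions are idle.
  def commitChanged(consumer: KafkaConsumer[String, String],
                    lastCommitted: mutable.Map[TopicPartition, Long]): Unit = {
    val changed = consumer.assignment().asScala.flatMap { tp =>
      val pos = consumer.position(tp)
      if (lastCommitted.get(tp).contains(pos)) None
      else {
        lastCommitted(tp) = pos
        Some(tp -> new OffsetAndMetadata(pos))
      }
    }.toMap
    if (changed.nonEmpty) consumer.commitSync(changed.asJava)
  }
}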