[GitHub] kafka pull request #4234: KAFKA-6207 : Include start of record when RecordIs...

2017-11-18 Thread jawalesumit
GitHub user jawalesumit opened a pull request:

https://github.com/apache/kafka/pull/4234

KAFKA-6207 : Include start of record when RecordIsTooLarge

When a message is too large to be sent (at 
org.apache.kafka.clients.producer.KafkaProducer#doSend), the 
RecordTooLargeException should carry the start of the record (for example, the 
first 1KB) so that the calling application can debug which message caused the 
error.

To resolve this, I created a new function that extracts the first 1KB of 
the message when a RecordTooLargeException occurs, so that it can be 
logged...
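
A rough sketch of the idea, assuming the serialized record is available as a 
byte array at the point where the producer raises the exception; the helper 
name and the 1 KB constant are illustrative, not the actual patch:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RecordPreview {

    private static final int PREVIEW_BYTES = 1024;

    // Returns at most the first 1 KB of the serialized record as text so it
    // can be embedded in the RecordTooLargeException / log message.
    public static String preview(byte[] serializedValue) {
        if (serializedValue == null) {
            return "null";
        }
        int len = Math.min(serializedValue.length, PREVIEW_BYTES);
        return new String(Arrays.copyOfRange(serializedValue, 0, len), StandardCharsets.UTF_8);
    }
}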

Thanks,
Sumit Jawale

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jawalesumit/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4234


commit 686a3ff711018b4e80e622240cc250e6735d24c7
Author: jawalesumit 
Date:   2017-11-19T07:24:53Z

KAFKA-6207 : Include start of record when RecordIsTooLarge




---


[GitHub] kafka pull request #4233: KAFKA-6181 Examining log messages with {{--deep-it...

2017-11-18 Thread deorenikhil
GitHub user deorenikhil opened a pull request:

https://github.com/apache/kafka/pull/4233

KAFKA-6181 Examining log messages with {{--deep-iteration}} should show 
superset of fields

Printing log data on Kafka brokers using the kafka.tools.DumpLogSegments 
--deep-iteration option doesn't print all the fields.
This change adds the following missing fields to the --deep-iteration 
output (a sketch follows the list below):

baseOffset
lastOffset
baseSequence
lastSequence
producerEpoch
partitionLeaderEpoch
size
crc
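
A rough sketch of reading these batch-level fields with the Java client's 
internal org.apache.kafka.common.record classes (this is not the actual 
DumpLogSegments patch, and the segment path below is hypothetical):

import java.io.File;
import java.io.IOException;
import org.apache.kafka.common.record.FileRecords;
import org.apache.kafka.common.record.RecordBatch;

public class BatchFieldDump {
    public static void main(String[] args) throws IOException {
        // hypothetical segment file path
        FileRecords records = FileRecords.open(new File("/tmp/kafka-logs/my-topic-0/00000000000000000000.log"));
        for (RecordBatch batch : records.batches()) {
            System.out.println("baseOffset: " + batch.baseOffset()
                    + " lastOffset: " + batch.lastOffset()
                    + " baseSequence: " + batch.baseSequence()
                    + " lastSequence: " + batch.lastSequence()
                    + " producerEpoch: " + batch.producerEpoch()
                    + " partitionLeaderEpoch: " + batch.partitionLeaderEpoch()
                    + " size: " + batch.sizeInBytes()
                    + " crc: " + batch.checksum());
        }
        records.close();
    }
}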

thanks,
Nikhil

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/deorenikhil/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4233.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4233


commit 4c2ae5974077d4bd6e5a756b280f3b6c382eb8bc
Author: deorenikhil 
Date:   2017-11-19T06:49:19Z

Adding missing fields in the log data on kafka brokers




---


[GitHub] kafka-site pull request #110: Add missing close parenthesis

2017-11-18 Thread renchaorevee
GitHub user renchaorevee opened a pull request:

https://github.com/apache/kafka-site/pull/110

Add missing close parenthesis



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/renchaorevee/kafka-site master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/110.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #110


commit b0fd1db9b5c87053fed44c2c14e471b4dd915b07
Author: Chao Ren 
Date:   2017-11-19T03:18:52Z

Add missing close parenthesis




---


Jenkins build is back to normal : kafka-trunk-jdk8 #2221

2017-11-18 Thread Apache Jenkins Server
See 




Jenkins build is back to normal : kafka-trunk-jdk9 #204

2017-11-18 Thread Apache Jenkins Server
See 




Build failed in Jenkins: kafka-trunk-jdk7 #2982

2017-11-18 Thread Apache Jenkins Server
See 


Changes:

[wangguoz] KAFKA-6122: Global Consumer should handle TimeoutException

--
[...truncated 385.74 KB...]
kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testDistributedConcurrentModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testAclManagementAPIs PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testWildCardAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testTopicAcl PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testSuperUserHasAccess PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testDenyTakesPrecedence PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testNoAclFoundOverride PASSED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls STARTED

kafka.security.auth.SimpleAclAuthorizerTest > 
testHighConcurrencyModificationOfResourceAcls PASSED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache STARTED

kafka.security.auth.SimpleAclAuthorizerTest > testLoadCache PASSED

kafka.security.auth.OperationTest > testJavaConversions STARTED

kafka.security.auth.OperationTest > testJavaConversions PASSED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled STARTED

kafka.security.auth.ZkAuthorizationTest > testIsZkSecurityEnabled PASSED

kafka.security.auth.ZkAuthorizationTest > testZkUtils STARTED

kafka.security.auth.ZkAuthorizationTest > testZkUtils PASSED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkAntiMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testZkMigration STARTED

kafka.security.auth.ZkAuthorizationTest > testZkMigration PASSED

kafka.security.auth.ZkAuthorizationTest > testChroot STARTED

kafka.security.auth.ZkAuthorizationTest > testChroot PASSED

kafka.security.auth.ZkAuthorizationTest > testDelete STARTED

kafka.security.auth.ZkAuthorizationTest > testDelete PASSED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive STARTED

kafka.security.auth.ZkAuthorizationTest > testDeleteRecursive PASSED

kafka.security.auth.AclTest > testAclJsonConversion STARTED

kafka.security.auth.AclTest > testAclJsonConversion PASSED

kafka.security.auth.ResourceTypeTest > testJavaConversions STARTED

kafka.security.auth.ResourceTypeTest > testJavaConversions PASSED

kafka.security.auth.ResourceTypeTest > testFromString STARTED

kafka.security.auth.ResourceTypeTest > testFromString PASSED

kafka.security.auth.PermissionTypeTest > testJavaConversions STARTED

kafka.security.auth.PermissionTypeTest > testJavaConversions PASSED

kafka.security.auth.PermissionTypeTest > testFromString STARTED

kafka.security.auth.PermissionTypeTest > testFromString PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsWrongSetValue STARTED

kafka.KafkaTest > testGetKafkaConfigFromArgsWrongSetValue PASSED

kafka.KafkaTest > testKafkaSslPasswords STARTED

kafka.KafkaTest > testKafkaSslPasswords PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgs STARTED

kafka.KafkaTest > testGetKafkaConfigFromArgs PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheEnd STARTED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheEnd PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsOnly STARTED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsOnly PASSED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheBegging STARTED

kafka.KafkaTest > testGetKafkaConfigFromArgsNonArgsAtTheBegging PASSED

kafka.producer.SyncProducerTest > testReachableServer STARTED

kafka.producer.SyncProducerTest > testReachableServer PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLarge PASSED

kafka.producer.SyncProducerTest > testNotEnoughReplicas STARTED

kafka.producer.SyncProducerTest > testNotEnoughReplicas PASSED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero STARTED

kafka.producer.SyncProducerTest > testMessageSizeTooLargeWithAckZero PASSED

kafka.producer.SyncProducerTest > testProducerCanTimeout STARTED

kafka.producer.SyncProducerTest > testProducerCanTimeout PASSED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse STARTED

kafka.producer.SyncProducerTest > testProduceRequestWithNoResponse PASSED

kafka.producer.Sync

[GitHub] kafka pull request #4206: KAFKA-6122: Global Consumer should handle TimeoutE...

2017-11-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/4206


---


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-18 Thread Jan Filipiak

-> I think the relationships between the different used types, K0, K1, KO
should be explained explicitly (all the information is there implicitly, but
one needs to think hard to figure it out)


I'm probably blind to this. Can you help me here? How would you 
formulate this?


Thanks,

Jan


On 16.11.2017 23:18, Matthias J. Sax wrote:

Hi,

I am just catching up on this discussion and did re-read the KIP and
discussion thread.

In contrast to you, I prefer the second approach with CombinedKey as
return type for the following reasons:

  1) the oneToManyJoin() method has fewer parameters
  2) those parameters are easy to understand
  3) we hide implementation details (joinPrefixFaker, leftKeyExtractor,
and the return type KO leaks internal implementation details from my
point of view)
  4) user can get their own KO type by extending CombinedKey interface
(this would also address the nesting issue Trevor pointed out)

What's unclear to me is why you care about JSON serdes. What is the
problem with regard to prefixes? It seems I am missing something here.

I also don't understand the argument about "the user can stick with his
default serde or his standard way of serializing". If we have
`CombinedKey` as output, the user just provides the serdes for both input
key types of the combined key individually, and we can reuse both internally to do
the rest. This seems to be a way simpler API. With the KO output type
approach, users need to write an entirely new serde for KO in contrast.
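
A rough sketch of the CombinedKey shape being discussed here; the field and 
accessor names are assumptions for illustration, not the KIP's final 
definition:

public class CombinedKey<K1, K2> {
    private final K1 key1; // key of one input table
    private final K2 key2; // key of the other input table

    public CombinedKey(final K1 key1, final K2 key2) {
        this.key1 = key1;
        this.key2 = key2;
    }

    public K1 key1() { return key1; }
    public K2 key2() { return key2; }
}

Because both type parameters are exposed, the library only needs the two 
individual key serdes to build a serde for the combined key.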

Finally, @Jan, there are still some open comments you did not address
and the KIP wiki page needs some updates. Would be great if you could do
this.

Can you also explicitly describe the data layout of the store that is
used to do the range scans?

Additionally:

-> some arrows in the algorithm diagram are missing
-> what are those XXX in the diagram
-> can you finish the "Step by Step" example
-> I think the relationships between the different used types, K0, K1, KO
should be explained explicitly (all the information is there implicitly, but
one needs to think hard to figure it out)


Last but not least:


But no one is really interested.

Don't understand this statement...



-Matthias


On 11/16/17 9:05 AM, Jan Filipiak wrote:

We are running this perfectly fine. For us the smaller table changes
rather infrequently, say only a few times per day. The performance cost of the
flush is way lower than the computing power you need to bring to the
table to account for all the records being emitted after the one single
update.

On 16.11.2017 18:02, Trevor Huey wrote:

Ah, I think I see the problem now. Thanks for the explanation. That is
tricky. As you said, it seems the easiest solution would just be to
flush the cache. I wonder how big of a performance hit that'd be...

On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak <jan.filip...@trivago.com> wrote:

 Hi Trevor,

 I am leaning towards the less intrusive approach myself. In fact
 that is how we implemented our internal API for this and how we
 run it in production.
 Getting more voices towards this solution makes me really happy.
 The reason it's a problem for Prefix and not for Range is the
 following. Imagine the intrusive approach: the key in RocksDB
 would be CombinedKey<A, B>, the prefix scan would take an A, and
 the range scan would still take a CombinedKey<A, B>. As you can
 see, with the intrusive approach the keys are actually different
 types for different queries. With the less intrusive approach we
 use the same type and rely on serde invariants. For us this works
 nicely (protobuf); it might bite some JSON users.

 Hope it makes it clear

 Best Jan


 On 16.11.2017 16:39, Trevor Huey wrote:

 1. Going over KIP-213, I am leaning toward the "less intrusive"
 approach. In my use case, I am planning on performing a sequence
 of several oneToMany joins, From my understanding, the more
 intrusive approach would result in several nested levels of
 CombinedKey's. For example, consider Tables A, B, C, D with
 corresponding keys KA, KB, KC. Joining A and B would produce
 CombinedKey<KA, KB>. Then joining that result on C would produce
 CombinedKey<KC, CombinedKey<KA, KB>>. My "keyOtherSerde" in this
 case would need to be capable of deserializing CombinedKey<KA, KB>. This would just get worse the more tables I join. I realize
 that it's easier to shoot yourself in the foot with the less
 intrusive approach, but as you said, " the user can stick with
 his default serde or his standard way of serializing". In the
 simplest case where the keys are just strings, they can do simple
 string concatenation and Serdes.String(). It also allows the user
 to create and use their own version of CombinedKey if they feel
 so inclined.

 2. Why is there a problem for prefix, but not for range?

https://github.com/apache/kafka/pull/3720/files#diff-8f863b74c3c5a0b989e89d00c149aef1L162




 On Thu, Nov 16, 2017 at 2:57 AM Jan Filipiak
 mail

[GitHub] kafka pull request #4232: KAFKA-6233 :Removed unnecessary null check

2017-11-18 Thread sagarchavan3172
GitHub user sagarchavan3172 opened a pull request:

https://github.com/apache/kafka/pull/4232

KAFKA-6233 :Removed unnecessary null check

*More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.*

*Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.*

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sagarchavan3172/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4232.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4232


commit ea843103d6d10775e58be53bb6a02154c0a10735
Author: Sagar Chavan 
Date:   2017-11-18T20:43:33Z

Removed unnecessary null check




---


[jira] [Created] (KAFKA-6233) Removed unnecessary null check

2017-11-18 Thread sagar sukhadev chavan (JIRA)
sagar sukhadev chavan created KAFKA-6233:


 Summary: Removed unnecessary null check
 Key: KAFKA-6233
 URL: https://issues.apache.org/jira/browse/KAFKA-6233
 Project: Kafka
  Issue Type: New Feature
  Components: clients
Affects Versions: 0.11.0.2, 1.0.0, 0.10.2.1, 0.10.1.1
Reporter: sagar sukhadev chavan
Priority: Trivial


Removed unnecessary null check

if (encodingValue != null && encodingValue instanceof String)

Since null instanceof String returns false, the null check is redundant and the condition can be replaced with
if (encodingValue instanceof String)
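
A standalone demonstration (not the patched Kafka code) that the explicit 
null check adds nothing:

public class InstanceofNullDemo {
    public static void main(String[] args) {
        Object encodingValue = null;
        // instanceof already handles null: both lines print "false",
        // so the extra null comparison is redundant.
        System.out.println(encodingValue != null && encodingValue instanceof String);
        System.out.println(encodingValue instanceof String);
    }
}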




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] kafka pull request #4231: MINOR: Small cleanups/refactoring in kafka.control...

2017-11-18 Thread mimaison
GitHub user mimaison opened a pull request:

https://github.com/apache/kafka/pull/4231

MINOR: Small cleanups/refactoring in kafka.controller

- Updated logging to use string templates
- Minor refactors
- Fixed a few typos

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mimaison/kafka controller_refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/4231.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4231


commit 427c6b1d5e0829116eddbd8c44941939e5428033
Author: Mickael Maison 
Date:   2017-11-18T18:28:49Z

MINOR: Small cleanups/refactoring in kafka.controller

Updated logging to use string templates
Fixed a few typos




---


Re: SessionKeySchema#segmentsToSearch()

2017-11-18 Thread Ted Yu
This code:

final Segment minSegment = segments
.getMinSegmentGreaterThanEqualToTimestamp(timeFrom);

final Segment maxSegment = segments
.getMaxSegmentLessThanEqualToTimestamp(timeTo);

Can be replaced with:

final List<Segment> searchSpace = keySchema.segmentsToSearch(
segments, from, to);

The minSegment would be first in List and maxSegment would be last in List.

On Sat, Nov 18, 2017 at 11:09 AM, Ted Yu  wrote:

> Hi,
> I was reading code for SessionKeySchema#segmentsToSearch() where:
>
> public List<Segment> segmentsToSearch(final Segments segments, final
> long from, final long to) {
> return segments.segments(from, Long.MAX_VALUE);
>
> I wonder why the `to` parameter is ignored.
> WindowKeySchema#segmentsToSearch() passes the `to` parameter
> through to segments.segments().
>
> Cheers
>


SessionKeySchema#segmentsToSearch()

2017-11-18 Thread Ted Yu
Hi,
I was reading code for SessionKeySchema#segmentsToSearch() where:

public List<Segment> segmentsToSearch(final Segments segments, final
long from, final long to) {
return segments.segments(from, Long.MAX_VALUE);

I wonder why the `to` parameter is ignored.
WindowKeySchema#segmentsToSearch() passes the `to` parameter
through to segments.segments().

Cheers


Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-18 Thread Jan Filipiak


On 17.11.2017 06:59, Guozhang Wang wrote:

Thanks for the explanation Jan. On top of my head I'm leaning towards the
"more intrusive" approach to resolve the race condition issue we discussed
above. Matthias has some arguments for this approach already, so I would
not re-iterate them here. To me I find the "ValueMapper
joinPrefixFaker" is actually leaking the same amount of internal
implementation details information as the more intrusive approach, but in a
less clear way. So I'd rather just clarify to users than trying to abstract
in an awkward way.

Again: the benefits of __not__ introducing a new "Wrapper" type are huge!
We spent a lot of effort to get rid of Changes<> in our topics. We also
will not want CombinedKeys.
I make one suggestion: let's keep thinking about how to make this more precise
without introducing new Kafka-Streams-only types.
As you can see, currently the vote is 2 / 2: people that use Kafka Streams
like the less intrusive approach, people that develop it like the more
intrusive one.

The prettiest thing might not be the thing that gives the most bang for the
buck out there.

Best Jan


Also I'm not clear what do you mean by "CombinedKey would require an
additional mapping to what the less intrusive method has". If you meant
that users are enforced to provide a new serde for this combo key, could
that be avoided with the library automatically generate a serde for it
until the user changed this key later in the topology (e.g. via a map()
function) in which they can "flatten" this combo key into a flat key.

*@Trevor: *for your case for concatenating multiple joins, I think a better
way is to call `oneToManyJoin().map().oneToManyJoin().map()...` than
specifying a sequence of joinPrefixFakers as they will also be chained up
together (remember we have to keep this object along the rest of the
topology) which will make serde even harder?

Hi,

that was the map I was talking about. Last time I checked, KTable only
had 3 generic types.
For this I think it would require 4 types: KeyIn, KeyOut, ValueIn, ValueOut.
I have been very much in favour of adding this since basically forever; maybe
this opens up some discussion, but without it mapping the keys of a
KTable is not possible. I once again recommend peeking over to the
MapReduce/Tez guys, which have had the
concept of these 4 generics since basically forever.
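
A hypothetical illustration of the "four generics" shape mentioned above 
(KeyIn/KeyOut/ValueIn/ValueOut); this is not an existing Kafka Streams 
interface, just a sketch of what re-keying a KTable would need:

import java.util.Map;

@FunctionalInterface
interface ReKeyingMapper<KIn, VIn, KOut, VOut> {
    // Maps an input key/value pair to an output key/value pair, allowing the
    // key type to change as part of the operation.
    Map.Entry<KOut, VOut> apply(KIn key, VIn value);
}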



Similar to Matthias's question, the "XXX" markers are a bit confusing to me.

Sorry!!!


Guozhang


On Thu, Nov 16, 2017 at 2:18 PM, Matthias J. Sax 
wrote:


Hi,

I am just catching up on this discussion and did re-read the KIP and
discussion thread.

In contrast to you, I prefer the second approach with CombinedKey as
return type for the following reasons:

  1) the oneToManyJoin() method has fewer parameters
  2) those parameters are easy to understand
  3) we hide implementation details (joinPrefixFaker, leftKeyExtractor,
and the return type KO leaks internal implementation details from my
point of view)
  4) user can get their own KO type by extending CombinedKey interface
(this would also address the nesting issue Trevor pointed out)

What's unclear to me is why you care about JSON serdes. What is the
problem with regard to prefixes? It seems I am missing something here.

I also don't understand the argument about "the user can stick with his
default serde or his standard way of serializing". If we have
`CombinedKey` as output, the user just provides the serdes for both input
key types of the combined key individually, and we can reuse both internally to do
the rest. This seems to be a way simpler API. With the KO output type
approach, users need to write an entirely new serde for KO in contrast.

Finally, @Jan, there are still some open comments you did not address
and the KIP wiki page needs some updates. Would be great if you could do
this.

Can you also explicitly describe the data layout of the store that is
used to do the range scans?

Additionally:

-> some arrows in the algorithm diagram are missing
-> what are those XXX in the diagram
-> can you finish the "Step by Step" example
-> I think the relationships between the different used types, K0, K1, KO
should be explained explicitly (all the information is there implicitly, but
one needs to think hard to figure it out)


Last but not least:


But no one is really interested.

Don't understand this statement...



-Matthias


On 11/16/17 9:05 AM, Jan Filipiak wrote:

We are running this perfectly fine. For us the smaller table changes
rather infrequently, say only a few times per day. The performance cost of the
flush is way lower than the computing power you need to bring to the
table to account for all the records being emitted after the one single
update.

On 16.11.2017 18:02, Trevor Huey wrote:

Ah, I think I see the problem now. Thanks for the explanation. That is
tricky. As you said, it seems the easiest solution would just be to
flush the cache. I wonder how big of a performance hit that'd be...

On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak <jan.filip...@trivago.com> wrote:

   

Re: [DISCUSS] KIP-213 Support non-key joining in KTable

2017-11-18 Thread Jan Filipiak

Hi Matthias

answers to the questions inline.

On 16.11.2017 23:18, Matthias J. Sax wrote:

Hi,

I am just catching up on this discussion and did re-read the KIP and
discussion thread.

In contrast to you, I prefer the second approach with CombinedKey as
return type for the following reasons:

  1) the oneToManyJoin() method has fewer parameters

yeah I like that!

  2) those parameters are easy to understand

The big benefit really!

  3) we hide implementation details (joinPrefixFaker, leftKeyExtractor,
and the return type KO leaks internal implementation details from my
point of view)
It does, so does Combined key. I am a firm believer in the principle of 
leaky abstractions.
I think this is okay given the non-triviality of what it tries to 
abstract away.



  4) user can get their own KO type by extending CombinedKey interface
(this would also address the nesting issue Trevor pointed out)
They cannot easily. Say you use protobuf (as we do) and your classes 
get generated by a compiler such as protoc: you cannot easily
have it generate subclasses of CombinedKey. I think it's great that we 
have Trevor's opinion here as a second user's perspective.
To get stuff going it's sometimes easier to deal with the implications of 
your API (that are there anyway) instead of fighting your current
established toolset to adapt to some new scheme (like CombinedKeys). In 
the end there is a reason we run it in production with the
less intrusive approach: it is way less intrusive into our 
current tool chain and does not require us to adapt to some "kafka streams"
specifics. We have tools to inspect topics; if the key would suddenly 
be a CombinedKey of two protobuf messages, we can't use our default 
toolchain. This argument is very relevant for us; to give it some gravitas: 
we also rewrote the KTable::GroupBy that comes with stock
0.10.0.1 to repartition without serializing Change<>, with log 
compaction enabled, so as not to treat Streams topics differently than any other.
For us this is very important. We want to upstream this to be able to 
use it instead of our reflection-based PAPI setup. We would not take the 
stock one into production.

What's unclear to me is why you care about JSON serdes. What is the
problem with regard to prefixes? It seems I am missing something here.

Say you have a CombinedKey A,B with values "a" and "b".
If you take our protobuf serde, it's going to be
"a""b"
and with the "b" field not set
"a"
As you can see, it's a perfect prefix.
But with JSON you would get
{ "A" => "a", "B" => "b" }
and without the B field
{ "A" => "a" }
As you can see, it will not be a prefix.
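
A minimal sketch (hypothetical, not part of the KIP) of a combined-key 
encoding with the prefix property described above: serializing only the A 
part yields a byte prefix of the full (A, B) encoding, which the JSON form 
does not give you because of the closing brace:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class PrefixFriendlyCombinedKeyEncoding {
    // Encode as [len(a)][a bytes][b bytes]; leaving b out yields a strict
    // byte prefix of the full encoding, so a RocksDB prefix scan on the "a"
    // part finds all (a, b) entries.
    public static byte[] encode(final String a, final String b) {
        final byte[] aBytes = a.getBytes(StandardCharsets.UTF_8);
        final byte[] bBytes = b == null ? new byte[0] : b.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + aBytes.length + bBytes.length)
                .putInt(aBytes.length)
                .put(aBytes)
                .put(bBytes)
                .array();
    }
}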

I also don't understand the argument about "the user can stick with his
default serde or his standard way of serializing". If we have
`CombinedKey` as output, the user just provides the serdes for both input
key types of the combined key individually, and we can reuse both internally to do
the rest. This seems to be a way simpler API. With the KO output type
approach, users need to write an entirely new serde for KO in contrast.

Finally, @Jan, there are still some open comments you did not address
and the KIP wiki page needs some updates. Would be great if you could do
this.

Can you also explicitly describe the data layout of the store that is
used to do the range scans?

Additionally:

-> some arrows in the algorithm diagram are missing
-> what are those XXX in the diagram
-> can you finish the "Step by Step" example
Can't find the missing arrows. Those XXs are not really relevant; I will 
focus on the step-by-step example.

-> I think the relationships between the different used types, K0, K1, KO
should be explained explicitly (all the information is there implicitly, but
one needs to think hard to figure it out)

will add this

Last but not least:


But no one is really interested.
This was the first time someone took the effort to address the most 
pressing issue in moving this forward.

I counted this as not being interested before.

Don't understand this statement...



-Matthias


On 11/16/17 9:05 AM, Jan Filipiak wrote:

We are running this perfectly fine. For us the smaller table changes
rather infrequently, say only a few times per day. The performance cost of the
flush is way lower than the computing power you need to bring to the
table to account for all the records being emitted after the one single
update.

On 16.11.2017 18:02, Trevor Huey wrote:

Ah, I think I see the problem now. Thanks for the explanation. That is
tricky. As you said, it seems the easiest solution would just be to
flush the cache. I wonder how big of a performance hit that'd be...

On Thu, Nov 16, 2017 at 9:07 AM Jan Filipiak <jan.filip...@trivago.com> wrote:

 Hi Trevor,

 I am leaning towards the less intrusive approach myself. In fact
 that is how we implemented our internal API for this and how we
 run it in production.
 Getting more voices towards this solution makes me really happy.
 The reason it's a problem for Prefix and not for Range is the
 followin

Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-18 Thread Jan Filipiak

Hi,

 not an issue at all. IMO
the approach that is on the table would be perfect

On 18.11.2017 10:58, Jeyhun Karimov wrote:

Hi,

I did not expect that Context would be this much of an issue. Instead of
applying different semantics for different operators, I think we should
remove this feature completely.


Cheers,
Jeyhun
On Sat 18. Nov 2017 at 07:49, Jan Filipiak  wrote:


Yes, the mail said only join so I wanted to clarify.



On 17.11.2017 19:05, Matthias J. Sax wrote:

Yes. But I think an aggregation is a many-to-one operation, too.

For the stripping off part: internally, we can just keep some record
context, but just do not allow users to access it (because the context
does not make sense for them) by hiding the corresponding APIs.


-Matthias

On 11/16/17 10:05 PM, Guozhang Wang wrote:

Matthias,

For this idea, are you proposing that for any many-to-one mapping
operations (for now only Join operators), we will strip off the record
context in the resulting records and claim "we cannot infer its traced
context anymore"?


Guozhang


On Thu, Nov 16, 2017 at 1:03 PM, Matthias J. Sax wrote:
Any thoughts about my latest proposal?

-Matthias

On 11/10/17 10:02 PM, Jan Filipiak wrote:

Hi,

I think this is the better way. Naming is always tricky; Source is kinda
taken. I had TopicBackedK[Source|Table] in mind,
but for the user it's way better already IMHO

Thank you for reconsideration

Best Jan


On 10.11.2017 22:48, Matthias J. Sax wrote:

I was thinking about the source stream/table idea once more and it seems
it would not be too hard to implement:

We add two new classes

 SourceKStream extends KStream

and

 SourceKTable extends KTable

and return both from StreamsBuilder#stream and StreamsBuilder#table

As both are sub-classes, this change is backward compatible. We change
the return type for any single-record transform to these new types, too,
and use KStream/KTable as return type for any multi-record operation.

The new RecordContext API is added to both new classes. For old classes,
we only implement KIP-149 to get access to the key.


WDYT?


-Matthias
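
A minimal, hypothetical sketch of the subtype idea described above (not the 
actual Kafka Streams API): single-record transforms keep the Source* type, 
while multi-record operations fall back to the plain type:

import java.util.function.Function;

interface KStream<K, V> {
    <VR> KStream<K, VR> mapValues(Function<V, VR> mapper);
}

interface SourceKStream<K, V> extends KStream<K, V> {
    // Covariant override: a single-record transform on a source-backed stream
    // still returns a source-backed stream, so the per-record context
    // (topic, partition, offset) stays meaningful downstream.
    @Override
    <VR> SourceKStream<K, VR> mapValues(Function<V, VR> mapper);
}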

On 11/9/17 9:13 PM, Jan Filipiak wrote:

Okay,

looks like it would _at least work_ for Cached KTableSources.
But we make it harder for the user to make mistakes by putting
features into places where they don't make sense and don't
help anyone.

I once again think that my suggestion is easier to implement and
more correct. I will use this email to express my disagreement with the
proposed KIP (-1 non-binding of course) and state that I am open for any
questions regarding this. I will also do the usual thing and point out that the
friends over at Hive got it correct as well.
One cannot use their
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns
in any place where it is not read from the Sources.

With KSQL in mind it makes me sad how this is evolving here.

Best Jan





On 10.11.2017 01:06, Guozhang Wang wrote:

Hello Jan,

Regarding your question about caching: today we keep the record context
with the cached entry already, so when we flush the cache, which may generate
new records forwarding, we will set the record context appropriately; and
then after the flush is completed we will reset the context to the record
before the flush happens. But I think when Jeyhun did the PR it is a good
time to double check on such stages to make sure we are not
introducing any regressions.


Guozhang


On Mon, Nov 6, 2017 at 8:54 PM, Jan Filipiak <jan.filip...@trivago.com>
wrote:


I agree completely.

Exposing this information in a place where it has no _natural_ belonging
might really be a bad blocker in the long run.

Concerning your first point: I would argue it's not too hard to have a user
keep track of these. If we still don't want the user
to keep track of these, I would argue that all > projection only <
transformations on a Source-backed KTable/KStream
could also return a KTable/KStream instance of the type we return from the
topology builder.
Only after any operation that exceeds projection or filter would one
return a KTable not granting access to this any longer.

Even then it's difficult already: I never ran a topology with caching, but I
am not even 100% sure what the record context means behind
a materialized KTable with caching. Topic and partition probably make sense
with some reasoning, but offset is probably only the offset causing the flush?
So one might as well think to drop offsets from this RecordContext.

Best Jan







On 07.11.2017 03:18, Guozhang Wang wrote:


Regarding the API design (the proposed set of overloads v.s. one overload
on #map to enrich the record), I think what we have represents a good
trade-off between API succinctness and user convenience: on one hand we
definitely want to keep as few overloaded functions as possible. But on
the other hand if we only do that in, say, the #map() function then this
enrichment could be an overkill: think of a topology that has 7

Re: Interested in being a contributor

2017-11-18 Thread Ted Yu
Please read this: https://kafka.apache.org/contributing

You can use this Filter to find issues for new contributor:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20labels%20%3D%20newbie%20AND%20status%20%3D%20Open

Cheers

On Sat, Nov 18, 2017 at 2:12 AM, Panuwat Anawatmongkhon <
panuwat.anawatmongk...@gmail.com> wrote:

> Hi All,
> I am interested in being a contributor. Can anyone guide me? I am very
> new to contributing to open source.
> Thank you,
> Benz
>


Interested in being a contributor

2017-11-18 Thread Panuwat Anawatmongkhon
Hi All,
I am interested in being a contributor. Can anyone guide me? I am very
new to contributing to open source.
Thank you,
Benz


[jira] [Created] (KAFKA-6232) SaslSslAdminClientIntegrationTest sometimes fails

2017-11-18 Thread Ted Yu (JIRA)
Ted Yu created KAFKA-6232:
-

 Summary: SaslSslAdminClientIntegrationTest sometimes fails
 Key: KAFKA-6232
 URL: https://issues.apache.org/jira/browse/KAFKA-6232
 Project: Kafka
  Issue Type: Test
Reporter: Ted Yu


Here was one recent occurrence:

https://builds.apache.org/job/kafka-trunk-jdk9/203/testReport/kafka.api/SaslSslAdminClientIntegrationTest/testLogStartOffsetCheckpoint/
{code}
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.LeaderNotAvailableException: There is no leader 
for this topic-partition as we are in the middle of a leadership election.
at 
org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
at 
org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:225)
at 
kafka.api.AdminClientIntegrationTest.$anonfun$testLogStartOffsetCheckpoint$3(AdminClientIntegrationTest.scala:762)
{code}
In the test output, I saw:
{code}
[2017-11-17 23:15:45,593] ERROR [KafkaApi-1] Error when handling request 
{filters=[{resource_type=2,resource_name=foobar,principal=User:ANONYMOUS,host=*,operation=3,permission_type=3},{resource_type=5,resource_name=transactional_id,principal=User:ANONYMOUS,host=*,operation=4,permission_type=3}]}
 (kafka.server.KafkaApis:107)
org.apache.kafka.common.errors.ClusterAuthorizationException: Request 
Request(processor=2, connectionId=127.0.0.1:36295-127.0.0.1:58183-0, 
session=Session(User:client2,localhost/127.0.0.1), 
listenerName=ListenerName(SASL_SSL), securityProtocol=SASL_SSL, buffer=null) is 
not authorized.
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Re: [VOTE] KIP-159: Introducing Rich functions to Streams

2017-11-18 Thread Jeyhun Karimov
Hi,

I did not expect that Context would be this much of an issue. Instead of
applying different semantics for different operators, I think we should
remove this feature completely.


Cheers,
Jeyhun
On Sat 18. Nov 2017 at 07:49, Jan Filipiak  wrote:

> Yes, the mail said only join so I wanted to clarify.
>
>
>
> On 17.11.2017 19:05, Matthias J. Sax wrote:
> > Yes. But I think an aggregation is an many-to-one operation, too.
> >
> > For the stripping off part: internally, we can just keep some record
> > context, but just do not allow users to access it (because the context
> > context does not make sense for them) by hiding the corresponding APIs.
> >
> >
> > -Matthias
> >
> > On 11/16/17 10:05 PM, Guozhang Wang wrote:
> >> Matthias,
> >>
> >> For this idea, are your proposing that for any many-to-one mapping
> >> operations (for now only Join operators), we will strip off the record
> >> context in the resulted records and claim "we cannot infer its traced
> >> context anymore"?
> >>
> >>
> >> Guozhang
> >>
> >>
> >> On Thu, Nov 16, 2017 at 1:03 PM, Matthias J. Sax  >
> >> wrote:
> >>
> >>> Any thoughts about my latest proposal?
> >>>
> >>> -Matthias
> >>>
> >>> On 11/10/17 10:02 PM, Jan Filipiak wrote:
>  Hi,
> 
>  i think this is the better way. Naming is always tricky Source is
> kinda
>  taken
>  I had TopicBackedK[Source|Table] in mind
>  but for the user its way better already IMHO
> 
>  Thank you for reconsideration
> 
>  Best Jan
> 
> 
>  On 10.11.2017 22:48, Matthias J. Sax wrote:
> > I was thinking about the source stream/table idea once more and it
> seems
> > it would not be too hard to implement:
> >
> > We add two new classes
> >
> > SourceKStream extends KStream
> >
> > and
> >
> > SourceKTable extend KTable
> >
> > and return both from StreamsBuilder#stream and StreamsBuilder#table
> >
> > As both are sub-classes, this change is backward compatible. We
> change
> > the return type for any single-record transform to this new types,
> too,
> > and use KStream/KTable as return type for any multi-record operation.
> >
> > The new RecordContext API is added to both new classes. For old
> classes,
> > we only implement KIP-149 to get access to the key.
> >
> >
> > WDYT?
> >
> >
> > -Matthias
> >
> > On 11/9/17 9:13 PM, Jan Filipiak wrote:
> >> Okay,
> >>
> >> looks like it would _at least work_ for Cached KTableSources .
> >> But we make it harder to the user to make mistakes by putting
> >> features into places where they don't make sense and don't
> >> help anyone.
> >>
> >> I once again think that my suggestion is easier to implement and
> >> more correct. I will use this email to express my disagreement with
> the
> >> proposed KIP (-1 non binding of course) state that I am open for any
> >> questions
> >> regarding this. I will also do the usual thing and point out that
> the
> >> friends
> >> over at Hive got it correct aswell.
> >> One can not user their
> >> https://cwiki.apache.org/confluence/display/Hive/
> >>> LanguageManual+VirtualColumns
> >>
> >> in any place where its not read from the Sources.
> >>
> >> With KSQl in mind it makes me sad how this is evolving here.
> >>
> >> Best Jan
> >>
> >>
> >>
> >>
> >>
> >> On 10.11.2017 01:06, Guozhang Wang wrote:
> >>> Hello Jan,
> >>>
> >>> Regarding your question about caching: today we keep the record
> >>> context
> >>> with the cached entry already so when we flush the cache which may
> >>> generate
> >>> new records forwarding we will set the record context
> appropriately;
> >>> and
> >>> then after the flush is completed we will reset the context to the
> >>> record
> >>> before the flush happens. But I think when Jeyhun did the PR it is
> a
> >>> good
> >>> time to double check on such stages to make sure we are not
> >>> introducing any
> >>> regressions.
> >>>
> >>>
> >>> Guozhang
> >>>
> >>>
> >>> On Mon, Nov 6, 2017 at 8:54 PM, Jan Filipiak <
> >>> jan.filip...@trivago.com>
> >>> wrote:
> >>>
>  I Aggree completely.
> 
>  Exposing this information in a place where it has no _natural_
>  belonging
>  might really be a bad blocker in the long run.
> 
>  Concerning your first point. I would argue its not to hard to
> have a
>  user
>  keep track of these. If we still don't want the user
>  to keep track of these I would argue that all > projection only <
>  transformations on a Source-backed KTable/KStream
>  could also return a Ktable/KStream instance of the type we return
>  from the
>  topology builder.
>  Only after any operation that exceeds project