[jira] [Created] (KAFKA-5145) Remove task close() call from closeNonAssignedSuspendedTasks method

2017-04-30 Thread Narendra Kumar (JIRA)
Narendra Kumar created KAFKA-5145:
-

 Summary: Remove task close() call from 
closeNonAssignedSuspendedTasks method
 Key: KAFKA-5145
 URL: https://issues.apache.org/jira/browse/KAFKA-5145
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.0
Reporter: Narendra Kumar
 Attachments: BugTest.java, DebugTransformer.java, logs.txt

While rebalancing, ProcessorNode.close() can be called twice: once from 
StreamThread.suspendTasksAndState() and once from 
StreamThread.closeNonAssignedSuspendedTasks(). If ProcessorNode.close() throws 
an exception because close() was called multiple times (e.g. an 
IllegalStateException from a KafkaConsumer instance being used by a processor 
for lookups), we fail to close the task's state manager (i.e. the call to 
task.closeStateManager(true) fails). After the rebalance, if the same task id 
is launched on the same application instance but in a different thread, the 
task gets stuck because it fails to acquire the lock on the task's state 
directory.

Since the processor's close() is already called from 
StreamThread.suspendTasksAndState(), we don't need to call it again from 
StreamThread.closeNonAssignedSuspendedTasks().
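
To make the failure mode concrete, here is a minimal sketch (my own illustration, not 
the actual StreamThread code, and not necessarily the fix proposed above) of a 
defensive close that would still release the state-directory lock when the processor 
close throws:

{code}
// Sketch only: method name is illustrative, not a real StreamThread internal.
void closeSuspendedTask(final StreamTask task) {
    RuntimeException closeError = null;
    try {
        // may throw (e.g. IllegalStateException) if the processors were already
        // closed during suspendTasksAndState()
        task.close();
    } catch (final RuntimeException e) {
        closeError = e;
    } finally {
        // always close the state manager so the state-directory lock is released;
        // otherwise the same task id cannot be re-launched in another thread
        task.closeStateManager(true);
    }
    if (closeError != null) {
        throw closeError;
    }
}
{code}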



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-30 Thread Stephane Maarek
I’m not sure how people would feel about having two distinct methods to build 
the same object.
An API wrapper may be useful, but it doesn’t impose an opinion about how one 
should program; that’s just driven by the docs.
I’m okay with that, but we need consensus.
 

On 1/5/17, 6:08 am, "Michael Pearce"  wrote:

Why not, instead of deprecating or removing what's there (as noted, it's a 
point of preference), think about something that could wrap the existing API 
but provide one that, for you, is cleaner?

e.g. here's a sample idea building on a fluent API (this wraps the 
producer and producer records, so no changes are needed):

https://gist.github.com/michaelandrepearce/de0f5ad4aa7d39d243781741c58c293e

In future, as new fields are added to ProducerRecord, they just become new 
methods in the fluent API, since it builds the ProducerRecord using the most 
exhaustive constructor.




From: Matthias J. Sax 
Sent: Saturday, April 29, 2017 6:52 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

I understand that we cannot just break stuff (btw: also not for
Streams!). But deprecating does not break anything, so I don't think
it's a big deal to change the API as long as we keep the old API as
deprecated.


-Matthias

On 4/29/17 9:28 AM, Jay Kreps wrote:
> Hey Matthias,
>
> Yeah I agree, I'm not against change as a general thing! I also think if
> you look back on the last two years, we completely rewrote the producer and
> consumer APIs, reworked the binary protocol many times over, and added the
> connector and stream processing apis, both major new additions. So I don't
> think we're in too much danger of stagnating!
>
> My two cents was just around breaking compatibility for trivial changes
> like constructor => builder. I think this only applies to the producer,
> consumer, and connect apis which are heavily embedded in hundreds of
> ecosystem components that depend on them. This is different from direct
> usage. If we break the streams api it is really no big deal---apps just
> need to rebuild when they upgrade, not the end of the world at all. However
> because many intermediate things depend on the Kafka producer you can cause
> these weird situations where your app depends on two third party things
> that use Kafka and each requires different, incompatible versions. We did
> this a lot in earlier versions of Kafka and it was the cause of much angst
> (and an ingrained general reluctance to upgrade) from our users.
>
> I still think we may have to break things, i just don't think we should do
> it for things like builders vs direct constructors which i think are kind
> of a debatable matter of taste.
>
> -Jay
>
>
>
> On Mon, Apr 24, 2017 at 9:40 AM, Matthias J. Sax 
> wrote:
>
>> Hey Jay,
>>
>> I understand your concern, and for sure, we will need to keep the
>> current constructors deprecated for a long time (ie, many years).
>>
>> But if we don't make the move, we will not be able to improve. And I
>> think warnings about using deprecated APIs is an acceptable price to
>> pay. And the API improvements will help new people who adopt Kafka to
>> get started more easily.
>>
>> Otherwise Kafka might end up like much other enterprise software, with
>> lots of old stuff that is kept forever because nobody has the guts to
>> improve/change it.
>>
>> Of course, we can still improve the docs of the deprecated constructors,
>> too.
>>
>> Just my two cents.
>>
>>
>> -Matthias
>>
>> On 4/23/17 3:37 PM, Jay Kreps wrote:
>>> Hey guys,
>>>
>>> I definitely think that the constructors could have been better designed,
>>> but I think given that they're in heavy use I don't think this proposal
>>> will improve things. Deprecating constructors just leaves everyone with
>>> lots of warnings and crossed out things. We can't actually delete the
>>> methods because lots of code needs to be usable across multiple Kafka
>>> versions, right? So we aren't picking between the original approach
>> (worse)
>>> and the new approach (better); what we are proposing is a perpetual
>>> mingling of the original style and the new style with a bunch of
>> deprecated
>>> stuff, which I think is worst of all.
>>>
>>> I'd vote for just documenting the meaning of null in the ProducerRecord
>>> constructor.
>>>
>>> -Jay
>>>
>>> On Wed, Apr 19, 2017 at 3:34 PM, Stephane Maarek <
>>> steph...@simplemachines.com.au> wrote:
>>>
 Hi all,

 My first KIP, let me know your thoughts!

kafka tcp connection

2017-04-30 Thread Sharat Jagannath
I was just wondering: when does the Kafka producer open its TCP connection
when using the Java API? Is it when the KafkaProducer object is initialized
via the constructor, or when the actual send method is called to send the
record?
I am guessing it's on send, since if Kafka is down it throws a connection
error on the send method. Is that correct?
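
For reference, here is a minimal sketch of the two cases being asked about (the 
configuration values are placeholders). In the Java client the constructor itself does 
not normally open broker connections; they are typically established lazily when the 
first send() triggers a metadata fetch, which is also where connection errors tend to 
surface:

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerConnectionDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Constructing the producer sets up the client but does not, by itself,
        // connect to the brokers.
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // The first send() triggers the metadata fetch that opens the TCP connection;
        // connection problems typically surface here (e.g. a timeout reported to the
        // callback if the brokers are unreachable).
        producer.send(new ProducerRecord<>("test-topic", "key", "value"), (metadata, exception) -> {
            if (exception != null) {
                exception.printStackTrace();
            }
        });
        producer.close();
    }
}
{code}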

Thanks
Sharat J
Sent from my phone



[DISCUSS] KIP-147: Add missing type parameters to StateStoreSupplier factories and KGroupedStream/Table methods

2017-04-30 Thread Michal Borowiecki

Hi community!

I have just drafted KIP-147: Add missing type parameters to 
StateStoreSupplier factories and KGroupedStream/Table methods.

Please let me know if this is a step in the right direction.

All comments welcome.

Thanks,
Michal
--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600 / +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK








[GitHub] kafka pull request #2950: enhance JmxTool usage and output format

2017-04-30 Thread scottf
GitHub user scottf opened a pull request:

https://github.com/apache/kafka/pull/2950

enhance JmxTool usage and output format

one-time flag to only run the pull once instead of looping forever
reporting-interval -1 equivalent to setting one-time
reportFormatOpt supports 'original', 'properties', 'csv', 'tsv' formatting

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloudurable/kafka trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2950.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2950


commit b3852a4087032b06e3731445dbf79141c2889061
Author: Scott Fauerbach 
Date:   2017-04-30T22:04:20Z

one-time flag to only run the pull once instead of looping forever
reporting-interval -1 equivalent to setting one-time
reportFormatOpt supports 'original', 'properties', 'csv', 'tsv' formatting




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5055) Kafka Streams skipped-records-rate sensor producing nonzero values even when FailOnInvalidTimestamp is used as extractor

2017-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990377#comment-15990377
 ] 

ASF GitHub Bot commented on KAFKA-5055:
---

GitHub user dpoldrugo opened a pull request:

https://github.com/apache/kafka/pull/2949

KAFKA-5055: Kafka Streams skipped-records-rate sensor bug

Fix as described in the [KAFKA-5055 Jira 
comment](https://issues.apache.org/jira/browse/KAFKA-5055?focusedCommentId=15990086=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15990086).

@mjsax @guozhangwang take a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dpoldrugo/kafka 
KAFKA-5055-skipped-records-rate-sensor-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2949


commit a8768f4f4431c84880bec617f9c91b866fc003c7
Author: dpoldrugo 
Date:   2017-04-30T20:48:36Z

KAFKA-5055 Kafka Streams skipped-records-rate sensor bug




> Kafka Streams skipped-records-rate sensor producing nonzero values even when 
> FailOnInvalidTimestamp is used as extractor
> 
>
> Key: KAFKA-5055
> URL: https://issues.apache.org/jira/browse/KAFKA-5055
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Nikki Thean
>Assignee: Guozhang Wang
>
> According to the code and the documentation for this metric, the only reason 
> for a skipped record is an invalid timestamp, except that a) I am reading 
> from a topic that is populated solely by Kafka Connect and b) I am using 
> `FailOnInvalidTimestamp` as the timestamp extractor.
> Either I'm missing something in the documentation (i.e. another reason for 
> skipped records) or there is a bug in the code that calculates this metric.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2949: KAFKA-5055: Kafka Streams skipped-records-rate sen...

2017-04-30 Thread dpoldrugo
GitHub user dpoldrugo opened a pull request:

https://github.com/apache/kafka/pull/2949

KAFKA-5055: Kafka Streams skipped-records-rate sensor bug

Fix as described in the [KAFKA-5055 Jira 
comment](https://issues.apache.org/jira/browse/KAFKA-5055?focusedCommentId=15990086=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15990086).

@mjsax @guozhangwang take a look

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dpoldrugo/kafka 
KAFKA-5055-skipped-records-rate-sensor-bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2949.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2949


commit a8768f4f4431c84880bec617f9c91b866fc003c7
Author: dpoldrugo 
Date:   2017-04-30T20:48:36Z

KAFKA-5055 Kafka Streams skipped-records-rate sensor bug




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

2017-04-30 Thread Michael Pearce
Why not, instead of deprecating or removing what's there (as noted, it's a point 
of preference), think about something that could wrap the existing API but provide 
one that, for you, is cleaner?

e.g. here's a sample idea building on a fluent API (this wraps the 
producer and producer records, so no changes are needed):

https://gist.github.com/michaelandrepearce/de0f5ad4aa7d39d243781741c58c293e

In future, as new fields are added to ProducerRecord, they just become new 
methods in the fluent API, since it builds the ProducerRecord using the most 
exhaustive constructor.
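
For illustration, a minimal sketch of such a wrapper (my own sketch assuming the 
existing five-argument ProducerRecord constructor; it is not the code from the gist 
above):

{code}
import org.apache.kafka.clients.producer.ProducerRecord;

// Hypothetical fluent wrapper: unset fields stay null and keep their current meaning.
public class RecordBuilder<K, V> {
    private final String topic;
    private Integer partition;
    private Long timestamp;
    private K key;
    private V value;

    public RecordBuilder(final String topic) { this.topic = topic; }

    public RecordBuilder<K, V> partition(final int partition) { this.partition = partition; return this; }
    public RecordBuilder<K, V> timestamp(final long timestamp) { this.timestamp = timestamp; return this; }
    public RecordBuilder<K, V> key(final K key) { this.key = key; return this; }
    public RecordBuilder<K, V> value(final V value) { this.value = value; return this; }

    public ProducerRecord<K, V> build() {
        // delegates to the most exhaustive existing constructor
        return new ProducerRecord<>(topic, partition, timestamp, key, value);
    }
}
{code}

Usage would then look like: new RecordBuilder<String, String>("my-topic").key("k").value("v").build()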




From: Matthias J. Sax 
Sent: Saturday, April 29, 2017 6:52 PM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP 141 - ProducerRecordBuilder Interface

I understand that we cannot just break stuff (btw: also not for
Streams!). But deprecating does not break anything, so I don't think
it's a big deal to change the API as long as we keep the old API as
deprecated.


-Matthias

On 4/29/17 9:28 AM, Jay Kreps wrote:
> Hey Matthias,
>
> Yeah I agree, I'm not against change as a general thing! I also think if
> you look back on the last two years, we completely rewrote the producer and
> consumer APIs, reworked the binary protocol many times over, and added the
> connector and stream processing apis, both major new additions. So I don't
> think we're in too much danger of stagnating!
>
> My two cents was just around breaking compatibility for trivial changes
> like constructor => builder. I think this only applies to the producer,
> consumer, and connect apis which are heavily embedded in hundreds of
> ecosystem components that depend on them. This is different from direct
> usage. If we break the streams api it is really no big deal---apps just
> need to rebuild when they upgrade, not the end of the world at all. However
> because many intermediate things depend on the Kafka producer you can cause
> these weird situations where your app depends on two third party things
> that use Kafka and each requires different, incompatible versions. We did
> this a lot in earlier versions of Kafka and it was the cause of much angst
> (and an ingrained general reluctance to upgrade) from our users.
>
> I still think we may have to break things, i just don't think we should do
> it for things like builders vs direct constructors which i think are kind
> of a debatable matter of taste.
>
> -Jay
>
>
>
> On Mon, Apr 24, 2017 at 9:40 AM, Matthias J. Sax 
> wrote:
>
>> Hey Jay,
>>
>> I understand your concern, and for sure, we will need to keep the
>> current constructors deprecated for a long time (ie, many years).
>>
>> But if we don't make the move, we will not be able to improve. And I
>> think warnings about using deprecated APIs is an acceptable price to
>> pay. And the API improvements will help new people who adopt Kafka to
>> get started more easily.
>>
>> Otherwise Kafka might end up like much other enterprise software, with
>> lots of old stuff that is kept forever because nobody has the guts to
>> improve/change it.
>>
>> Of course, we can still improve the docs of the deprecated constructors,
>> too.
>>
>> Just my two cents.
>>
>>
>> -Matthias
>>
>> On 4/23/17 3:37 PM, Jay Kreps wrote:
>>> Hey guys,
>>>
>>> I definitely think that the constructors could have been better designed,
>>> but I think given that they're in heavy use I don't think this proposal
>>> will improve things. Deprecating constructors just leaves everyone with
>>> lots of warnings and crossed out things. We can't actually delete the
>>> methods because lots of code needs to be usable across multiple Kafka
>>> versions, right? So we aren't picking between the original approach
>> (worse)
>>> and the new approach (better); what we are proposing is a perpetual
>>> mingling of the original style and the new style with a bunch of
>> deprecated
>>> stuff, which I think is worst of all.
>>>
>>> I'd vote for just documenting the meaning of null in the ProducerRecord
>>> constructor.
>>>
>>> -Jay
>>>
>>> On Wed, Apr 19, 2017 at 3:34 PM, Stephane Maarek <
>>> steph...@simplemachines.com.au> wrote:
>>>
 Hi all,

 My first KIP, let me know your thoughts!
 https://cwiki.apache.org/confluence/display/KAFKA/KIP+
 141+-+ProducerRecordBuilder+Interface


 Cheers,
 Stephane

>>>
>>
>>
>

Re: [DISCUSS] KIP-138: Change punctuate semantics

2017-04-30 Thread Michal Borowiecki

Hi Matthias,

I'd like to start moving the discarded ideas into Rejected Alternatives 
section. Before I do, I want to tidy them up, ensure they've each been 
given proper treatment.


To that end let me go back to one of your earlier comments about the 
original suggestion (A) to put that to bed.



On 04/04/17 06:44, Matthias J. Sax wrote:

(A) You argue that users can still "punctuate" on event-time via
process(), but I am not sure if this is possible. Note that users only
get record timestamps via context.timestamp(). Thus, users would need to
track the time progress per partition (based on the partitions they
observe via context.partition()). (This alone puts a huge burden on the
user by itself.) However, users are not notified at startup what
partitions are assigned, and users are not notified when partitions get
revoked. Because this information is not available, it's not possible to
"manually advance" stream-time, and thus event-time punctuation within
process() seems not to be possible -- or do you see a way to get it
done? And even if there is, it might still be too clumsy to use.
I might have missed something, but I'm guessing your worry about users 
having to track time progress /per partition/ comes from what stream-time 
does currently.
But I'm not sure that those semantics of stream-time are ideal for users 
of punctuate.
That is, if stream-time punctuate didn't exist and users had to use 
process(), would they actually want to use the current semantics of 
stream time?


As a reminder, stream time, in all its glory, is (not exactly, actually, 
but when trying to be absolutely precise here I spotted KAFKA-5144, so I 
think this approximation suffices to illustrate the point):

a minimum across all input partitions of (
    if (msgs never received by partition) -1;
    else {
        a non-descending-minimum of (the per-batch minimum msg timestamp)
    }
)

Would that really be clear enough to the users of punctuate? Do they 
care for such a convoluted notion of time? I see how this can be useful 
for StreamTask to pick the next partition to take a record from, but for 
punctuate?
If users had to implement punctuation with process(), is that what they 
would have chosen as their notion of time?

I'd argue not.

None of the processors implementing the rich windowing/join operations 
in the DSL use punctuate.
Let's take the KStreamKStreamJoinProcessor as an example: in its 
process() method it simply uses context().timestamp() which, since it's 
called from process(), simply returns, per the javadoc:


If it is triggered while processing a record streamed from the source 
processor, timestamp is defined as the timestamp of the current input 
record;


So they don't use that convoluted formula for stream-time. Instead, they 
only care about the timestamp of the current record. I think that having 
users track just that wouldn't be that much of a burden. I don't think 
they need to care about which partitions got assigned or not. And 
StreamTask would still be picking records first from the partition 
having the lowest timestamp to try to "synchronize" the streams as it 
does now.


What users would have to do in their Processor implementations is 
something along the lines of:


long lastPunctuationTime = 0;
long interval = ...; // millis

@Override
public void process(K key, V value) {
    while (ctx.timestamp() >= lastPunctuationTime + interval) {
        punctuate(ctx.timestamp());
        // I'm not sure of the merit of this vs lastPunctuationTime = ctx.timestamp(),
        // but that's what PunctuationQueue does currently
        lastPunctuationTime += interval;
    }
    // do some other business logic here
}

Looking forward to your thoughts.

Cheers,
Michal

--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600 / +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com
OpenBet Ltd, Chiswick Park Building 9, 566 Chiswick High Rd, London, W4 5XT, UK








[jira] [Commented] (KAFKA-5144) MinTimestampTracker does not correctly add timestamps lower than the current max

2017-04-30 Thread Michal Borowiecki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990335#comment-15990335
 ] 

Michal Borowiecki commented on KAFKA-5144:
--

Added a second [PR|https://github.com/apache/kafka/pull/2948] which is just a 
refactoring (does not fix the issue!) to make reasoning about the code easier.
It renames variables to express their true meaning and adds comments where it 
matters.
This is a separate PR since it's independent of the tests. If I got it wrong 
somehow and the test cases are indeed invalid, this refactoring is still useful 
as it better documents what the code actually does.

> MinTimestampTracker does not correctly add timestamps lower than the current 
> max
> 
>
> Key: KAFKA-5144
> URL: https://issues.apache.org/jira/browse/KAFKA-5144
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>
> When adding elements MinTimestampTracker removes all existing elements 
> greater than the added element.
> Perhaps I've missed something and this is intended behaviour but I can't find 
> any evidence for that in comments or tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5144) MinTimestampTracker does not correctly add timestamps lower than the current max

2017-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990333#comment-15990333
 ] 

ASF GitHub Bot commented on KAFKA-5144:
---

GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/2948

KAFKA-5144 renamed and added comments to make it clear what's going on

The descendingSubsequence is a misnomer. The linked list is actually 
arranged so that the lowest timestamp is first and larger timestamps are added 
to the end, therefore renamed to ascendingSubsequence.
The minElem variable was also misnamed. It's actually the current maximum 
element as it's taken from the end of the list.
Added comment to get() to make it clear it's returning the lowest timestamp.
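
For readers following along, a simplified sketch of the add/get behaviour described 
above (my own illustration of the described semantics, not the actual Kafka Streams 
source):

{code}
import java.util.LinkedList;

public class SimplifiedMinTimestampTracker {
    // lowest timestamp first; larger timestamps are appended at the end
    private final LinkedList<Long> ascendingSubsequence = new LinkedList<>();

    public void addElement(final long timestamp) {
        // existing elements greater than the new one are dropped before appending
        // (this is the behaviour KAFKA-5144 questions)
        while (!ascendingSubsequence.isEmpty() && ascendingSubsequence.peekLast() > timestamp) {
            ascendingSubsequence.removeLast();
        }
        ascendingSubsequence.addLast(timestamp);
    }

    public long get() {
        // returns the lowest retained timestamp, or -1 if nothing has been added yet
        return ascendingSubsequence.isEmpty() ? -1L : ascendingSubsequence.peekFirst();
    }
}
{code}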

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2948


commit c8d5f4646e412917495f720102774e450ae06265
Author: mihbor 
Date:   2017-04-30T18:36:08Z

KAFKA-5144 rename variables and add comments to make it clear what's going 
on

The descendingSubsequence is a misnomer. The linked list is actually 
arranged so that the lowest timestamp is first and larger timestamps are added 
to the end, therefore renamed to ascendingSubsequence.
The minElem variable was also misnamed. It's actually the current maximum 
element as it's taken from the end of the list.
Added comment to get() to make it clear it's returning the lowest timestamp.




> MinTimestampTracker does not correctly add timestamps lower than the current 
> max
> 
>
> Key: KAFKA-5144
> URL: https://issues.apache.org/jira/browse/KAFKA-5144
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>
> When adding elements MinTimestampTracker removes all existing elements 
> greater than the added element.
> Perhaps I've missed something and this is intended behaviour but I can't find 
> any evidence for that in comments or tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2948: KAFKA-5144 renamed and added comments to make it c...

2017-04-30 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/2948

KAFKA-5144 renamed and added comments to make it clear what's going on

The descendingSubsequence is a misnomer. The linked list is actually 
arranged so that the lowest timestamp is first and larger timestamps are added 
to the end, therefore renamed to ascendingSubsequence.
The minElem variable was also misnamed. It's actually the current maximum 
element as it's taken from the end of the list.
Added comment to get() to make it clear it's returning the lowest timestamp.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-4

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2948.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2948


commit c8d5f4646e412917495f720102774e450ae06265
Author: mihbor 
Date:   2017-04-30T18:36:08Z

KAFKA-5144 rename variables and add comments to make it clear what's going 
on

The descendingSubsequence is a misnomer. The linked list is actually 
arranged so that the lowest timestamp is first and larger timestamps are added 
to the end, therefore renamed to ascendingSubsequence.
The minElem variable was also misnamed. It's actually the current maximum 
element as it's taken from the end of the list.
Added comment to get() to make it clear it's returning the lowest timestamp.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-5144) MinTimestampTracker does not correctly add timestamps lower than the current max

2017-04-30 Thread Michal Borowiecki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990329#comment-15990329
 ] 

Michal Borowiecki commented on KAFKA-5144:
--

[PR|https://github.com/apache/kafka/pull/2947/commits/a8b223b92c0aec31498e0d6043c8deffb1ea21ef]
 contains the cases that are failing, which I think are valid.
If someone can please confirm I'm not talking nonsense here, I'll proceed with 
a fix.

> MinTimestampTracker does not correctly add timestamps lower than the current 
> max
> 
>
> Key: KAFKA-5144
> URL: https://issues.apache.org/jira/browse/KAFKA-5144
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>
> When adding elements MinTimestampTracker removes all existing elements 
> greater than the added element.
> Perhaps I've missed something and this is intended behaviour but I can't find 
> any evidence for that in comments or tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-5144) MinTimestampTracker does not correctly add timestamps lower than the current max

2017-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990328#comment-15990328
 ] 

ASF GitHub Bot commented on KAFKA-5144:
---

GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/2947

KAFKA-5144 added 2 test cases

These two newly added test cases are currently failing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2947


commit a8b223b92c0aec31498e0d6043c8deffb1ea21ef
Author: mihbor 
Date:   2017-04-30T18:24:38Z

KAFKA-5144 added 2 test cases




> MinTimestampTracker does not correctly add timestamps lower than the current 
> max
> 
>
> Key: KAFKA-5144
> URL: https://issues.apache.org/jira/browse/KAFKA-5144
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: Michal Borowiecki
>Assignee: Michal Borowiecki
>
> When adding elements MinTimestampTracker removes all existing elements 
> greater than the added element.
> Perhaps I've missed something and this is intended behaviour but I can't find 
> any evidence for that in comments or tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2947: KAFKA-5144 added 2 test cases

2017-04-30 Thread mihbor
GitHub user mihbor opened a pull request:

https://github.com/apache/kafka/pull/2947

KAFKA-5144 added 2 test cases

These two newly added test cases are currently failing

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mihbor/kafka patch-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2947


commit a8b223b92c0aec31498e0d6043c8deffb1ea21ef
Author: mihbor 
Date:   2017-04-30T18:24:38Z

KAFKA-5144 added 2 test cases




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-5144) MinTimestampTracker does not correctly add timestamps lower than the current max

2017-04-30 Thread Michal Borowiecki (JIRA)
Michal Borowiecki created KAFKA-5144:


 Summary: MinTimestampTracker does not correctly add timestamps 
lower than the current max
 Key: KAFKA-5144
 URL: https://issues.apache.org/jira/browse/KAFKA-5144
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.1
Reporter: Michal Borowiecki
Assignee: Michal Borowiecki


When adding elements MinTimestampTracker removes all existing elements greater 
than the added element.
Perhaps I've missed something and this is intended behaviour but I can't find 
any evidence for that in comments or tests.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4218) Enable access to key in ValueTransformer

2017-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990294#comment-15990294
 ] 

ASF GitHub Bot commented on KAFKA-4218:
---

GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/2946

KAFKA-4218: Enable access to key in ValueTransformer

This PR groups KAFKA-4218,  KAFKA-4726 and KAFKA-3745.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-4218

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2946


commit f535f8ea126a34e2854ed45e9131324099c7aa71
Author: Jeyhun Karimov 
Date:   2017-04-30T16:11:57Z

KAFKA-4218

bug fix in KTableImplTest




> Enable access to key in ValueTransformer
> 
>
> Key: KAFKA-4218
> URL: https://issues.apache.org/jira/browse/KAFKA-4218
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.1
>Reporter: Elias Levy
>Assignee: Jeyhun Karimov
>  Labels: api, needs-kip, newbie
>
> While transforming values via {{KStream.transformValues}} and 
> {{ValueTransformer}}, the key associated with the value may be needed, even 
> if it is not changed.  For instance, it may be used to access stores.  
> As of now, the key is not available within these methods and interfaces, 
> leading to the use of {{KStream.transform}} and {{Transformer}}, and the 
> unnecessary creation of new {{KeyValue}} objects.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2946: KAFKA-4218: Enable access to key in ValueTransform...

2017-04-30 Thread jeyhunkarimov
GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/2946

KAFKA-4218: Enable access to key in ValueTransformer

This PR groups KAFKA-4218,  KAFKA-4726 and KAFKA-3745.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-4218

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2946.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2946


commit f535f8ea126a34e2854ed45e9131324099c7aa71
Author: Jeyhun Karimov 
Date:   2017-04-30T16:11:57Z

KAFKA-4218

bug fix in KTableImplTest




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3514) Stream timestamp computation needs some further thoughts

2017-04-30 Thread Michal Borowiecki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990288#comment-15990288
 ] 

Michal Borowiecki commented on KAFKA-3514:
--

Agreed. That's what I meant. Beyond the scope of KIP-138 and let's keep it that 
way.

> Stream timestamp computation needs some further thoughts
> 
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: architecture
> Fix For: 0.11.0.0
>
>
> Our current stream task's timestamp is used for punctuate function as well as 
> selecting which stream to process next (i.e. best effort stream 
> synchronization). And it is defined as the smallest timestamp over all 
> partitions in the task's partition group. This results in two unintuitive 
> corner cases:
> 1) observing a late arrived record would keep that stream's timestamp low for 
> a period of time, and hence keep being process until that late record. For 
> example take two partitions within the same task annotated by their 
> timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late arrived record with timestamp "1" will cause stream A to be selected 
> continuously in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 
> until the record itself is dequeued and processed, then stream B will be 
> selected starting with timestamp 2.
> 2) an empty buffered partition will cause its timestamp to be not advanced, 
> and hence the task timestamp as well since it is the smallest among all 
> partitions. This may not be a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3514) Stream timestamp computation needs some further thoughts

2017-04-30 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990281#comment-15990281
 ] 

Matthias J. Sax commented on KAFKA-3514:


Your observation is correct -- and this ticket is exactly about the issue you 
describe. There are two problems with the current design: (1) it's hard to 
reason about processing order, and (2) there is some non-determinism. But I 
think it does not affect KIP-138 (I agree that it's somewhat connected). KIP-138 
is about when/how to punctuate, and the punctuation strategy should be agnostic 
to the way "stream time" gets advanced.

> Stream timestamp computation needs some further thoughts
> 
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: architecture
> Fix For: 0.11.0.0
>
>
> Our current stream task's timestamp is used for punctuate function as well as 
> selecting which stream to process next (i.e. best effort stream 
> synchronization). And it is defined as the smallest timestamp over all 
> partitions in the task's partition group. This results in two unintuitive 
> corner cases:
> 1) observing a late arrived record would keep that stream's timestamp low for 
> a period of time, and hence keep being process until that late record. For 
> example take two partitions within the same task annotated by their 
> timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late arrived record with timestamp "1" will cause stream A to be selected 
> continuously in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 
> until the record itself is dequeued and processed, then stream B will be 
> selected starting with timestamp 2.
> 2) an empty buffered partition will cause its timestamp to be not advanced, 
> and hence the task timestamp as well since it is the smallest among all 
> partitions. This may not be a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4923) Add Exactly-Once Semantics to Streams

2017-04-30 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-4923:
---
Status: Patch Available  (was: In Progress)

> Add Exactly-Once Semantics to Streams
> -
>
> Key: KAFKA-4923
> URL: https://issues.apache.org/jira/browse/KAFKA-4923
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>  Labels: kip
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2945: KAFKA-4923: Add Exactly-Once Semantics to Streams

2017-04-30 Thread mjsax
GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2945

KAFKA-4923: Add Exactly-Once Semantics to Streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka kafka-4923-add-eos-to-streams

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2945


commit f2161c80df79573b1cd045e2947a245ed1e1e1aa
Author: Matthias J. Sax 
Date:   2017-04-05T01:07:14Z

code cleanup

commit abdd6e3c43d727bb4c17f28f69e2677eb8b9be73
Author: Matthias J. Sax 
Date:   2017-03-21T19:12:25Z

KAFKA-4923: Add Exactly-Once Semantics to Streams




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4923) Add Exactly-Once Semantics to Streams

2017-04-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990178#comment-15990178
 ] 

ASF GitHub Bot commented on KAFKA-4923:
---

GitHub user mjsax opened a pull request:

https://github.com/apache/kafka/pull/2945

KAFKA-4923: Add Exactly-Once Semantics to Streams



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mjsax/kafka kafka-4923-add-eos-to-streams

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2945


commit f2161c80df79573b1cd045e2947a245ed1e1e1aa
Author: Matthias J. Sax 
Date:   2017-04-05T01:07:14Z

code cleanup

commit abdd6e3c43d727bb4c17f28f69e2677eb8b9be73
Author: Matthias J. Sax 
Date:   2017-03-21T19:12:25Z

KAFKA-4923: Add Exactly-Once Semantics to Streams




> Add Exactly-Once Semantics to Streams
> -
>
> Key: KAFKA-4923
> URL: https://issues.apache.org/jira/browse/KAFKA-4923
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Matthias J. Sax
>  Labels: kip
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-3514) Stream timestamp computation needs some further thoughts

2017-04-30 Thread Michal Borowiecki (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990176#comment-15990176
 ] 

Michal Borowiecki commented on KAFKA-3514:
--

I think the description of this ticket is missing an important detail.
If my understanding is correct, it will behave as described if all the records 
arrive in a single batch.
However, if the records preceding the record with timestamp "1" come in a 
separate batch (I'll use brackets to depict batch boundaries):
{code}
Stream A: [5, 6, 7, 8, 9], [1, 10]

Stream B: [2, 3, 4, 5]
{code}
then initially the timestamp for stream A is going to be set to "5" (minimum of 
the first batch) and since it's not allowed to move back, the second batch 
containing the late arriving record "1" is not going to change that. Stream B 
is going to be drained first until "5".
However, if the batch boundaries are different by just one record and the late 
arriving "1" is in the first batch:
{code}
Stream A: [5, 6, 7, 8, 9, 1], [10]

Stream B: [2, 3, 4, 5]
{code}
 then it's going to behave as currently described.

Please correct me if I got this wrong.
But if that is the case, it feels all too non-deterministic and I think the 
timestamp computation deserves further thought beyond the scope of 
[KIP-138|https://cwiki.apache.org/confluence/display/KAFKA/KIP-138%3A+Change+punctuate+semantics],
 which is limited to punctuate semantics, but not stream time semantics in 
general.
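
To make the batch-boundary effect concrete, here is a small self-contained sketch (my 
own simplification of the behaviour described above, not Kafka source):

{code}
import java.util.Arrays;
import java.util.List;

public class StreamTimeIllustration {

    // Per-partition timestamp: a non-descending minimum of each batch's minimum
    // timestamp (-1 means no records received yet for that partition).
    static long advance(final long current, final List<Long> batch) {
        final long batchMin = batch.stream().mapToLong(Long::longValue).min().orElse(current);
        return Math.max(current, batchMin); // not allowed to move backwards
    }

    public static void main(final String[] args) {
        // Batching 1: Stream A delivered as [5, 6, 7, 8, 9] then [1, 10]
        long a1 = advance(-1L, Arrays.asList(5L, 6L, 7L, 8L, 9L)); // becomes 5
        a1 = advance(a1, Arrays.asList(1L, 10L));                  // stays 5: the late "1" cannot lower it

        // Batching 2: Stream A delivered as [5, 6, 7, 8, 9, 1] then [10]
        long a2 = advance(-1L, Arrays.asList(5L, 6L, 7L, 8L, 9L, 1L)); // becomes 1

        System.out.println(a1 + " vs " + a2); // 5 vs 1: same records, different batching
    }
}
{code}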

> Stream timestamp computation needs some further thoughts
> 
>
> Key: KAFKA-3514
> URL: https://issues.apache.org/jira/browse/KAFKA-3514
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Eno Thereska
>  Labels: architecture
> Fix For: 0.11.0.0
>
>
> Our current stream task's timestamp is used for punctuate function as well as 
> selecting which stream to process next (i.e. best effort stream 
> synchronization). And it is defined as the smallest timestamp over all 
> partitions in the task's partition group. This results in two unintuitive 
> corner cases:
> 1) observing a late arrived record would keep that stream's timestamp low for 
> a period of time, and hence keep being process until that late record. For 
> example take two partitions within the same task annotated by their 
> timestamps:
> {code}
> Stream A: 5, 6, 7, 8, 9, 1, 10
> {code}
> {code}
> Stream B: 2, 3, 4, 5
> {code}
> The late arrived record with timestamp "1" will cause stream A to be selected 
> continuously in the thread loop, i.e. messages with timestamp 5, 6, 7, 8, 9 
> until the record itself is dequeued and processed, then stream B will be 
> selected starting with timestamp 2.
> 2) an empty buffered partition will cause its timestamp to be not advanced, 
> and hence the task timestamp as well since it is the smallest among all 
> partitions. This may not be a severe problem compared with 1) above though.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Request to add to the contributor list

2017-04-30 Thread Amit Daga
Hello Team,

Hope you are doing well.

My name is Amit Daga. I would like to request that you add me to the
contributor list. Also, if KAFKA-4996 is still unassigned, I would like to
work on it.

Thanks,
Amit