[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-06 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365689#comment-15365689
 ] 

James Cheng commented on KAFKA-3894:


#4 is a good point. By looking at the dedupe buffer size, the broker can calculate how 
large a segment it can handle, and can thus make sure to generate only segments 
that fit.

The comment about having a large segment that you are unable to process made me 
think about the long discussion that happened in 
https://issues.apache.org/jira/browse/KAFKA-3810. In that JIRA, a large message 
in the __consumer_offsets topic would block (internal) consumers that had too 
small a fetch size.

The solution that was chosen and implemented was to relax the fetch size limit 
for fetches from internal topics. Internal topics would always return at least 
one message, even if that message was larger than the fetch size.

It made me wonder if it might make sense to treat the dedupe buffer in a 
similar way. In a steady state, the configured dedupe buffer size would be used, 
but if it is too small to fit even a single segment, the dedupe buffer would be 
(temporarily) grown to allow cleaning of that large segment.
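
To make the idea concrete, a minimal sketch of that sizing logic, in Java with 
purely illustrative names and an assumed per-entry cost (the actual LogCleaner 
offset map lives in the broker's Scala code, so this is just the shape of the idea):

{code}
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: size the dedupe buffer from the densest segment so a
// single oversized segment grows the buffer temporarily instead of crashing the
// cleaner. Names and the per-entry cost are assumptions, not LogCleaner internals.
public class DedupeBufferSizing {

    // Assumed rough cost of one key in the offset map (hash + offset).
    private static final int BYTES_PER_ENTRY = 24;

    public static long bufferSizeFor(List<Long> messagesPerSegment, long configuredBytes) {
        long densestSegment = messagesPerSegment.stream()
                .mapToLong(Long::longValue).max().orElse(0L);
        long requiredBytes = densestSegment * BYTES_PER_ENTRY;
        // Steady state: the configured size wins; pathological case: grow temporarily.
        return Math.max(configuredBytes, requiredBytes);
    }

    public static void main(String[] args) {
        // Segment from the error above (9,750,860 messages) vs. a 128 MB configured buffer.
        System.out.println(bufferSizeFor(Arrays.asList(9_750_860L), 128L * 1024 * 1024));
    }
}
{code}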

CC [~junrao]


> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [VOTE] KIP-67: Queryable state for Kafka Streams

2016-07-06 Thread Henry Cai
It wasn't quite clear to me how the user program interacts with the
discovery API, especially the user-supplied listener part: how does the
user program supply that listener to KafkaStreams, and how does KafkaStreams
know which port the user listener is running on? A more complete
end-to-end example would help, including the steps for registering the user
listener and whether the user listener needs to be involved in task reassignment.


On Wed, Jul 6, 2016 at 9:13 PM, Guozhang Wang  wrote:

> +1
>
> On Wed, Jul 6, 2016 at 12:44 PM, Damian Guy  wrote:
>
> > Hi all,
> >
> > I'd like to initiate the voting process for KIP-67
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> > >
> >
> > KAFKA-3909  is the top
> > level JIRA for this effort.
> >
> > Initial PRs for Step 1 of the process are:
> > Expose State Store Names  and
> > Query Local State Stores 
> >
> > Thanks,
> > Damian
> >
>
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-67: Queryable state for Kafka Streams

2016-07-06 Thread Guozhang Wang
+1

On Wed, Jul 6, 2016 at 12:44 PM, Damian Guy  wrote:

> Hi all,
>
> I'd like to initiate the voting process for KIP-67
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> >
>
> KAFKA-3909  is the top
> level JIRA for this effort.
>
> Initial PRs for Step 1 of the process are:
> Expose State Store Names  and
> Query Local State Stores 
>
> Thanks,
> Damian
>



-- 
-- Guozhang


[jira] [Updated] (KAFKA-3931) kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure

2016-07-06 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-3931:
---
Status: Patch Available  (was: In Progress)

> kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure
> ---
>
> Key: KAFKA-3931
> URL: https://issues.apache.org/jira/browse/KAFKA-3931
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> Some of the recent builds are failing this test 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4579/testReport/kafka.api/PlaintextConsumerTest/testPatternUnsubscription/]).
> Some other are failing on 
> kafka.api.PlaintextConsumerTest.testPatternSubscription 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4565/testReport/junit/kafka.api/PlaintextConsumerTest/testPatternSubscription/])
> These failures seem to have started after [this 
> commit|https://github.com/apache/kafka/commit/d7de59a579af5ba4ecb1aec8fed84054f8b86443].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Error after restarting kafka broker

2016-07-06 Thread Subhash Agrawal
Hi All,
I am running a Kafka broker (version 0.10.0) in standalone mode (a single Kafka 
broker in the cluster).

After a couple of restarts, I see this error during embedded Kafka startup, even 
though it does not seem to be causing any problems.
Is there any way I can avoid this error?

Thanks
Subhash Agrawal

kafka.common.NoReplicaOnlineException: No replica for partition 
[__consumer_offsets,2] is alive. Live brokers are: [Set()], Assigned replicas 
are: [List(0)]
at 
kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
at 
kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
at 
kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
at 
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
at 
kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at 
scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at 
kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
at 
kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
at 
kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
at 
kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
at 
kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
at 
kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:50)
at 
kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:48)
at 
kafka.server.ZookeeperLeaderElector$$anonfun$startup$1.apply(ZookeeperLeaderElector.scala:48)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.server.ZookeeperLeaderElector.startup(ZookeeperLeaderElector.scala:48)
at 
kafka.controller.KafkaController$$anonfun$startup$1.apply$mcV$sp(KafkaController.scala:684)
at 
kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:680)
at 
kafka.controller.KafkaController$$anonfun$startup$1.apply(KafkaController.scala:680)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
at 
kafka.controller.KafkaController.startup(KafkaController.scala:680)
at kafka.server.KafkaServer.startup(KafkaServer.scala:200)


[jira] [Comment Edited] (KAFKA-3705) Support non-key joining in KTable

2016-07-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365365#comment-15365365
 ] 

Guozhang Wang edited comment on KAFKA-3705 at 7/7/16 12:09 AM:
---

[~jfilipiak] After talking with you offline, I am convinced that this combo-key 
is necessary to avoid out-of-order results. I have updated the design proposal 
wiki accordingly: 
https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Non-key+KTable-KTable+Joins,
 feel free to take a look.

Just a random thought for your use case specifically: do relations A, B, and 
C all need to be captured as KTables (i.e. as binlog records / etc. from 
some database table)? If the one with the foreign key can be captured as just a 
stream (i.e. a KStream), then what you can do is re-model your computation as 
{{(stream Join table1) Join table2}}, where {{stream Join table}} returns a 
stream. And in the Kafka Streams DSL you can just do 
{{stream.selectKey(table1.key).join(table1).selectKey(table2.key).join(table2)}}.


was (Author: guozhang):
[~jfilipiak] I am convinced that this combo-key is necessary to avoid out of 
ordering after talking with you offline. And I have updated the design proposal 
wiki accordingly: 
https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Non-key+KTable-KTable+Joins,
 feel free to take a look.

Just a random thought as for your use case specifically: are relation A, B, and 
C all need to be captured as a KTable (i.e. the records as binlog / etc from 
some database table)? If the one with foreign key can be captured just a stream 
(i.e. KStream), then what you can do is to re-model your computation as 
{{(stream Join table1) Join table2}}, where {stream Join table returns a 
stream}. And in Kafka Streams DSL you can just do 
{{stream.selectKey(table1.key).join(table1).selectKey(table2.key).join(table2)}}.

> Support non-key joining in KTable
> -
>
> Key: KAFKA-3705
> URL: https://issues.apache.org/jira/browse/KAFKA-3705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Liquan Pei
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Today in Kafka Streams DSL, KTable joins are only based on keys. If users 
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but 
> with a "foreign key" {{a}}, and assuming they are read from two topics which 
> are partitioned on {{a}} and {{b}} respectively, they need to do the 
> following pattern:
> {code}
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' 
> is partitioned on "a"
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already 
> partitioned on {{a}}, users still need to do the pre-aggregation in order to 
> make the two joining streams be on the same key. This is a drawback in 
> programmability and we should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3705) Support non-key joining in KTable

2016-07-06 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365365#comment-15365365
 ] 

Guozhang Wang commented on KAFKA-3705:
--

[~jfilipiak] After talking with you offline, I am convinced that this combo-key 
is necessary to avoid out-of-order results. I have updated the design proposal 
wiki accordingly: 
https://cwiki.apache.org/confluence/display/KAFKA/Discussion%3A+Non-key+KTable-KTable+Joins,
 feel free to take a look.

Just a random thought for your use case specifically: do relations A, B, and 
C all need to be captured as KTables (i.e. as binlog records / etc. from 
some database table)? If the one with the foreign key can be captured as just a 
stream (i.e. a KStream), then what you can do is re-model your computation as 
{{(stream Join table1) Join table2}}, where {{stream Join table}} returns a 
stream. And in the Kafka Streams DSL you can just do 
{{stream.selectKey(table1.key).join(table1).selectKey(table2.key).join(table2)}}.
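
For illustration, a rough sketch of that re-modelled pipeline in the Java DSL, 
written against a later Streams API (StreamsBuilder) than the 0.10.x release 
discussed here; the topic names, key fields, and the Fact value type are 
hypothetical, and serde configuration is omitted:

{code}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

// Sketch of: (stream Join table1) Join table2, re-keying before each join so the
// stream is co-partitioned with the table it joins against.
public class StreamTableTableJoinSketch {

    // Hypothetical value type carrying the two foreign keys.
    public static class Fact {
        public final String table1Key;
        public final String table2Key;
        public Fact(String table1Key, String table2Key) {
            this.table1Key = table1Key;
            this.table2Key = table2Key;
        }
    }

    public static KStream<String, String> build(StreamsBuilder builder) {
        KStream<String, Fact> facts = builder.stream("facts");     // the KStream side
        KTable<String, String> table1 = builder.table("table1");   // keyed by table1's key
        KTable<String, String> table2 = builder.table("table2");   // keyed by table2's key

        return facts
                .selectKey((k, fact) -> fact.table1Key)                    // key on table1's key
                .join(table1, (fact, dim1) -> fact.table2Key + "|" + dim1) // stream Join table -> stream
                .selectKey((k, joined) -> joined.split("\\|")[0])          // key on table2's key
                .join(table2, (joined, dim2) -> joined + "|" + dim2);
    }
}
{code}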

> Support non-key joining in KTable
> -
>
> Key: KAFKA-3705
> URL: https://issues.apache.org/jira/browse/KAFKA-3705
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Liquan Pei
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Today in Kafka Streams DSL, KTable joins are only based on keys. If users 
> want to join a KTable A by key {{a}} with another KTable B by key {{b}} but 
> with a "foreign key" {{a}}, and assuming they are read from two topics which 
> are partitioned on {{a}} and {{b}} respectively, they need to do the 
> following pattern:
> {code}
> tableB' = tableB.groupBy(/* select on field "a" */).agg(...); // now tableB' 
> is partitioned on "a"
> tableA.join(tableB', joiner);
> {code}
> Even if these two tables are read from two topics which are already 
> partitioned on {{a}}, users still need to do the pre-aggregation in order to 
> make the two joining streams be on the same key. This is a drawback in 
> programmability and we should fix it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-1543) Changing replication factor

2016-07-06 Thread Richard Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365357#comment-15365357
 ] 

Richard Lee commented on KAFKA-1543:


This jira item seems orphaned.  Anyway, I took another stab at this in the 
0.10.1.0 branch.  See https://github.com/apache/kafka/pull/1596

I decided that kafka-topics.sh might not be the best place to change the 
replication factor, since doing so can have a pretty heavy impact on partition 
assignment in the cluster. So I left it in kafka-reassign-partitions.sh, but 
made it a per-topic config rather than a global command-line argument, so that 
each topic can specify its own replication factor.

Also, as kafka-reassign-partitions.sh is intended to operate on a running 
cluster with --verify feedback of progress, it seems more likely to sidestep 
any issues that would require a cluster restart.

> Changing replication factor
> ---
>
> Key: KAFKA-1543
> URL: https://issues.apache.org/jira/browse/KAFKA-1543
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Alexey Ozeritskiy
>Assignee: Alexander Pakulov
> Attachments: can-change-replication.patch
>
>
> It is difficult to change replication factor by manual editing json config.
> I propose to add a key to kafka-reassign-partitions.sh command to 
> automatically create json config.
> Example of usage
> {code}
> kafka-reassign-partitions.sh --zookeeper zk --replicas new-replication-factor 
> --topics-to-move-json-file topics-file --broker-list 1,2,3,4 --generate > 
> output
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1596: Change replication factor during partition map gen...

2016-07-06 Thread llamahunter
GitHub user llamahunter opened a pull request:

https://github.com/apache/kafka/pull/1596

Change replication factor during partition map generation

If the topic-to-move-json-file contains a new replication-factor for
a topic, it is used when assigning partitions to brokers.  If missing,
the existing replication-factor of the topic is maintained.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TiVo/kafka re-replicate-topic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1596.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1596


commit 7dfe0884253599c460a2f381d5c729cc4c1dec70
Author: Richard Lee 
Date:   2016-07-06T23:20:45Z

Change replication factor during partition map generation
If the topic-to-move-json-file contains a new replication-factor for
a topic, it is used when assigning partitions to brokers.  If missing,
the existing replication-factor of the topic is maintained.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1595: MINOR: Typo fix in comments

2016-07-06 Thread naferx
GitHub user naferx opened a pull request:

https://github.com/apache/kafka/pull/1595

MINOR: Typo fix in comments



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/naferx/kafka minor-typo

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1595.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1595


commit 58c80e28ffb2971706f91c809e1d8f4be9a7a3e8
Author: Nafer Sanabria 
Date:   2016-07-06T23:13:35Z

MINOR: Typo fix in comments




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-06 Thread Tim Carey-Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365233#comment-15365233
 ] 

Tim Carey-Smith commented on KAFKA-3894:


Woohoo, more metrics is so excellent!

Regarding the issue I am reporting: it is somewhat broader than the specific 
issues related to the log cleaner which have been resolved across the lifetime 
of Kafka.

* Compaction thread dies when hitting compressed and unkeyed messages 
(https://github.com/apache/kafka/commit/1cd6ed9e2c07a63474ed80a8224bd431d5d4243c#diff-d7330411812d23e8a34889bee42fedfe)
 noted in KAFKA-1755
* Logcleaner fails due to incorrect offset map computation on a replica in 
KAFKA-3587

Unfortunately, there is a deeper issue: if these threads die, bad things happen.

KAFKA-3587 was a great step forward: now this exception will only occur if a 
single segment is unable to fit within the dedupe buffer. Unfortunately, in 
pathological cases the thread could still die. 

Compacted topics are built to rely on the log cleaner thread, and because of 
this, any segments which are written must be compatible with the configuration 
of the log cleaner threads. 
As I mentioned before, we are now monitoring the log cleaner threads and as a 
result do not have long periods where a broker is in a dangerous and degraded 
state. 
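
For anyone wanting to replicate that monitoring, a rough sketch of the in-JVM 
check (it has to run inside, or attached to, the broker process; the thread-name 
prefix matches the error above, and the alerting hook is a placeholder):

{code}
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative sketch: look for live threads whose names start with the log
// cleaner's prefix and alert if none are found. This only sees threads in the
// same JVM, so it must run inside (or be attached to) the broker process.
public class LogCleanerThreadCheck {

    public static boolean cleanerAlive() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null && info.getThreadName().startsWith("kafka-log-cleaner-thread-")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (!cleanerAlive()) {
            // Placeholder for a real alert (page, metric, etc.).
            System.err.println("No kafka-log-cleaner-thread-* found: compaction may have stopped");
        }
    }
}
{code}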
One situation which comes to mind is from a talk at Kafka Summit where the 
thread was offline for a long period of time. Upon restart, the 
{{__consumer_offsets}} topic took 17 minutes to load. 
http://www.slideshare.net/jjkoshy/kafkaesque-days-at-linked-in-in-2015/49

After talking with Tom, we came up with a few solutions which could help in 
resolving this issue. 

1) The monitoring suggested in KAFKA-3857 is a great start and would most 
definitely help with determining the state of the log cleaner.
2) After the change in KAFKA-3587, it could be possible to simply skip 
segments which are too large, leaving them as zombie segments which will never 
be cleaned. This is less than ideal, but it means that a single large segment 
would not take down the whole log cleaner subsystem. 
3) Upon encountering a large segment, we considered the possibility of 
splitting the segment to allow the log cleaner to continue. This would 
potentially delay some cleanup until a later time. 
4) Currently, it seems like the write path allows segments to be created 
which cannot be processed by the log cleaner. Would it make sense to 
include log cleaner heuristics when determining segment size for compacted 
topics? This would allow the log cleaner to always process a segment, unless 
the buffer size was changed. 

We'd love to help in any way we can. 

> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Parallelisation factor in kafka streams

2016-07-06 Thread Jeyhun Karimov
I meant restarting the underlying Kafka cluster on which the Kafka Streams
library is running. So, by restarting the Streams application (and not the
underlying Kafka cluster), one can change the number of threads within the
application.

On Wed, Jul 6, 2016 at 11:22 PM Matthias J. Sax 
wrote:

> Jeyhun,
>
> you cannot change the number of threads within an application instance,
> but you can start new application instances. The internal Kafka consumer
> rebalance will re-assign the partitions across all running application
> instances.
>
> Not sure what you mean by "restart the cluster"? For sure, you do not
> need to restart the Kafka Brokers. And for a Streams application, there
> is no cluster. Streams applications are regular Java applications and
> can run anywhere (not necessarily on the same machines as Kafka Brokers).
>
> -Matthias
>
> On 07/06/2016 10:33 PM, Jeyhun Karimov wrote:
> > Thank you for your answer Matthias.
> > Is it possible to change the parallelism in runtime? Or do we have to
> > restart the cluster?
> >
> >
> > On Wed, 6 Jul 2016 at 19:08, Matthias J. Sax 
> wrote:
> >
> >> Hi Jeyhun,
> >>
> >> the number of partitions determines the number of tasks within a Kafka
> >> Streams application and thus the maximum degree of parallelism for your
> >> application.
> >>
> >> For more details see
> >>
> http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
> >>
> >> You can set the number of threads for a single application instance, via
> >> parameter "num.stream.threads" (default value is 1).
> >>
> >> See
> >>
> >>
> http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
> >>
> >>
> >> -Matthias
> >>
> >> On 07/06/2016 06:11 PM, Jeyhun Karimov wrote:
> >>> Hi community,
> >>>
> >>> How can I set parallelisation factor in kafka streams? Is it related
> with
> >>> the number of partitions within topics?
> >>>
> >>>
> >>> Cheers
> >>> Jeyhun
> >>>
> >>
> >> --
> > -Cheers
> >
> > Jeyhun
> >
>
> --
-Cheers

Jeyhun


[GitHub] kafka pull request #1594: KAFKA-3931: WIP

2016-07-06 Thread vahidhashemian
GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1594

KAFKA-3931: WIP



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3931

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1594


commit 2b5255391eb3805ae8adaf18960e3cc0c1ec20f0
Author: Vahid Hashemian 
Date:   2016-07-06T21:24:08Z

KAFKA-3931: WIP




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3931) kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365141#comment-15365141
 ] 

ASF GitHub Bot commented on KAFKA-3931:
---

GitHub user vahidhashemian opened a pull request:

https://github.com/apache/kafka/pull/1594

KAFKA-3931: WIP



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3931

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1594.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1594


commit 2b5255391eb3805ae8adaf18960e3cc0c1ec20f0
Author: Vahid Hashemian 
Date:   2016-07-06T21:24:08Z

KAFKA-3931: WIP




> kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure
> ---
>
> Key: KAFKA-3931
> URL: https://issues.apache.org/jira/browse/KAFKA-3931
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> Some of the recent builds are failing this test 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4579/testReport/kafka.api/PlaintextConsumerTest/testPatternUnsubscription/]).
> Some other are failing on 
> kafka.api.PlaintextConsumerTest.testPatternSubscription 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4565/testReport/junit/kafka.api/PlaintextConsumerTest/testPatternSubscription/])
> These failures seem to have started after [this 
> commit|https://github.com/apache/kafka/commit/d7de59a579af5ba4ecb1aec8fed84054f8b86443].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Parallelisation factor in kafka streams

2016-07-06 Thread Matthias J. Sax
Jeyhun,

you cannot change the number of threads within an application instance,
but you can start new application instances. The internal Kafka consumer
rebalance will re-assign the partitions across all running application
instances.

Not sure what you mean by "restart the cluster"? For sure, you do not
need to restart the Kafka Brokers. And for a Streams application, there
is no cluster. Streams applications are regular Java applications and
can run anywhere (not necessarily on the same machines as Kafka Brokers).
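
For reference, a minimal sketch of setting the per-instance thread count at
startup (written against a later Streams API than 0.10.0; the application id,
bootstrap servers, and topic names are placeholders):

import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

// Sketch: the thread count is a startup-time config, so changing it means
// restarting this application instance, not the Kafka brokers.
public class ThreadCountExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4); // default is 1

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}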

-Matthias

On 07/06/2016 10:33 PM, Jeyhun Karimov wrote:
> Thank you for your answer Matthias.
> Is it possible to change the parallelism in runtime? Or do we have to
> restart the cluster?
> 
> 
> On Wed, 6 Jul 2016 at 19:08, Matthias J. Sax  wrote:
> 
>> Hi Jeyhun,
>>
>> the number of partitions determines the number of tasks within a Kafka
>> Streams application and thus the maximum degree of parallelism for your
>> application.
>>
>> For more details see
>> http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
>>
>> You can set the number of threads for a single application instance, via
>> parameter "num.stream.threads" (default value is 1).
>>
>> See
>>
>> http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
>>
>>
>> -Matthias
>>
>> On 07/06/2016 06:11 PM, Jeyhun Karimov wrote:
>>> Hi community,
>>>
>>> How can I set parallelisation factor in kafka streams? Is it related with
>>> the number of partitions within topics?
>>>
>>>
>>> Cheers
>>> Jeyhun
>>>
>>
>> --
> -Cheers
> 
> Jeyhun
> 





Build failed in Jenkins: kafka-trunk-jdk8 #738

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3836: KStreamReduce and KTableReduce should not pass nulls to

[wangguoz] KAFKA-3926: Fix transient test failure in RegexSourceIntegrationTest

--
[...truncated 3394 lines...]

kafka.coordinator.GroupCoordinatorResponseTest > 
testJoinGroupFromUnchangedFollowerDoesNotRebalance PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testValidJoinGroup STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testValidJoinGroup PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testSyncGroupLeaderAfterFollower STARTED

kafka.coordinator.GroupCoordinatorResponseTest > 
testSyncGroupLeaderAfterFollower PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupFromUnknownMember 
STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupFromUnknownMember 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testValidLeaveGroup STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testValidLeaveGroup PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupInactiveGroup 
STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testDescribeGroupInactiveGroup 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupNotCoordinator 
STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testSyncGroupNotCoordinator 
PASSED

kafka.coordinator.GroupCoordinatorResponseTest > 
testHeartbeatUnknownConsumerExistingGroup STARTED

kafka.coordinator.GroupCoordinatorResponseTest > 
testHeartbeatUnknownConsumerExistingGroup PASSED

kafka.coordinator.GroupCoordinatorResponseTest > testValidHeartbeat STARTED

kafka.coordinator.GroupCoordinatorResponseTest > testValidHeartbeat PASSED

kafka.coordinator.GroupMetadataTest > testDeadToAwaitingSyncIllegalTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testDeadToAwaitingSyncIllegalTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testOffsetCommitFailure STARTED

kafka.coordinator.GroupMetadataTest > testOffsetCommitFailure PASSED

kafka.coordinator.GroupMetadataTest > 
testPreparingRebalanceToStableIllegalTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testPreparingRebalanceToStableIllegalTransition PASSED

kafka.coordinator.GroupMetadataTest > testStableToDeadTransition STARTED

kafka.coordinator.GroupMetadataTest > testStableToDeadTransition PASSED

kafka.coordinator.GroupMetadataTest > testInitNextGenerationEmptyGroup STARTED

kafka.coordinator.GroupMetadataTest > testInitNextGenerationEmptyGroup PASSED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenDead STARTED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenDead PASSED

kafka.coordinator.GroupMetadataTest > testInitNextGeneration STARTED

kafka.coordinator.GroupMetadataTest > testInitNextGeneration PASSED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToEmptyTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToEmptyTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testSelectProtocol STARTED

kafka.coordinator.GroupMetadataTest > testSelectProtocol PASSED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenPreparingRebalance 
STARTED

kafka.coordinator.GroupMetadataTest > testCannotRebalanceWhenPreparingRebalance 
PASSED

kafka.coordinator.GroupMetadataTest > 
testDeadToPreparingRebalanceIllegalTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testDeadToPreparingRebalanceIllegalTransition PASSED

kafka.coordinator.GroupMetadataTest > testCanRebalanceWhenAwaitingSync STARTED

kafka.coordinator.GroupMetadataTest > testCanRebalanceWhenAwaitingSync PASSED

kafka.coordinator.GroupMetadataTest > 
testAwaitingSyncToPreparingRebalanceTransition STARTED

kafka.coordinator.GroupMetadataTest > 
testAwaitingSyncToPreparingRebalanceTransition PASSED

kafka.coordinator.GroupMetadataTest > testStableToAwaitingSyncIllegalTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testStableToAwaitingSyncIllegalTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testEmptyToDeadTransition STARTED

kafka.coordinator.GroupMetadataTest > testEmptyToDeadTransition PASSED

kafka.coordinator.GroupMetadataTest > testSelectProtocolRaisesIfNoMembers 
STARTED

kafka.coordinator.GroupMetadataTest > testSelectProtocolRaisesIfNoMembers PASSED

kafka.coordinator.GroupMetadataTest > testStableToPreparingRebalanceTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testStableToPreparingRebalanceTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToDeadTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testPreparingRebalanceToDeadTransition 
PASSED

kafka.coordinator.GroupMetadataTest > testStableToStableIllegalTransition 
STARTED

kafka.coordinator.GroupMetadataTest > testStableToStableIllegalTransition PASSED

kafka.coordinator.GroupMetadataTest > testAwaitingSyncToStableTransition STAR

Build failed in Jenkins: kafka-0.10.0-jdk7 #144

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3836: KStreamReduce and KTableReduce should not pass nulls to

--
[...truncated 1674 lines...]

kafka.api.PlaintextConsumerTest > testPartitionPauseAndResume PASSED

kafka.api.PlaintextConsumerTest > testConsumeMessagesWithLogAppendTime PASSED

kafka.api.PlaintextConsumerTest > testAutoCommitOnCloseAfterWakeup PASSED

kafka.api.PlaintextConsumerTest > testMaxPollRecords PASSED

kafka.api.PlaintextConsumerTest > testAutoOffsetReset PASSED

kafka.api.PlaintextConsumerTest > testFetchInvalidOffset PASSED

kafka.api.PlaintextConsumerTest > testAutoCommitIntercept PASSED

kafka.api.PlaintextConsumerTest > testCommitMetadata PASSED

kafka.api.PlaintextConsumerTest > testRoundRobinAssignment PASSED

kafka.api.PlaintextConsumerTest > testPatternSubscription FAILED
java.lang.AssertionError: Expected partitions [topic-0, topic-1, 
tblablac-0, tblablac-1] but actually got []
at org.junit.Assert.fail(Assert.java:88)
at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:771)
at 
kafka.api.PlaintextConsumerTest.testPatternSubscription(PlaintextConsumerTest.scala:171)

kafka.api.PlaintextConsumerTest > testPauseStateNotPreservedByRebalance PASSED

kafka.api.PlaintextConsumerTest > testUnsubscribeTopic PASSED

kafka.api.PlaintextConsumerTest > testListTopics PASSED

kafka.api.PlaintextConsumerTest > testAutoCommitOnRebalance PASSED

kafka.api.PlaintextConsumerTest > testSimpleConsumption PASSED

kafka.api.PlaintextConsumerTest > testPartitionReassignmentCallback PASSED

kafka.api.PlaintextConsumerTest > testCommitSpecifiedOffsets PASSED

kafka.api.ProducerBounceTest > testBrokerFailure PASSED

kafka.api.ProducerFailureHandlingTest > testCannotSendToInternalTopic PASSED

kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckOne PASSED

kafka.api.ProducerFailureHandlingTest > testWrongBrokerList PASSED

kafka.api.ProducerFailureHandlingTest > testNotEnoughReplicas PASSED

kafka.api.ProducerFailureHandlingTest > testNonExistentTopic PASSED

kafka.api.ProducerFailureHandlingTest > testInvalidPartition PASSED

kafka.api.ProducerFailureHandlingTest > testSendAfterClosed PASSED

kafka.api.ProducerFailureHandlingTest > testTooLargeRecordWithAckZero PASSED

kafka.api.ProducerFailureHandlingTest > 
testNotEnoughReplicasAfterBrokerShutdown PASSED

kafka.api.RackAwareAutoTopicCreationTest > testAutoCreateTopic PASSED

kafka.api.SaslPlaintextConsumerTest > testPauseStateNotPreservedByRebalance 
PASSED

kafka.api.SaslPlaintextConsumerTest > testUnsubscribeTopic PASSED

kafka.api.SaslPlaintextConsumerTest > testListTopics PASSED

kafka.api.SaslPlaintextConsumerTest > testAutoCommitOnRebalance PASSED

kafka.api.SaslPlaintextConsumerTest > testSimpleConsumption PASSED

kafka.api.SaslPlaintextConsumerTest > testPartitionReassignmentCallback PASSED

kafka.api.SaslPlaintextConsumerTest > testCommitSpecifiedOffsets PASSED

kafka.api.SslConsumerTest > testPauseStateNotPreservedByRebalance PASSED

kafka.api.SslConsumerTest > testUnsubscribeTopic PASSED

kafka.api.SslConsumerTest > testListTopics PASSED

kafka.api.SslConsumerTest > testAutoCommitOnRebalance PASSED

kafka.api.SslConsumerTest > testSimpleConsumption PASSED

kafka.api.SslConsumerTest > testPartitionReassignmentCallback PASSED

kafka.api.SslConsumerTest > testCommitSpecifiedOffsets PASSED

kafka.api.ConsumerBounceTest > testSeekAndCommitWithBrokerFailures PASSED

kafka.api.ConsumerBounceTest > testConsumptionWithBrokerFailures PASSED

kafka.api.AuthorizerIntegrationTest > testCommitWithTopicWrite PASSED

kafka.api.AuthorizerIntegrationTest > testConsumeWithNoTopicAccess PASSED

kafka.api.AuthorizerIntegrationTest > 
testCreatePermissionNeededToReadFromNonExistentTopic PASSED

kafka.api.AuthorizerIntegrationTest > testOffsetFetchWithNoAccess PASSED

kafka.api.AuthorizerIntegrationTest > testListOfsetsWithTopicDescribe PASSED

kafka.api.AuthorizerIntegrationTest > testProduceWithTopicRead PASSED

kafka.api.AuthorizerIntegrationTest > testListOffsetsWithNoTopicAccess PASSED

kafka.api.AuthorizerIntegrationTest > 
testCreatePermissionNeededForWritingToNonExistentTopic PASSED

kafka.api.AuthorizerIntegrationTest > testConsumeWithNoGroupAccess PASSED

kafka.api.AuthorizerIntegrationTest > testConsumeWithTopicDescribe PASSED

kafka.api.AuthorizerIntegrationTest > testOffsetFetchWithNoTopicAccess PASSED

kafka.api.AuthorizerIntegrationTest > testCommitWithNoTopicAccess PASSED

kafka.api.AuthorizerIntegrationTest > testProduceWithNoTopicAccess PASSED

kafka.api.AuthorizerIntegrationTest > testProduceWithTopicWrite PASSED

kafka.api.AuthorizerIntegrationTest > testAuthorization PASSED

kafka.api.AuthorizerIntegrationTest > testConsumeWithTopicAndGroupRead PASSED

kafka.api.AuthorizerIntegrationTest > testConsumeWithTopicWrite PASSED

kafka.api.AuthorizerIntegrationTest > testOffsetFetchWithNoGroupAccess PASSED

kafka.api

[jira] [Commented] (KAFKA-3919) Broker faills to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-07-06 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365046#comment-15365046
 ] 

Jun Rao commented on KAFKA-3919:


[~BigAndy], yes, you are right. We actually do index based on the first offset 
in a compressed messageSet during recovery. So, the hypothesis can indeed 
happen. It seems that you lost multiple brokers in a hard way at the same time; 
was that due to a power outage?

To fix this properly, we probably want to fix KAFKA-1211. This is a bit 
involved since we need to keep track of the leader generations in the log. 
However, if we can do that, then when the same situation occurs and a follower 
wants to fetch from an offset that has been overwritten with new messages on 
the leader, the follower would know those messages are from a newer generation 
of the leader and would truncate its local log to the correct offset.

Another thing that's a bit weird right now is that in the leader, we index 
based on the first offset in a compressed message set. But in the follower, we 
index based on the last offset in a compressed message set. This is mostly to 
avoid decompression on the follower side since we only store the last offset in 
the wrapper message. Indexing either first offset or last offset is 
semantically correct, but it would be good to make this consistent. Next time 
when we evolve the message format, we probably want to consider adding the 
first offset in the wrapper message.

> Broker faills to start after ungraceful shutdown due to non-monotonically 
> incrementing offsets in logs
> --
>
> Key: KAFKA-3919
> URL: https://issues.apache.org/jira/browse/KAFKA-3919
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Andy Coates
>
> Hi All,
> I encountered an issue with Kafka following a power outage that saw a 
> proportion of our cluster disappear. When the power came back on several 
> brokers halted on start up with the error:
> {noformat}
>   Fatal error during KafkaServerStartable startup. Prepare to shutdown”
>   kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1239742691) to position 35728 no larger than the last offset appended 
> (1239742822) to 
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
>   at kafka.log.LogSegment.recover(LogSegment.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at kafka.log.Log.loadSegments(Log.scala:160)
>   at kafka.log.Log.(Log.scala:90)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only way to recover the brokers was to delete the log files that 
> contained non monotonically incrementing offsets.
> We've spent some time digging through the logs and I feel I may have worked 
> out the sequence of events leading to this issue, (though this is based on 
> some assumptions I've made about the way Kafka is working, which may be 
> wrong).
> First off, we have unclean leadership elections disabled. (We did later enable 
> them to help get around some other issues we were having, but this was 
> several hours after this issue manifested), and we're producing to the topic 
> with gzip compression and acks=1.
> We looked through the data logs that were causing the brokers to not start. 
> What we found is that the initial part of the log has monotonically increasing 
> offsets,

Build failed in Jenkins: kafka-trunk-jdk7 #1407

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3836: KStreamReduce and KTableReduce should not pass nulls to

[wangguoz] KAFKA-3926: Fix transient test failure in RegexSourceIntegrationTest

--
[...truncated 3325 lines...]

kafka.producer.AsyncProducerTest > testJavaProducer PASSED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder STARTED

kafka.producer.AsyncProducerTest > testIncompatibleEncoder PASSED

kafka.producer.ProducerTest > testSendToNewTopic STARTED

kafka.producer.ProducerTest > testSendToNewTopic PASSED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout STARTED

kafka.producer.ProducerTest > testAsyncSendCanCorrectlyFailWithTimeout PASSED

kafka.producer.ProducerTest > testSendNullMessage STARTED

kafka.producer.ProducerTest > testSendNullMessage PASSED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo STARTED

kafka.producer.ProducerTest > testUpdateBrokerPartitionInfo PASSED

kafka.producer.ProducerTest > testSendWithDeadBroker STARTED

kafka.producer.ProducerTest > testSendWithDeadBroker PASSED

kafka.tools.ConsoleConsumerTest > shouldLimitReadsToMaxMessageLimit STARTED

kafka.tools.ConsoleConsumerTest > shouldLimitReadsToMaxMessageLimit PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidNewConsumerValidConfig STARTED

kafka.tools.ConsoleConsumerTest > shouldParseValidNewConsumerValidConfig PASSED

kafka.tools.ConsoleConsumerTest > shouldStopWhenOutputCheckErrorFails STARTED

kafka.tools.ConsoleConsumerTest > shouldStopWhenOutputCheckErrorFails PASSED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithStringOffset STARTED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithStringOffset PASSED

kafka.tools.ConsoleConsumerTest > shouldParseConfigsFromFile STARTED

kafka.tools.ConsoleConsumerTest > shouldParseConfigsFromFile PASSED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithNumericOffset STARTED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithNumericOffset PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidOldConsumerValidConfig STARTED

kafka.tools.ConsoleConsumerTest > shouldParseValidOldConsumerValidConfig PASSED

kafka.tools.ConsoleProducerTest > testParseKeyProp STARTED

kafka.tools.ConsoleProducerTest > testParseKeyProp PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer STARTED

kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer PASSED

kafka.tools.ConsoleProducerTest > testInvalidConfigs STARTED

kafka.tools.ConsoleProducerTest > testInvalidConfigs PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer STARTED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer PASSED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler STARTED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes STARTED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic STARTED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.common.ConfigTest > testInvalidGroupIds STARTED

kafka.common.ConfigTest > testInvalidGroupIds PASSED

kafka.common.ConfigTest > testInvalidClientIds STARTED

kafka.common.ConfigTest > testInvalidClientIds PASSED

kafka.common.ZkNodeChangeNotificationListenerTest > testProcessNotification 
STARTED

kafka.common.ZkNodeChangeNotificationListenerTest > testProcessNotification 
PASSED

kafka.common.TopicTest > testInvalidTopicNames STARTED

kafka.common.TopicTest > testInvalidTopicNames PASSED

kafka.common.TopicTest > testTopicHasCollision STARTED

kafka.common.TopicTest > testTopicHasCollision PASSED

kafka.common.TopicTest > testTopicHasCollisionChars STAR

Re: Parallelisation factor in kafka streams

2016-07-06 Thread Jeyhun Karimov
Thank you for your answer Matthias.
Is it possible to change the parallelism in runtime? Or do we have to
restart the cluster?


On Wed, 6 Jul 2016 at 19:08, Matthias J. Sax  wrote:

> Hi Jeyhun,
>
> the number of partitions determines the number of tasks within a Kafka
> Streams application and thus the maximum degree of parallelism for your
> application.
>
> For more details see
> http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model
>
> You can set the number of threads for a single application instance, via
> parameter "num.stream.threads" (default value is 1).
>
> See
>
> http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
>
>
> -Matthias
>
> On 07/06/2016 06:11 PM, Jeyhun Karimov wrote:
> > Hi community,
> >
> > How can I set parallelisation factor in kafka streams? Is it related with
> > the number of partitions within topics?
> >
> >
> > Cheers
> > Jeyhun
> >
>
> --
-Cheers

Jeyhun


[jira] [Commented] (KAFKA-3894) Log Cleaner thread crashes and never restarts

2016-07-06 Thread Kiran Pillarisetty (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365026#comment-15365026
 ] 

Kiran Pillarisetty commented on KAFKA-3894:
---

Regarding Log cleaner JMX metrics, I just submitted a PR. Please take a look:
https://github.com/apache/kafka/pull/1593

JIRA: https://issues.apache.org/jira/browse/KAFKA-3857


> Log Cleaner thread crashes and never restarts
> -
>
> Key: KAFKA-3894
> URL: https://issues.apache.org/jira/browse/KAFKA-3894
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.2, 0.9.0.1
> Environment: Oracle JDK 8
> Ubuntu Precise
>Reporter: Tim Carey-Smith
>  Labels: compaction
>
> The log-cleaner thread can crash if the number of keys in a topic grows to be 
> too large to fit into the dedupe buffer. 
> The result of this is a log line: 
> {quote}
> broker=0 pri=ERROR t=kafka-log-cleaner-thread-0 at=LogCleaner 
> \[kafka-log-cleaner-thread-0\], Error due to  
> java.lang.IllegalArgumentException: requirement failed: 9750860 messages in 
> segment MY_FAVORITE_TOPIC-2/47580165.log but offset map can fit 
> only 5033164. You can increase log.cleaner.dedupe.buffer.size or decrease 
> log.cleaner.threads
> {quote}
> As a result, the broker is left in a potentially dangerous situation where 
> cleaning of compacted topics is not running. 
> It is unclear if the broader strategy for the {{LogCleaner}} is the reason 
> for this upper bound, or if this is a value which must be tuned for each 
> specific use-case. 
> Of more immediate concern is the fact that the thread crash is not visible 
> via JMX or exposed as some form of service degradation. 
> Some short-term remediations we have made are:
> * increasing the size of the dedupe buffer
> * monitoring the log-cleaner threads inside the JVM



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3857) Additional log cleaner metrics

2016-07-06 Thread Kiran Pillarisetty (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kiran Pillarisetty updated KAFKA-3857:
--
Reviewer: Ismael Juma
  Status: Patch Available  (was: Open)

[~ijuma] Since there was some discussion about Log Cleaner JMX metrics in 
https://issues.apache.org/jira/browse/KAFKA-3894, I thought you might be 
interested in this PR. Would you be able to review this or suggest someone 
else?

> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3857) Additional log cleaner metrics

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365013#comment-15365013
 ] 

ASF GitHub Bot commented on KAFKA-3857:
---

GitHub user kiranptivo reopened a pull request:

https://github.com/apache/kafka/pull/1593

KAFKA-3857 Additional log cleaner metrics

Fixes KAFKA-3857

Changes proposed in this pull request:

The following additional log cleaner metrics have been added.
1. num-runs: Cumulative number of successful log cleaner runs since last 
broker restart.
2. last-run-time: Time of last log cleaner run.
3. num-filthy-logs: Number of filthy logs. A non-zero value for an extended 
period of time indicates that the cleaner has not been successful in cleaning 
the logs.

A note on num-filthy-logs: It is incremented whenever a filthy topic 
partition is added to the inProgress HashMap, and it is decremented once the 
cleaning is successful or the cleaning is aborted. Note that the existing 
LogCleaner code does not provide a metric to check whether the clean operation 
was successful or not. There is an inProgress HashMap with topicPartition => 
LogCleaningInProgress entries in it, but the entries are removed from the 
HashMap even when the clean operation throws an exception. So I added an 
additional metric, num-filthy-logs, to differentiate between a successful log 
clean case and an exception case.

The code is ready. I have tested and verified JMX metrics. There is one 
case I couldn't test though. It's the case where numFilthyLogs is decremented 
in 'resumeCleaning(...)' in LogCleanerManager.scala Line 188. It seems to be a 
part of the workflow that aborts the cleaning of a particular partition. Any 
ideas on how to test this scenario?
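
For readers without the patch handy, a rough Java sketch of the kind of gauge 
described above, using the Yammer metrics library the broker already exposes 
over JMX (names here are illustrative; the actual change is Scala inside the 
LogCleaner code):

{code}
import java.util.concurrent.atomic.AtomicInteger;
import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Gauge;

// Illustrative sketch: a gauge tracking how many "filthy" logs are currently
// grabbed for cleaning. Incremented when cleaning starts, decremented when it
// completes or is aborted, so a persistently non-zero value signals trouble.
public class FilthyLogsGaugeSketch {
    private final AtomicInteger numFilthyLogs = new AtomicInteger(0);

    public FilthyLogsGaugeSketch() {
        Metrics.newGauge(FilthyLogsGaugeSketch.class, "num-filthy-logs", new Gauge<Integer>() {
            @Override
            public Integer value() {
                return numFilthyLogs.get();
            }
        });
    }

    // Called when a filthy partition is grabbed for cleaning...
    public void onCleaningStarted()       { numFilthyLogs.incrementAndGet(); }
    // ...and when cleaning completes or is aborted.
    public void onCleaningDoneOrAborted() { numFilthyLogs.decrementAndGet(); }
}
{code}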

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TiVo/kafka log_cleaner_jmx_metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1593


commit f00de412f6b1f6568adef479687ae0df789f9c96
Author: Kiran Pillarisetty 
Date:   2016-06-14T17:40:26Z

Create a couple of additional Log Cleaner JMX metrics
log-clean-last-run: Log cleaner's last run time
log-clean-runs: Number of log cleaner runs.

commit 7dc7511ee2b6d3cdf9df0c366fe23bf34d062a54
Author: Kiran Pillarisetty 
Date:   2016-06-14T20:24:00Z

Created a couple of additional Log Cleaner JMX metrics
log-clean-last-run: a metric to track last log cleaner run (unix timestamp)
log-clean-runs: a metric to track number of log cleaner runs

Committer: Kiran Pillarisetty 

commit 7f1214ff1118103dd639df717e988a22bad8033d
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:14:57Z

Add additional JMX metric to track successful cleaning of a log segment

commit 1ac346bb37008312e41035167dbfd75803595cd6
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:17:25Z

Add additional JMX metric to track successful cleaning of a log segment

commit 4f08d875e05c35bd7d7c849584b8b029031f884b
Author: Kiran Pillarisetty 
Date:   2016-07-05T22:23:20Z

Metric name updated to num-filthy-logs. Metric incremented as it is grabbed 
for cleaning, and decremented once the cleaning is done, or if the cleaning is 
aborted

commit cd887c05bf1d56b7566c5b72b3ddf3bcdfb70898
Author: Kiran Pillarisetty 
Date:   2016-07-05T23:31:32Z

Changed a metric name (number-of-runs to num-runs). Removed an extra \n 
around line 164. It is not present in the trunk




> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1593: KAFKA-3857 Additional log cleaner metrics

2016-07-06 Thread kiranptivo
GitHub user kiranptivo reopened a pull request:

https://github.com/apache/kafka/pull/1593

KAFKA-3857 Additional log cleaner metrics

Fixes KAFKA-3857

Changes proposed in this pull request:

The following additional log cleaner metrics have been added.
1. num-runs: Cumulative number of successful log cleaner runs since last 
broker restart.
2. last-run-time: Time of last log cleaner run.
3. num-filthy-logs: Number of filthy logs. A non-zero value for an extended 
period of time indicates that the cleaner has not been successful in cleaning 
the logs.

A note on num-filthy-logs: It is incremented whenever a filthy topic 
partition is added to the inProgress HashMap, and it is decremented once the 
cleaning succeeds or the cleaning is aborted. Note that the existing 
LogCleaner code does not provide a metric to check whether the clean operation 
succeeded. There is an inProgress HashMap with topicPartition => 
LogCleaningInProgress entries in it, but the entries are removed from the 
HashMap even when the clean operation throws an exception. So an additional 
metric, num-filthy-logs, was added to differentiate between a successful log 
clean and one that ended in an exception.

The code is ready. I have tested and verified JMX metrics. There is one 
case I couldn't test though. It's the case where numFilthyLogs is decremented 
in 'resumeCleaning(...)' in LogCleanerManager.scala Line 188. It seems to be a 
part of the workflow that aborts the cleaning of a particular partition. Any 
ideas on how to test this scenario?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TiVo/kafka log_cleaner_jmx_metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1593


commit f00de412f6b1f6568adef479687ae0df789f9c96
Author: Kiran Pillarisetty 
Date:   2016-06-14T17:40:26Z

Create a couple of additional Log Cleaner JMX metrics
log-clean-last-run: Log cleaner's last run time
log-clean-runs: Number of log cleaner runs.

commit 7dc7511ee2b6d3cdf9df0c366fe23bf34d062a54
Author: Kiran Pillarisetty 
Date:   2016-06-14T20:24:00Z

Created a couple of additional Log Cleaner JMX metrics
log-clean-last-run: a metric to track last log cleaner run (unix timestamp)
log-clean-runs: a metric to track number of log cleaner runs

Committer: Kiran Pillarisetty 

commit 7f1214ff1118103dd639df717e988a22bad8033d
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:14:57Z

Add additional JMX metric to track successful cleaning of a log segment

commit 1ac346bb37008312e41035167dbfd75803595cd6
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:17:25Z

Add additional JMX metric to track successful cleaning of a log segment

commit 4f08d875e05c35bd7d7c849584b8b029031f884b
Author: Kiran Pillarisetty 
Date:   2016-07-05T22:23:20Z

Metric name updated to num-filthy-logs. Metric incremented as it is grabbed 
for cleaning, and decremented once the cleaning is done, or if the cleaning is 
aborted

commit cd887c05bf1d56b7566c5b72b3ddf3bcdfb70898
Author: Kiran Pillarisetty 
Date:   2016-07-05T23:31:32Z

Changed a metric name (number-of-runs to num-runs). Removed an extra \n 
around line 164. It is not present in the trunk




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3857) Additional log cleaner metrics

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365010#comment-15365010
 ] 

ASF GitHub Bot commented on KAFKA-3857:
---

Github user kiranptivo closed the pull request at:

https://github.com/apache/kafka/pull/1593


> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1593: KAFKA-3857 Additional log cleaner metrics

2016-07-06 Thread kiranptivo
Github user kiranptivo closed the pull request at:

https://github.com/apache/kafka/pull/1593


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[VOTE] KIP-67: Queryable state for Kafka Streams

2016-07-06 Thread Damian Guy
Hi all,

I'd like to initiate the voting process for KIP-67


KAFKA-3909  is the top
level JIRA for this effort.

Initial PRs for Step 1 of the process are:
Expose State Store Names  and
Query Local State Stores 

Thanks,
Damian


[jira] [Resolved] (KAFKA-3926) Transient failures in org.apache.kafka.streams.integration.RegexSourceIntegrationTest

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3926.
--
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1590
[https://github.com/apache/kafka/pull/1590]

> Transient failures in 
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest
> -
>
> Key: KAFKA-3926
> URL: https://issues.apache.org/jira/browse/KAFKA-3926
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
>
> {code}
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
> testRegexMatchesTopicsAWhenDeleted FAILED
> java.lang.AssertionError: 
> Expected: <[TEST-TOPIC-A, TEST-TOPIC-B]>
>  but: was <[TEST-TOPIC-A]>
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.junit.Assert.assertThat(Assert.java:956)
> at org.junit.Assert.assertThat(Assert.java:923)
> at 
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:211)
> {code}
> I think it is due to the fact that sometimes the rebalance takes much longer 
> than the specified 60 seconds.
> One example: https://builds.apache.org/job/kafka-trunk-jdk8/730/consoleFull



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-67: Queryable state for Kafka Stream

2016-07-06 Thread Damian Guy
Thanks - updated

On Wed, 6 Jul 2016 at 20:08 Guozhang Wang  wrote:

> Thanks Damian, the KIP wiki looks good to me. One minor comment on the
> "Compatibility, Deprecation, and Migration Plan" section: we probably also
> want to mention that since we need to handle concurrent access with the
> queryable state support, this may incur slight overhead on the streams
> applications while a query is ongoing, and we will quantify the overhead in
> our benchmarks.
>
> On Mon, Jul 4, 2016 at 12:44 AM, Damian Guy  wrote:
>
> > Thanks Jay - i've updated the KIP accordingly.
> >
> > Thanks,
> > Damian
> >
> > On Fri, 1 Jul 2016 at 16:19 Jay Kreps  wrote:
> >
> > > We have not used the "get" prefix in methods, like getXyz(), elsewhere
> in
> > > our java code, instead sticking with the scala style methods like
> xyz().
> > > It'd be good to change those.
> > >
> > > -Jay
> > >
> > > On Fri, Jul 1, 2016 at 4:09 AM, Damian Guy 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > We've made some modifications to the KIP. The "Discovery" API has
> been
> > > > changed
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams#KIP-67:QueryablestateforKafkaStreams-Step2inproposal:globaldiscoveryofstatestores
> > > >
> > > > Please take a look.
> > > >
> > > > Many thanks,
> > > > Damian
> > > >
> > > > On Tue, 28 Jun 2016 at 09:34 Damian Guy 
> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We have created KIP 67: Queryable state for Kafka Streams`
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> > > > >
> > > > > Please take a look. Feedback is appreciated.
> > > > >
> > > > > Thank you
> > > > >
> > > >
> > >
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-3926) Transient failures in org.apache.kafka.streams.integration.RegexSourceIntegrationTest

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364942#comment-15364942
 ] 

ASF GitHub Bot commented on KAFKA-3926:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1590


> Transient failures in 
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest
> -
>
> Key: KAFKA-3926
> URL: https://issues.apache.org/jira/browse/KAFKA-3926
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams, unit tests
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
> Fix For: 0.10.1.0
>
>
> {code}
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest > 
> testRegexMatchesTopicsAWhenDeleted FAILED
> java.lang.AssertionError: 
> Expected: <[TEST-TOPIC-A, TEST-TOPIC-B]>
>  but: was <[TEST-TOPIC-A]>
> at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
> at org.junit.Assert.assertThat(Assert.java:956)
> at org.junit.Assert.assertThat(Assert.java:923)
> at 
> org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:211)
> {code}
> I think it is due to the fact that sometimes the rebalance takes much longer 
> than the specified 60 seconds.
> One example: https://builds.apache.org/job/kafka-trunk-jdk8/730/consoleFull



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1590: KAFKA-3926: Transient test failures - confirm all ...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1590


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3836) KStreamReduce and KTableReduce should not pass nulls to Deserializers

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3836.
--
   Resolution: Fixed
Fix Version/s: 0.10.0.1

Issue resolved by pull request 1591
[https://github.com/apache/kafka/pull/1591]

> KStreamReduce and KTableReduce should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Assignee: Jeyhun Karimov
>Priority: Trivial
>  Labels: architecture
> Fix For: 0.10.0.1
>
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.
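
A minimal sketch of the guard being asked for, using the standard
org.apache.kafka.common.serialization.Deserializer interface (the wrapper
class name is made up for illustration, it is not part of the fix):

{code}
// Sketch only: a wrapper that never hands null bytes to the delegate Deserializer.
// The class name NullSafeDeserializer is hypothetical.
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;

public class NullSafeDeserializer<T> implements Deserializer<T> {
    private final Deserializer<T> delegate;

    public NullSafeDeserializer(Deserializer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        delegate.configure(configs, isKey);
    }

    @Override
    public T deserialize(String topic, byte[] data) {
        // Short-circuit instead of forcing every Deserializer to tolerate null input.
        return data == null ? null : delegate.deserialize(topic, data);
    }

    @Override
    public void close() {
        delegate.close();
    }
}
{code}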



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3836) RocksDBStore.get() should not pass nulls to Deserializers

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3836:
-
Summary: RocksDBStore.get() should not pass nulls to Deserializers  (was: 
KStreamReduce and KTableReduce should not pass nulls to Deserializers)

> RocksDBStore.get() should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Assignee: Jeyhun Karimov
>Priority: Trivial
>  Labels: architecture
> Fix For: 0.10.0.1
>
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk7 #1406

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3794: added stream / table names as prefix to print / 
writeAsText

--
[...truncated 5680 lines...]

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild STARTED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls STARTED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck STARTED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete STARTED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange STARTED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException STARTED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap STARTED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll STARTED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog STARTED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck STARTED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation STARTED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints STARTED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset STARTED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod STARTED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull STARTED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testCorruptIndexRebuild STARTED

kafka.log.LogTest > testCorruptIndexRebuild PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved STARTED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages STARTED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload STARTED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog STARTED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset STARTED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate STARTED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName STARTED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles STARTED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll STARTED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter STARTED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName STARTED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo STARTED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile STARTED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogConfigTest > testFromPropsEmpty STARTED

kafka.log.LogConfigTest > testFromPropsEmpty PASSED

kafka.log.LogConfigTest > testKafkaConfigToProps STARTED

kafka.log.LogConfigTest > testKafkaConfigToProps PASSED

kafka.log.LogConfigTest > testFromPropsInvalid STARTED

kafka.log.LogConfigTest > testFromPropsInvalid PASSED

kafka.log.CleanerTest > testBuildOffsetMap STARTED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.CleanerTest 

[GitHub] kafka pull request #1591: KAFKA-3836: KStreamReduce and KTableReduce should ...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1591


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3836) KStreamReduce and KTableReduce should not pass nulls to Deserializers

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364936#comment-15364936
 ] 

ASF GitHub Bot commented on KAFKA-3836:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1591


> KStreamReduce and KTableReduce should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Assignee: Jeyhun Karimov
>Priority: Trivial
>  Labels: architecture
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3101) Optimize Aggregation Outputs

2016-07-06 Thread Bill Bejeck (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364919#comment-15364919
 ] 

Bill Bejeck commented on KAFKA-3101:


Is this available now? If so I'd like to pick this up if possible.

> Optimize Aggregation Outputs
> 
>
> Key: KAFKA-3101
> URL: https://issues.apache.org/jira/browse/KAFKA-3101
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
> Fix For: 0.10.1.0
>
>
> Today we emit one output record for each incoming message for Table / 
> Windowed Stream Aggregations. For example, say we have a sequence of 
> aggregate outputs computed from the input stream (assuming there is no agg 
> value for this key before):
> V1, V2, V3, V4, V5
> Then the aggregator will output the following sequence of Change<newValue, 
> oldValue>:
> <V1, null>, <V2, V1>, <V3, V2>, <V4, V3>, <V5, V4>
> where computing the intermediate results for each of these changes could cost 
> a lot of CPU overhead. 
> Instead, if we can let the underlying state store "remember" the last 
> emitted old value, we can reduce the number of emits based on some configs. 
> More specifically, we can add one more field in the KV store engine storing 
> the last emitted old value, which only gets updated when we emit to the 
> downstream processor. For example:
> At Beginning: 
> Store: key => empty (no agg values yet)
> V1 computed: 
> Update Both in Store: key => (V1, V1), Emit <V1, null>
> V2 computed: 
> Update NewValue in Store: key => (V2, V1), No Emit
> V3 computed: 
> Update NewValue in Store: key => (V3, V1), No Emit
> V4 computed: 
> Update Both in Store: key => (V4, V4), Emit <V4, V1>
> V5 computed: 
> Update NewValue in Store: key => (V5, V4), No Emit
> One more thing to consider is that we need a "closing" time control on the 
> not-yet-emitted keys; when some time has elapsed (or the window is to be 
> closed), we need to check for any key whether their current materialized pairs 
> have not been emitted (for example <V5, V4> in the above example). 
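
The bookkeeping described above boils down to keeping, per key, the current
aggregate plus the last old value that was actually emitted. A rough sketch of
that idea (class and method names are made up; this is not the Streams
implementation):

{code}
// Rough sketch of the emit-suppression idea only; names are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class EmitSuppressingStore<K, V> {
    // Per key: [0] = current aggregate, [1] = last old value emitted downstream.
    private final Map<K, Object[]> store = new HashMap<>();

    public interface Emitter<K, V> {
        void emit(K key, V newValue, V oldValue);
    }

    @SuppressWarnings("unchecked")
    public void update(K key, V newAgg, boolean shouldEmit, Emitter<K, V> downstream) {
        Object[] entry = store.computeIfAbsent(key, k -> new Object[2]);
        if (shouldEmit) {
            // Emit Change<newValue, lastEmittedOldValue> and remember newAgg as emitted.
            downstream.emit(key, newAgg, (V) entry[1]);
            entry[0] = newAgg;
            entry[1] = newAgg;
        } else {
            // Only advance the current aggregate; the last emitted old value stays put.
            entry[0] = newAgg;
        }
    }
}
{code}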



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-67: Queryable state for Kafka Stream

2016-07-06 Thread Guozhang Wang
Thanks Damian, the KIP wiki looks good to me. One minor comment on the
"Compatibility, Deprecation, and Migration Plan" section: we probably also
want to mention that since we need to handle concurrent access with the
queryable state support, this may incur slight overhead on the streams
applications while a query is ongoing, and we will quantify the overhead in
our benchmarks.

On Mon, Jul 4, 2016 at 12:44 AM, Damian Guy  wrote:

> Thanks Jay - i've updated the KIP accordingly.
>
> Thanks,
> Damian
>
> On Fri, 1 Jul 2016 at 16:19 Jay Kreps  wrote:
>
> > We have not used the "get" prefix in methods, like getXyz(), elsewhere in
> > our java code, instead sticking with the scala style methods like xyz().
> > It'd be good to change those.
> >
> > -Jay
> >
> > On Fri, Jul 1, 2016 at 4:09 AM, Damian Guy  wrote:
> >
> > > Hi,
> > >
> > > We've made some modifications to the KIP. The "Discovery" API has been
> > > changed
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams#KIP-67:QueryablestateforKafkaStreams-Step2inproposal:globaldiscoveryofstatestores
> > >
> > > Please take a look.
> > >
> > > Many thanks,
> > > Damian
> > >
> > > On Tue, 28 Jun 2016 at 09:34 Damian Guy  wrote:
> > >
> > > > Hi,
> > > >
> > > > We have created KIP 67: Queryable state for Kafka Streams`
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams
> > > >
> > > > Please take a look. Feedback is appreciated.
> > > >
> > > > Thank you
> > > >
> > >
> >
>



-- 
-- Guozhang


[jira] [Created] (KAFKA-3932) Consumer fails to consume in a round robin fashion

2016-07-06 Thread Elias Levy (JIRA)
Elias Levy created KAFKA-3932:
-

 Summary: Consumer fails to consume in a round robin fashion
 Key: KAFKA-3932
 URL: https://issues.apache.org/jira/browse/KAFKA-3932
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Affects Versions: 0.10.0.0
Reporter: Elias Levy


The Java consumer fails to consume messages in a round-robin fashion.  This can 
lead to unbalanced consumption.

In our use case we have a set of consumers that can take a significant amount of 
time consuming messages off a topic.  For this reason, we are using the 
pause/poll/resume pattern to ensure the consumer session does not time out.  The 
topic that is being consumed has been preloaded with messages.  That means there 
is a significant message lag when the consumer is first started.  To limit how 
many messages are consumed at a time, the consumer has been configured with 
max.poll.records=1.

The first observation is that the client receives a large batch of 
messages for the first partition it decides to consume from and will consume 
all those messages before moving on, rather than returning a message from a 
different partition for each call to poll.

We solved this issue by configuring max.partition.fetch.bytes to be small 
enough that only a single message will be returned by the broker on each fetch, 
although this would not be feasible if message size were highly variable.

The behavior of the consumer after this change is to largely consume from a 
small number of partitions, usually just two, iterating between them until it 
exhausts them, before moving to another partition.   This behavior is 
problematic if the messages have some rough time semantics and need to be 
processed roughly in time order across all partitions.

It would be useful if the consumer had a pluggable API that allowed custom 
logic to select which partition to consume from next, thus enabling the 
creation of a round-robin partition consumer.
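
For reference, the pause/poll/resume pattern mentioned above looks roughly
like this with the 0.10 consumer API (sketch only; the topic name and
process() method are placeholders):

{code}
// Sketch of the pause/poll/resume pattern; "my-topic" and process() are placeholders.
import java.util.Collections;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausePollResumeLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        consumer.subscribe(Collections.singletonList("my-topic"));
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(100);
            // Stop fetching while doing the (potentially slow) processing...
            consumer.pause(consumer.assignment());
            for (ConsumerRecord<String, String> record : records) {
                process(record);   // may take longer than the session timeout
                consumer.poll(0);  // keeps the session alive without fetching new data
            }
            // ...then fetch again.
            consumer.resume(consumer.assignment());
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // placeholder for the slow per-record work
    }
}
{code}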










--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #737

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] KAFKA-3794: added stream / table names as prefix to print / 
writeAsText

--
[...truncated 3356 lines...]

kafka.message.ByteBufferMessageSetTest > testValidBytes PASSED

kafka.message.ByteBufferMessageSetTest > testValidBytesWithCompression STARTED

kafka.message.ByteBufferMessageSetTest > testValidBytesWithCompression PASSED

kafka.message.ByteBufferMessageSetTest > 
testOffsetAssignmentAfterMessageFormatConversion STARTED

kafka.message.ByteBufferMessageSetTest > 
testOffsetAssignmentAfterMessageFormatConversion PASSED

kafka.message.ByteBufferMessageSetTest > testIteratorIsConsistent STARTED

kafka.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.message.ByteBufferMessageSetTest > testAbsoluteOffsetAssignment STARTED

kafka.message.ByteBufferMessageSetTest > testAbsoluteOffsetAssignment PASSED

kafka.message.ByteBufferMessageSetTest > testCreateTime STARTED

kafka.message.ByteBufferMessageSetTest > testCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testInvalidCreateTime STARTED

kafka.message.ByteBufferMessageSetTest > testInvalidCreateTime PASSED

kafka.message.ByteBufferMessageSetTest > testWrittenEqualsRead STARTED

kafka.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.message.ByteBufferMessageSetTest > testLogAppendTime STARTED

kafka.message.ByteBufferMessageSetTest > testLogAppendTime PASSED

kafka.message.ByteBufferMessageSetTest > testWriteTo STARTED

kafka.message.ByteBufferMessageSetTest > testWriteTo PASSED

kafka.message.ByteBufferMessageSetTest > testEquals STARTED

kafka.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.message.ByteBufferMessageSetTest > testSizeInBytes STARTED

kafka.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.message.ByteBufferMessageSetTest > testIterator STARTED

kafka.message.ByteBufferMessageSetTest > testIterator PASSED

kafka.message.ByteBufferMessageSetTest > testRelativeOffsetAssignment STARTED

kafka.message.ByteBufferMessageSetTest > testRelativeOffsetAssignment PASSED

kafka.tools.ConsoleConsumerTest > shouldLimitReadsToMaxMessageLimit STARTED

kafka.tools.ConsoleConsumerTest > shouldLimitReadsToMaxMessageLimit PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidNewConsumerValidConfig STARTED

kafka.tools.ConsoleConsumerTest > shouldParseValidNewConsumerValidConfig PASSED

kafka.tools.ConsoleConsumerTest > shouldStopWhenOutputCheckErrorFails STARTED

kafka.tools.ConsoleConsumerTest > shouldStopWhenOutputCheckErrorFails PASSED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithStringOffset STARTED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithStringOffset PASSED

kafka.tools.ConsoleConsumerTest > shouldParseConfigsFromFile STARTED

kafka.tools.ConsoleConsumerTest > shouldParseConfigsFromFile PASSED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithNumericOffset STARTED

kafka.tools.ConsoleConsumerTest > 
shouldParseValidNewSimpleConsumerValidConfigWithNumericOffset PASSED

kafka.tools.ConsoleConsumerTest > shouldParseValidOldConsumerValidConfig STARTED

kafka.tools.ConsoleConsumerTest > shouldParseValidOldConsumerValidConfig PASSED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler STARTED

kafka.tools.MirrorMakerTest > testDefaultMirrorMakerMessageHandler PASSED

kafka.tools.ConsoleProducerTest > testParseKeyProp STARTED

kafka.tools.ConsoleProducerTest > testParseKeyProp PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer STARTED

kafka.tools.ConsoleProducerTest > testValidConfigsOldProducer PASSED

kafka.tools.ConsoleProducerTest > testInvalidConfigs STARTED

kafka.tools.ConsoleProducerTest > testInvalidConfigs PASSED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer STARTED

kafka.tools.ConsoleProducerTest > testValidConfigsNewProducer PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
STARTED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics STARTED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > simpleRequest STARTED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > testSessionPrincipal STARTED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides STARTED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > testSocketsCloseOnShutdown STARTED

kafka.n

[jira] [Commented] (KAFKA-3857) Additional log cleaner metrics

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364847#comment-15364847
 ] 

ASF GitHub Bot commented on KAFKA-3857:
---

GitHub user kiranptivo opened a pull request:

https://github.com/apache/kafka/pull/1593

KAFKA-3857 Additional log cleaner metrics

Fixes KAFKA-3857

Changes proposed in this pull request:

The following additional log cleaner metrics have been added.
1. num-runs: Cumulative number of successful log cleaner runs since last 
broker restart.
2. last-run-time: Time of last log cleaner run.
3. num-filthy-logs: Number of filthy logs. A non-zero value for an extended 
period of time indicates that the cleaner has not been successful in cleaning 
the logs.

A note on num-filthy-logs: It is incremented whenever a filthy topic 
partition is added to the inProgress HashMap, and it is decremented once the 
cleaning succeeds or the cleaning is aborted. Note that the existing 
LogCleaner code does not provide a metric to check whether the clean operation 
succeeded. There is an inProgress HashMap with topicPartition => 
LogCleaningInProgress entries in it, but the entries are removed from the 
HashMap even when the clean operation throws an exception. So an additional 
metric, num-filthy-logs, was added to differentiate between a successful log 
clean and one that ended in an exception.

The code is ready. I have tested and verified JMX metrics. There is one 
case I couldn't test though. It's the case where numFilthyLogs is decremented 
in 'resumeCleaning(...)' in LogCleanerManager.scala Line 188. It seems to be a 
part of the workflow that aborts the cleaning of a particular partition. Any 
ideas on how to test this scenario?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TiVo/kafka log_cleaner_jmx_metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1593


commit f00de412f6b1f6568adef479687ae0df789f9c96
Author: Kiran Pillarisetty 
Date:   2016-06-14T17:40:26Z

Create a couple of additional Log Cleaner JMX metrics
log-clean-last-run: Log cleaner's last run time
log-clean-runs: Number of log cleaner runs.

commit 7dc7511ee2b6d3cdf9df0c366fe23bf34d062a54
Author: Kiran Pillarisetty 
Date:   2016-06-14T20:24:00Z

Created a couple of additional Log Cleaner JMX metrics
log-clean-last-run: a metric to track last log cleaner run (unix timestamp)
log-clean-runs: a metric to track number of log cleaner runs

Committer: Kiran Pillarisetty 

commit 7f1214ff1118103dd639df717e988a22bad8033d
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:14:57Z

Add additional JMX metric to track successful cleaning of a log segment

commit 1ac346bb37008312e41035167dbfd75803595cd6
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:17:25Z

Add additional JMX metric to track successful cleaning of a log segment

commit 4f08d875e05c35bd7d7c849584b8b029031f884b
Author: Kiran Pillarisetty 
Date:   2016-07-05T22:23:20Z

Metric name updated to num-filthy-logs. Metric incremented as it is grabbed 
for cleaning, and decremented once the cleaning is done, or if the cleaning is 
aborted

commit cd887c05bf1d56b7566c5b72b3ddf3bcdfb70898
Author: Kiran Pillarisetty 
Date:   2016-07-05T23:31:32Z

Changed a metric name (number-of-runs to num-runs). Removed an extra \n 
around line 164. It is not present in the trunk




> Additional log cleaner metrics
> --
>
> Key: KAFKA-3857
> URL: https://issues.apache.org/jira/browse/KAFKA-3857
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Kiran Pillarisetty
>
> The proposal would be to add a couple of additional log cleaner metrics: 
> 1. Time of last log cleaner run 
> 2. Cumulative number of successful log cleaner runs since last broker restart.
> Existing log cleaner metrics (max-buffer-utilization-percent, 
> cleaner-recopy-percent, max-clean-time-secs, max-dirty-percent) do not 
> differentiate an idle log cleaner from a dead log cleaner. It would be useful 
> to have the above two metrics added, to indicate whether log cleaner is alive 
> (and successfully cleaning) or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1593: KAFKA-3857 Additional log cleaner metrics

2016-07-06 Thread kiranptivo
GitHub user kiranptivo opened a pull request:

https://github.com/apache/kafka/pull/1593

KAFKA-3857 Additional log cleaner metrics

Fixes KAFKA-3857

Changes proposed in this pull request:

The following additional log cleaner metrics have been added.
1. num-runs: Cumulative number of successful log cleaner runs since last 
broker restart.
2. last-run-time: Time of last log cleaner run.
3. num-filthy-logs: Number of filthy logs. A non-zero value for an extended 
period of time indicates that the cleaner has not been successful in cleaning 
the logs.

A note on num-filthy-logs: It is incremented whenever a filthy topic 
partition is added to the inProgress HashMap, and it is decremented once the 
cleaning succeeds or the cleaning is aborted. Note that the existing 
LogCleaner code does not provide a metric to check whether the clean operation 
succeeded. There is an inProgress HashMap with topicPartition => 
LogCleaningInProgress entries in it, but the entries are removed from the 
HashMap even when the clean operation throws an exception. So an additional 
metric, num-filthy-logs, was added to differentiate between a successful log 
clean and one that ended in an exception.

The code is ready. I have tested and verified JMX metrics. There is one 
case I couldn't test though. It's the case where numFilthyLogs is decremented 
in 'resumeCleaning(...)' in LogCleanerManager.scala Line 188. It seems to be a 
part of the workflow that aborts the cleaning of a particular partition. Any 
ideas on how to test this scenario?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TiVo/kafka log_cleaner_jmx_metrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1593.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1593


commit f00de412f6b1f6568adef479687ae0df789f9c96
Author: Kiran Pillarisetty 
Date:   2016-06-14T17:40:26Z

Create a couple of additional Log Cleaner JMX metrics
log-clean-last-run: Log cleaner's last run time
log-clean-runs: Number of log cleaner runs.

commit 7dc7511ee2b6d3cdf9df0c366fe23bf34d062a54
Author: Kiran Pillarisetty 
Date:   2016-06-14T20:24:00Z

Created a couple of additional Log Cleaner JMX metrics
log-clean-last-run: a metric to track last log cleaner run (unix timestamp)
log-clean-runs: a metric to track number of log cleaner runs

Committer: Kiran Pillarisetty 

commit 7f1214ff1118103dd639df717e988a22bad8033d
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:14:57Z

Add additional JMX metric to track successful cleaning of a log segment

commit 1ac346bb37008312e41035167dbfd75803595cd6
Author: Kiran Pillarisetty 
Date:   2016-07-01T22:17:25Z

Add additional JMX metric to track successful cleaning of a log segment

commit 4f08d875e05c35bd7d7c849584b8b029031f884b
Author: Kiran Pillarisetty 
Date:   2016-07-05T22:23:20Z

Metric name updated to num-filthy-logs. Metric incremented as it is grabbed 
for cleaning, and decremented once the cleaning is done, or if the cleaning is 
aborted

commit cd887c05bf1d56b7566c5b72b3ddf3bcdfb70898
Author: Kiran Pillarisetty 
Date:   2016-07-05T23:31:32Z

Changed a metric name (number-of-runs to num-runs). Removed an extra \n 
around line 164. It is not present in the trunk




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3929) Add prefix for underlying clients configs in StreamConfig

2016-07-06 Thread Ishita Mandhan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364802#comment-15364802
 ] 

Ishita Mandhan commented on KAFKA-3929:
---

So, have a function, say appConfigsWithPrefix, that is called in StreamsConfig's 
getConsumerConfigs function and is responsible for handling a prefix such as 
"kafka.consumer"?

> Add prefix for underlying clients configs in StreamConfig
> -
>
> Key: KAFKA-3929
> URL: https://issues.apache.org/jira/browse/KAFKA-3929
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Ishita Mandhan
>  Labels: api
>
> There are a couple of configs that have the same name for producer / consumer 
> configs, e.g. take a look at {{CommonClientConfigs}}, and also for producer / 
> consumer interceptors there are commonly named configs as well.
> This is semi-related to KAFKA-3740 since we need to add "sub-class" configs 
> for RocksDB as well, and we'd better have some prefix mechanism for such 
> hierarchical configs in general.
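
For illustration, such a prefix-handling helper could look roughly like the
sketch below (the method name, class name, and prefix string are hypothetical,
not the actual StreamsConfig code):

{code}
// Illustration only; names are hypothetical, not the actual StreamsConfig implementation.
import java.util.HashMap;
import java.util.Map;

public class PrefixedConfigs {
    // e.g. prefix = "kafka.consumer."
    public static Map<String, Object> originalsWithPrefix(Map<String, Object> originals,
                                                          String prefix) {
        Map<String, Object> result = new HashMap<>();
        for (Map.Entry<String, Object> entry : originals.entrySet()) {
            if (entry.getKey().startsWith(prefix)) {
                // Strip the prefix so the value can be passed straight to the client config.
                result.put(entry.getKey().substring(prefix.length()), entry.getValue());
            }
        }
        return result;
    }
}
{code}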



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3452) Support session windows besides time interval windows

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3452:
-
Issue Type: New Feature  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Support session windows besides time interval windows
> -
>
> Key: KAFKA-3452
> URL: https://issues.apache.org/jira/browse/KAFKA-3452
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: api
> Fix For: 0.10.1.0
>
>
> The Streams DSL currently does not provide session windows as in the DataFlow 
> model. We have seen some common use cases for this feature, so it would be 
> better to add this support asap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3262) Make KafkaStreams debugging friendly

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3262:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Make KafkaStreams debugging friendly
> 
>
> Key: KAFKA-3262
> URL: https://issues.apache.org/jira/browse/KAFKA-3262
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Yasuhiro Matsuda
>Assignee: Eno Thereska
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently, KafkaStreams polls records in the same thread as the data processing 
> thread. This makes debugging user code, as well as KafkaStreams itself, 
> difficult. When the thread is suspended by the debugger, the next heartbeat 
> of the consumer tied to the thread won't be sent until the thread is resumed. 
> This often results in missed heartbeats and causes a group rebalance. So it 
> may be a completely different context when the thread hits the breakpoint 
> the next time.
> We should consider using separate threads for polling and processing.
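
A sketch of the decoupling being suggested, with a bounded hand-off queue
(illustrative only; this is not the proposed StreamThread change, and the
class name is made up):

{code}
// Illustrative only: decouple polling from processing so a debugger breakpoint in
// process() does not stop the consumer from polling. Note that if the queue fills
// up, put() blocks and polling stalls again, so the queue size matters in practice.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class DecoupledPolling {
    private final BlockingQueue<ConsumerRecord<byte[], byte[]>> queue =
            new ArrayBlockingQueue<>(1000);

    public void startPoller(final KafkaConsumer<byte[], byte[]> consumer) {
        new Thread(() -> {
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    try {
                        queue.put(record);  // blocks if the processor falls behind
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }, "poller").start();
    }

    public void startProcessor() {
        new Thread(() -> {
            while (true) {
                try {
                    process(queue.take());  // suspending here no longer stalls poll()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "processor").start();
    }

    private void process(ConsumerRecord<byte[], byte[]> record) {
        // placeholder for user processing logic
    }
}
{code}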



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3185) Allow users to cleanup internal data

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3185:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Allow users to cleanup internal data
> 
>
> Key: KAFKA-3185
> URL: https://issues.apache.org/jira/browse/KAFKA-3185
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Matthias J. Sax
>Priority: Blocker
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently the internal data is managed completely by the Kafka Streams framework 
> and users cannot clean it up actively. This results in a bad out-of-the-box 
> user experience, especially when running demo programs, since it leaves behind 
> internal data (changelog topics, RocksDB files, etc.) that needs to be cleaned 
> up manually. It would be better to add a
> {code}
> KafkaStreams.cleanup()
> {code}
> function call to clean up this internal data programmatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3184) Add Checkpoint for In-memory State Store

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3184:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Add Checkpoint for In-memory State Store
> 
>
> Key: KAFKA-3184
> URL: https://issues.apache.org/jira/browse/KAFKA-3184
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> Currently Kafka Streams does not make a checkpoint of the persistent state 
> store upon committing, which would be expensive since it is a "stop the 
> world" operation that writes to disk: for example, RocksDB would naively 
> require copying the file directory to make a copy. 
> However, for in-memory stores checkpointing may be doable in an asynchronous 
> manner and hence can be done quickly. And the benefit of having an intermediate 
> checkpoint is to avoid restoring from scratch if standby tasks are not 
> present.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3183) Add metrics for persistent store caching layer

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3183:
-
Issue Type: Bug  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Add metrics for persistent store caching layer
> --
>
> Key: KAFKA-3183
> URL: https://issues.apache.org/jira/browse/KAFKA-3183
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: user-experience
> Fix For: 0.10.1.0
>
>
> We need to add metrics collection such as cache hits / misses, cache 
> size, dirty key size, etc. for the RocksDBStore. However, this may require 
> refactoring the RocksDBStore a little bit, since currently caching is not exposed 
> to the MeteredKeyValueStore, and it uses an LRUCacheStore as the cache, which 
> does not keep the dirty key set.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3101) Optimize Aggregation Outputs

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang updated KAFKA-3101:
-
Issue Type: Improvement  (was: Sub-task)
Parent: (was: KAFKA-2590)

> Optimize Aggregation Outputs
> 
>
> Key: KAFKA-3101
> URL: https://issues.apache.org/jira/browse/KAFKA-3101
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>  Labels: architecture
> Fix For: 0.10.1.0
>
>
> Today we emit one output record for each incoming message for Table / 
> Windowed Stream Aggregations. For example, say we have a sequence of 
> aggregate outputs computed from the input stream (assuming there is no agg 
> value for this key before):
> V1, V2, V3, V4, V5
> Then the aggregator will output the following sequence of Change<newValue, 
> oldValue>:
> <V1, null>, <V2, V1>, <V3, V2>, <V4, V3>, <V5, V4>
> where computing the intermediate results for each of these changes could cost 
> a lot of CPU overhead. 
> Instead, if we can let the underlying state store "remember" the last 
> emitted old value, we can reduce the number of emits based on some configs. 
> More specifically, we can add one more field in the KV store engine storing 
> the last emitted old value, which only gets updated when we emit to the 
> downstream processor. For example:
> At Beginning: 
> Store: key => empty (no agg values yet)
> V1 computed: 
> Update Both in Store: key => (V1, V1), Emit <V1, null>
> V2 computed: 
> Update NewValue in Store: key => (V2, V1), No Emit
> V3 computed: 
> Update NewValue in Store: key => (V3, V1), No Emit
> V4 computed: 
> Update Both in Store: key => (V4, V4), Emit <V4, V1>
> V5 computed: 
> Update NewValue in Store: key => (V5, V4), No Emit
> One more thing to consider is that we need a "closing" time control on the 
> not-yet-emitted keys; when some time has elapsed (or the window is to be 
> closed), we need to check for any key whether their current materialized pairs 
> have not been emitted (for example <V5, V4> in the above example). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3794) Add Stream / Table prefix in print functions

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364726#comment-15364726
 ] 

ASF GitHub Bot commented on KAFKA-3794:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1577


> Add Stream / Table prefix in print functions
> 
>
> Key: KAFKA-3794
> URL: https://issues.apache.org/jira/browse/KAFKA-3794
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie, user-experience
> Fix For: 0.10.1.0
>
>
> Currently the KTable/KStream.print() operator will print the key-value pair 
> as it was forwarded to this operator. However, if there are multiple 
> operators in the topologies with the same {{PrintStream}} (e.g. stdout), 
> their printed key-value pairs will be interleaved on that stream channel.
> Hence it is better to add a prefix for different KStream/KTable.print 
> operators. One proposal:
> 1) For KTable, it inherits a table name when created, and we can use that 
> name as the prefix as {{[table-name]: key, value}}.
> 2) For KStream, we can overload the function with an additional "name" 
> parameter that we use as the prefix; if it is not specified, then we can use 
> the parent processor node name, which has the pattern like 
> {{KSTREAM-JOIN-suffix_index}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1577: KAFKA-3794: added stream / table names as prefix t...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1577


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-3794) Add Stream / Table prefix in print functions

2016-07-06 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-3794.
--
   Resolution: Fixed
Fix Version/s: 0.10.1.0

Issue resolved by pull request 1577
[https://github.com/apache/kafka/pull/1577]

> Add Stream / Table prefix in print functions
> 
>
> Key: KAFKA-3794
> URL: https://issues.apache.org/jira/browse/KAFKA-3794
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Bill Bejeck
>  Labels: newbie, user-experience
> Fix For: 0.10.1.0
>
>
> Currently the KTable/KStream.print() operator will print the key-value pair 
> as it was forwarded to this operator. However, if there are multiple 
> operators in the topologies with the same {{PrintStream}} (e.g. stdout), 
> their printed key-value pairs will be interleaved on that stream channel.
> Hence it is better to add a prefix for different KStream/KTable.print 
> operators. One proposal:
> 1) For KTable, it inherits a table name when created, and we can use that 
> name as the prefix as {{[table-name]: key, value}}.
> 2) For KStream, we can overload the function with an additional "name" 
> parameter that we use as the prefix; if it is not specified, then we can use 
> the parent processor node name, which has the pattern like 
> {{KSTREAM-JOIN-suffix_index}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3809) Auto-generate documentation for topic-level configuration

2016-07-06 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364686#comment-15364686
 ] 

James Cheng commented on KAFKA-3809:


[~ijuma], [~ewencp], would either of you be able to help me find a reviewer for 
this?

> Auto-generate documentation for topic-level configuration
> -
>
> Key: KAFKA-3809
> URL: https://issues.apache.org/jira/browse/KAFKA-3809
> Project: Kafka
>  Issue Type: Bug
>Reporter: James Cheng
>Assignee: James Cheng
> Attachments: configuration.html, topic_config.html
>
>
> The documentation for topic-level configuration is not auto-generated from 
> the code. configuration.html still contains hand-maintained documentation.
> I noticed this because I wanted to set message.timestamp.type on a topic, and 
> didn't see that it was supported, but grepped through the code and it looked 
> like it was.
> The code to auto-generate the docs is quite close, but needs some additional 
> work. In particular, topic-level configuration is different from all the 
> other ConfigDefs in that topic-level configuration docs list the broker-level 
> config that they inherit from. We would need to have a way to show what 
> broker-level config applies to each topic-level config.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Parallelisation factor in kafka streams

2016-07-06 Thread Matthias J. Sax
Hi Jeyhun,

the number of partitions determines the number of tasks within a Kafka
Streams application and thus the maximum degree of parallelism for your
application.

For more details see
http://docs.confluent.io/3.0.0/streams/architecture.html#parallelism-model

You can set the number of threads for a single application instance, via
parameter "num.stream.threads" (default value is 1).

See
http://docs.confluent.io/3.0.0/streams/developer-guide.html#optional-configuration-parameters
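
For example, a minimal sketch (the application id, bootstrap server, and
topic names below are placeholders):

{code}
// Sketch only: run 4 stream threads in this application instance.
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class FourThreadsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);

        KStreamBuilder builder = new KStreamBuilder();
        builder.stream("input-topic").to("output-topic");

        new KafkaStreams(builder, props).start();
    }
}
{code}

Remember that threads beyond the number of tasks (i.e. partitions) will
simply stay idle.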


-Matthias

On 07/06/2016 06:11 PM, Jeyhun Karimov wrote:
> Hi community,
> 
> How can I set parallelisation factor in kafka streams? Is it related with
> the number of partitions within topics?
> 
> 
> Cheers
> Jeyhun
> 



signature.asc
Description: OpenPGP digital signature


[jira] [Work started] (KAFKA-3931) kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure

2016-07-06 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on KAFKA-3931 started by Vahid Hashemian.
--
> kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure
> ---
>
> Key: KAFKA-3931
> URL: https://issues.apache.org/jira/browse/KAFKA-3931
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> Some of the recent builds are failing this test 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4579/testReport/kafka.api/PlaintextConsumerTest/testPatternUnsubscription/]).
> Some others are failing on 
> kafka.api.PlaintextConsumerTest.testPatternSubscription 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4565/testReport/junit/kafka.api/PlaintextConsumerTest/testPatternSubscription/])
> These failures seem to have started after [this 
> commit|https://github.com/apache/kafka/commit/d7de59a579af5ba4ecb1aec8fed84054f8b86443].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #736

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3802; log mtimes reset on broker restart / shutdown

--
[...truncated 4384 lines...]

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue STARTED

kafka.utils.ByteBoundedBlockingQueueTest > testByteBoundedBlockingQueue PASSED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr STARTED

kafka.utils.ReplicationUtilsTest > testUpdateLeaderAndIsr PASSED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition STARTED

kafka.utils.ReplicationUtilsTest > testGetLeaderIsrAndEpochForPartition PASSED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask STARTED

kafka.utils.timer.TimerTest > testAlreadyExpiredTask PASSED

kafka.utils.timer.TimerTest > testTaskExpiration STARTED

kafka.utils.timer.TimerTest > testTaskExpiration PASSED

kafka.utils.timer.TimerTaskListTest > testAll STARTED

kafka.utils.timer.TimerTaskListTest > testAll PASSED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask STARTED

kafka.utils.SchedulerTest > testMockSchedulerPeriodicTask PASSED

kafka.utils.SchedulerTest > testNonPeriodicTask STARTED

kafka.utils.SchedulerTest > testNonPeriodicTask PASSED

kafka.utils.SchedulerTest > testRestart STARTED

kafka.utils.SchedulerTest > testRestart PASSED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler STARTED

kafka.utils.SchedulerTest > testReentrantTaskInMockScheduler PASSED

kafka.utils.SchedulerTest > testPeriodicTask STARTED

kafka.utils.SchedulerTest > testPeriodicTask PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArg PASSED

kafka.utils.CommandLineUtilsTest > testParseSingleArg STARTED

kafka.utils.CommandLineUtilsTest > testParseSingleArg PASSED

kafka.utils.CommandLineUtilsTest > testParseArgs STARTED

kafka.utils.CommandLineUtilsTest > testParseArgs PASSED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid STARTED

kafka.utils.CommandLineUtilsTest > testParseEmptyArgAsValid PASSED

kafka.utils.UtilsTest > testAbs STARTED

kafka.utils.UtilsTest > testAbs PASSED

kafka.utils.UtilsTest > testReplaceSuffix STARTED

kafka.utils.UtilsTest > testReplaceSuffix PASSED

kafka.utils.UtilsTest > testCircularIterator STARTED

kafka.utils.UtilsTest > testCircularIterator PASSED

kafka.utils.UtilsTest > testReadBytes STARTED

kafka.utils.UtilsTest > testReadBytes PASSED

kafka.utils.UtilsTest > testCsvList STARTED

kafka.utils.UtilsTest > testCsvList PASSED

kafka.utils.UtilsTest > testReadInt STARTED

kafka.utils.UtilsTest > testReadInt PASSED

kafka.utils.UtilsTest > testCsvMap STARTED

kafka.utils.UtilsTest > testCsvMap PASSED

kafka.utils.UtilsTest > testInLock STARTED

kafka.utils.UtilsTest > testInLock PASSED

kafka.utils.UtilsTest > testSwallow STARTED

kafka.utils.UtilsTest > testSwallow PASSED

kafka.utils.JsonTest > testJsonEncoding STARTED

kafka.utils.JsonTest > testJsonEncoding PASSED

kafka.utils.IteratorTemplateTest > testIterator STARTED

kafka.utils.IteratorTemplateTest > testIterator PASSED

kafka.metrics.KafkaTimerTest > testKafkaTimer STARTED

kafka.metrics.KafkaTimerTest > testKafkaTimer PASSED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > testMetricsReporterAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic STARTED

kafka.metrics.MetricsTest > 
testBrokerTopicMetricsUnregisteredAfterDeletingTopic PASSED

kafka.metrics.MetricsTest > testMetricsLeak STARTED

kafka.metrics.MetricsTest > testMetricsLeak PASSED

kafka.consumer.TopicFilterTest > testWhitelists STARTED

kafka.consumer.TopicFilterTest > testWhitelists PASSED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson STARTED

kafka.consumer.TopicFilterTest > 
testWildcardTopicCountGetTopicCountMapEscapeJson PASSED

kafka.consumer.TopicFilterTest > testBlacklists STARTED

kafka.consumer.TopicFilterTest > testBlacklists PASSED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > testRoundRobinPartitionAssignor PASSED

kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor STARTED

kafka.consumer.PartitionAssignorTest > testRangePartitionAssignor PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testBasic STARTED

kafka.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testCompressionSetConsumption 
STARTED

kafka.consumer.ZookeeperConsumerConnectorTest > testCompressionSetConsumption 
PASSED

kafka.consumer.ZookeeperConsumerConnectorTest > testLeaderSelectionForPartition 
STARTED

kafka.consumer.ZookeeperConsumerConnectorTest > testLeaderSe

[jira] [Created] (KAFKA-3931) kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure

2016-07-06 Thread Vahid Hashemian (JIRA)
Vahid Hashemian created KAFKA-3931:
--

 Summary: kafka.api.PlaintextConsumerTest.testPatternUnsubscription 
transient failure
 Key: KAFKA-3931
 URL: https://issues.apache.org/jira/browse/KAFKA-3931
 Project: Kafka
  Issue Type: Bug
Reporter: Vahid Hashemian
Assignee: Vahid Hashemian


Some of the recent builds are failing this test 
([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4579/testReport/kafka.api/PlaintextConsumerTest/testPatternUnsubscription/]).

Some others are failing on 
kafka.api.PlaintextConsumerTest.testPatternSubscription 
([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4565/testReport/junit/kafka.api/PlaintextConsumerTest/testPatternSubscription/])

These failures seem to have started after [this 
commit|https://github.com/apache/kafka/commit/d7de59a579af5ba4ecb1aec8fed84054f8b86443].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3931) kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure

2016-07-06 Thread Vahid Hashemian (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vahid Hashemian updated KAFKA-3931:
---
Issue Type: Sub-task  (was: Bug)
Parent: KAFKA-2054

> kafka.api.PlaintextConsumerTest.testPatternUnsubscription transient failure
> ---
>
> Key: KAFKA-3931
> URL: https://issues.apache.org/jira/browse/KAFKA-3931
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Vahid Hashemian
>Assignee: Vahid Hashemian
>
> Some of the recent builds are failing this test 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4579/testReport/kafka.api/PlaintextConsumerTest/testPatternUnsubscription/]).
> Some others are failing on 
> kafka.api.PlaintextConsumerTest.testPatternSubscription 
> ([example|https://builds.apache.org/job/kafka-trunk-git-pr-jdk7/4565/testReport/junit/kafka.api/PlaintextConsumerTest/testPatternSubscription/])
> These failures seem to have started after [this 
> commit|https://github.com/apache/kafka/commit/d7de59a579af5ba4ecb1aec8fed84054f8b86443].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk7 #1405

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3802; log mtimes reset on broker restart / shutdown

--
[...truncated 3435 lines...]

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] STARTED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild STARTED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls STARTED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck STARTED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete STARTED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange STARTED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException STARTED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap STARTED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll STARTED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog STARTED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck STARTED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation STARTED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints STARTED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset STARTED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod STARTED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull STARTED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets STARTED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testCorruptIndexRebuild STARTED

kafka.log.LogTest > testCorruptIndexRebuild PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved STARTED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages STARTED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload STARTED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog STARTED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset STARTED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate STARTED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition STARTED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName STARTED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles STARTED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll STARTED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter STARTED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName STARTED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo STARTED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile STARTED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogConfigTest > testFromPropsEmpty STARTED

kafka.log.LogConfigTest > testFromPropsEmpty PASSED

kafka.log.LogConfigTest > testKafkaConfigToProps STARTED

kafka.log.LogConfigTest > testKafkaConfigToProps PASSED

kafka.log.LogConfigTest > testFromPropsInvalid STARTED

kafka.log.LogConfigTest > testFromPropsInvalid PASSED

kafka.log.CleanerTest > testBuildOffsetMap STARTED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.CleanerTest > testBuildOffset

Parallelisation factor in kafka streams

2016-07-06 Thread Jeyhun Karimov
Hi community,

How can I set the parallelisation factor in Kafka Streams? Is it related to
the number of partitions within topics?


Cheers
Jeyhun
-- 
-Cheers

Jeyhun


Build failed in Jenkins: kafka-0.10.0-jdk7 #143

2016-07-06 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3802; log mtimes reset on broker restart / shutdown

--
[...truncated 4025 lines...]

kafka.log.BrokerCompressionTest > testBrokerSideCompression[8] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[9] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[10] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[11] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[12] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[13] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[14] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[15] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[16] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[17] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[18] PASSED

kafka.log.BrokerCompressionTest > testBrokerSideCompression[19] PASSED

kafka.log.OffsetMapTest > testClear PASSED

kafka.log.OffsetMapTest > testBasicValidation PASSED

kafka.log.LogManagerTest > testCleanupSegmentsToMaintainSize PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithRelativeDirectory 
PASSED

kafka.log.LogManagerTest > testGetNonExistentLog PASSED

kafka.log.LogManagerTest > testTwoLogManagersUsingSameDirFails PASSED

kafka.log.LogManagerTest > testLeastLoadedAssignment PASSED

kafka.log.LogManagerTest > testCleanupExpiredSegments PASSED

kafka.log.LogManagerTest > testCheckpointRecoveryPoints PASSED

kafka.log.LogManagerTest > testTimeBasedFlush PASSED

kafka.log.LogManagerTest > testCreateLog PASSED

kafka.log.LogManagerTest > testRecoveryDirectoryMappingWithTrailingSlash PASSED

kafka.log.LogSegmentTest > testRecoveryWithCorruptMessage PASSED

kafka.log.LogSegmentTest > testRecoveryFixesCorruptIndex PASSED

kafka.log.LogSegmentTest > testReadFromGap PASSED

kafka.log.LogSegmentTest > testTruncate PASSED

kafka.log.LogSegmentTest > testReadBeforeFirstOffset PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeAppendMessage PASSED

kafka.log.LogSegmentTest > testChangeFileSuffixes PASSED

kafka.log.LogSegmentTest > testMaxOffset PASSED

kafka.log.LogSegmentTest > testNextOffsetCalculation PASSED

kafka.log.LogSegmentTest > testReadOnEmptySegment PASSED

kafka.log.LogSegmentTest > testReadAfterLast PASSED

kafka.log.LogSegmentTest > testCreateWithInitFileSizeClearShutdown PASSED

kafka.log.LogSegmentTest > testTruncateFull PASSED

kafka.log.FileMessageSetTest > testTruncate PASSED

kafka.log.FileMessageSetTest > testIterationOverPartialAndTruncation PASSED

kafka.log.FileMessageSetTest > testRead PASSED

kafka.log.FileMessageSetTest > 
testTruncateNotCalledIfSizeIsBiggerThanTargetSize PASSED

kafka.log.FileMessageSetTest > testFileSize PASSED

kafka.log.FileMessageSetTest > testIteratorWithLimits PASSED

kafka.log.FileMessageSetTest > testTruncateNotCalledIfSizeIsSameAsTargetSize 
PASSED

kafka.log.FileMessageSetTest > testPreallocateTrue PASSED

kafka.log.FileMessageSetTest > testIteratorIsConsistent PASSED

kafka.log.FileMessageSetTest > testTruncateIfSizeIsDifferentToTargetSize PASSED

kafka.log.FileMessageSetTest > testFormatConversionWithPartialMessage PASSED

kafka.log.FileMessageSetTest > testIterationDoesntChangePosition PASSED

kafka.log.FileMessageSetTest > testWrittenEqualsRead PASSED

kafka.log.FileMessageSetTest > testWriteTo PASSED

kafka.log.FileMessageSetTest > testPreallocateFalse PASSED

kafka.log.FileMessageSetTest > testPreallocateClearShutdown PASSED

kafka.log.FileMessageSetTest > testMessageFormatConversion PASSED

kafka.log.FileMessageSetTest > testSearch PASSED

kafka.log.FileMessageSetTest > testSizeInBytes PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingTopic PASSED

kafka.log.LogTest > testIndexRebuild PASSED

kafka.log.LogTest > testLogRolls PASSED

kafka.log.LogTest > testMessageSizeCheck PASSED

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicP

[GitHub] kafka-site issue #16: Update design.html

2016-07-06 Thread ijuma
Github user ijuma commented on the issue:

https://github.com/apache/kafka-site/pull/16
  
Thank you, can you close this PR then?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka-site issue #16: Update design.html

2016-07-06 Thread nihed
Github user nihed commented on the issue:

https://github.com/apache/kafka-site/pull/16
  
Done, 
https://github.com/apache/kafka/pull/1592


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1592: Update design.html

2016-07-06 Thread nihed
GitHub user nihed opened a pull request:

https://github.com/apache/kafka/pull/1592

Update design.html

Since 0.9.0.1, the configuration parameter log.cleaner.enable is now true by 
default.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nihed/kafka patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1592.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1592


commit 673468f9791ce03e306898fe9b0145691b5fa65a
Author: Nihed MBAREK 
Date:   2016-07-06T15:49:50Z

Update design.html

Since 0.9.0.1 Configuration parameter log.cleaner.enable is now true by 
default.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] KIP-60: Make Java client class loading more flexible

2016-07-06 Thread Rajini Sivaram
Hi all,

The vote has passed with 3 binding +1s and 1 non-binding +1. Many thanks to
those who voted. If you do have any more comments or suggestions, please do
add them.

I will update the KIP page and rebase the PR.


On Tue, Jul 5, 2016 at 12:03 AM, Harsha Chintalapani 
wrote:

> +1 (binding)
> Harsha
>
> On Mon, Jul 4, 2016 at 8:28 AM Ismael Juma  wrote:
>
> > +1 (binding)
> >
> > Ismael
> >
> > On Fri, Jul 1, 2016 at 12:27 PM, Rajini Sivaram <
> > rajinisiva...@googlemail.com> wrote:
> >
> > > I would like to initiate the voting process for KIP-60 (
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-60+-+Make+Java+client+classloading+more+flexible
> > > ).
> > > This is a simple set of changes that are fully compatible with the
> > existing
> > > class loading in Kafka and enables Java clients to be run in
> > > multi-classloader environments including OSGi.
> > >
> > > Thank you...
> > >
> > >
> > > Regards,
> > >
> > > Rajini
> > >
> >
>



-- 
Regards,

Rajini


[jira] [Updated] (KAFKA-3802) log mtimes reset on broker restart

2016-07-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3802:
---
Assignee: Moritz Siuts

> log mtimes reset on broker restart
> --
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Otto
>Assignee: Moritz Siuts
> Fix For: 0.10.0.1
>
>
> Folks over in 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
>  are commenting about this issue.
> In 0.9, any data log file that was on
> disk before the broker restart has its mtime modified to the time of the broker
> restart.
> This causes problems with log retention, as all the files then look like
> they contain recent data to kafka.  We use the default log retention of 7
> days, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all.  We have seen broker 
> restarts where mtimes were not changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3802) log mtimes reset on broker restart

2016-07-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3802:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> log mtimes reset on broker restart
> --
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Otto
> Fix For: 0.10.0.1
>
>
> Folks over in 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
>  are commenting about this issue.
> In 0.9, any data log file that was on
> disk before the broker restart has its mtime modified to the time of the broker
> restart.
> This causes problems with log retention, as all the files then look like
> they contain recent data to kafka.  We use the default log retention of 7
> days, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all.  We have seen broker 
> restarts where mtimes were not changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3802) log mtimes reset on broker restart

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364439#comment-15364439
 ] 

ASF GitHub Bot commented on KAFKA-3802:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1497


> log mtimes reset on broker restart
> --
>
> Key: KAFKA-3802
> URL: https://issues.apache.org/jira/browse/KAFKA-3802
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
>Reporter: Andrew Otto
> Fix For: 0.10.0.1
>
>
> Folks over in 
> http://mail-archives.apache.org/mod_mbox/kafka-users/201605.mbox/%3CCAO8=cz0ragjad1acx4geqcwj+rkd1gmdavkjwytwthkszfg...@mail.gmail.com%3E
>  are commenting about this issue.
> In 0.9, any data log file that was on
> disk before the broker restart has its mtime modified to the time of the broker
> restart.
> This causes problems with log retention, as all the files then look like
> they contain recent data to kafka.  We use the default log retention of 7
> days, but if all the files are touched at the same time, this can cause us
> to retain up to 2 weeks of log data, which can fill up our disks.
> This happens *most* of the time, but seemingly not all.  We have seen broker 
> restarts where mtimes were not changed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1497: KAFKA-3802 log mtimes reset on broker restart / sh...

2016-07-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1497


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka-site issue #16: Update design.html

2016-07-06 Thread ijuma
Github user ijuma commented on the issue:

https://github.com/apache/kafka-site/pull/16
  
Thanks for the PR. The file you edited is just a copy from the following 
file:

https://github.com/apache/kafka/blob/trunk/docs/design.html

Can you please file a PR against the `kafka` repo instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Kafka - offset preservation

2016-07-06 Thread Pawel Huszcza
Hello,

I have tried every property I can think of: I set 
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG = true and 
ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG = 1000 (and also tried many other 
values). Still, every time I start my consumer it receives all the messages ever 
sent to the given topic, even though it is always the same consumer group. 
ConsumerConfig.AUTO_OFFSET_RESET_CONFIG is set to "latest", but I also tried 
"earliest" (just for luck). What is the proper configuration so that every time 
I start my consumer it processes only the new messages rather than all of them?
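
For reference, a minimal sketch of the configuration I described above; the
broker address, group id, and topic name are placeholders, not my real values:

{code}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-consumer-group");         // same group id on every restart
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "1000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));              // placeholder topic
{code}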

Kind Regards,
Pawel Jan Huszcza


[GitHub] kafka-site pull request #16: Update design.html

2016-07-06 Thread nihed
GitHub user nihed opened a pull request:

https://github.com/apache/kafka-site/pull/16

Update design.html

Hi,

Since 0.9.0.1, the configuration parameter log.cleaner.enable is true by default,
so I updated the documentation. 

Regards, 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nihed/kafka-site asf-site

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16


commit c288df08da9bbec5c364a800b4746b993e9a3cf2
Author: Nihed MBAREK 
Date:   2016-07-06T13:50:02Z

Update design.html

since 0.9.0.1 Configuration parameter log.cleaner.enable is true by default.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: NPE using GenericAvroSerde to deserialize

2016-07-06 Thread Philippe Derome
This item has been moved to the Confluent Google group.
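
For anyone who hits the same NPE: below is a minimal sketch of how a
schema-registry-backed serde is typically wired up before use. The registry URL
is a placeholder, GenericAvroSerde is the class from the Confluent examples code
base (import it from wherever the examples define it), and the idea that a
missing configure() call (and thus a missing schema.registry.url) is the cause
here is only an assumption on my part.

{code}
import java.util.Collections;
import java.util.Map;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.common.serialization.Serde;

// GenericAvroSerde comes from the Confluent examples code base.
Map<String, String> serdeConfig =
    Collections.singletonMap("schema.registry.url", "http://localhost:8081"); // placeholder URL

Serde<GenericRecord> valueSerde = new GenericAvroSerde();
valueSerde.configure(serdeConfig, false); // false = used for record values, not keys

// Later, when writing the KTable out, pass the configured serde, e.g.:
// table.to(Serdes.String(), valueSerde, "output-topic");
{code}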

On Wed, Jul 6, 2016 at 6:25 AM, Philippe Derome  wrote:

> Please ignore until I become quite more specific about my code usage (will
> try to recover the same code I used). In the mean time, I managed to
> serialize into output topic using a JsonSerde, which was straightforward.
>
> On Wed, Jul 6, 2016 at 5:26 AM, Michael Noll  wrote:
>
>> Phil,
>>
>> > I then specify a Serde (new GenericAvroSerde) as value
>> > deserializer when outputting to topic via table.to method.
>>
>> I suppose that was a typo, and you actually meant "as a value
>> *serializer*", right?
>>
>>
>>
>> On Tue, Jul 5, 2016 at 11:55 PM, Philippe Derome 
>> wrote:
>>
>> > This is possibly more of a Confluent question as GenericAvroSerde is in
>> > Confluent example code base.
>> >
>> > I have a small Stream application, which creates a KTable with String
>> key
>> > and GenericRecord value.
>> >
>> > I then specify a Serde (new GenericAvroSerde) as value
>> > deserializer when outputting to topic via table.to method.
>> >
>> > I get a NPE on deserializing with this.schemaRegistry being null within
>> > AbstractKafkaAvroSerializer#serializeImpl.
>> >
>> > Could it be simply that GenericRecords are more of an intermediate class
>> > and are not meant to be serialised?
>> >
>> > I'd like to stream the values on topic as GenericRecord. I thought it
>> > should work. Alternatively, guidance on using SpecificAvroSerde would be
>> > very helpful.
>> >
>>
>>
>>
>> --
>> Best regards,
>> Michael Noll
>>
>>
>>
>> Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
>> Download Apache Kafka and Confluent Platform: www.confluent.io/download
>>
>
>


[jira] [Commented] (KAFKA-3910) Cyclic schema support in ConnectSchema and SchemaBuilder

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364188#comment-15364188
 ] 

ASF GitHub Bot commented on KAFKA-3910:
---

GitHub user johnhofman reopened a pull request:

https://github.com/apache/kafka/pull/1582

KAFKA-3910: Cyclic schema support in ConnectSchema and SchemaBuilder

This feature uses a FutureSchema as a placeholder to be resolved later. 
Resolution is attempted whenever a ConnectSchema is constructed: it attempts to 
resolve all its children (fields, keySchema, or valueSchema) and recurses until 
the end of the tree. 

A FutureSchema is resolved when it finds a parent schema that matches its 
name, and optional flag. If a FutureSchema is accessed before being resolved, 
it will throw a DataException.

The SchemaBuilder constructs a FutureSchema if a field is added with only a 
type name.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/johnhofman/kafka cyclic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1582


commit d932db4544cc2d20a46bf873fbd92f2c09450276
Author: John Hofman 
Date:   2016-06-30T20:04:19Z

Add FutureSchema to support cyclic schemas

commit 7d95c487d4a5bf3cb751deb98425e5123abb461b
Author: John Hofman 
Date:   2016-07-04T07:32:47Z

Fix resolution failure test

commit 09f1b47c238ff10f58681e44816f2ba39ed95166
Author: John Hofman 
Date:   2016-07-04T09:53:58Z

Move cyclic comparison resolution to FutureSchema

commit c1c632b51f80d81c29cb66a35c5aed867ad869e7
Author: John Hofman 
Date:   2016-07-04T11:26:35Z

Clean up unused tokens, minor spelling fixes




> Cyclic schema support in ConnectSchema and SchemaBuilder
> 
>
> Key: KAFKA-3910
> URL: https://issues.apache.org/jira/browse/KAFKA-3910
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: John Hofman
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
>
> Cyclic schemas are not supported by ConnectSchema or SchemaBuilder. 
> Subsequently the AvroConverter (confluentinc/schema-registry) hits a stack 
> overflow when converting a cyclic avro schema, e.g:
> {code}
> {"type":"record", 
> "name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}
> {code}
> This is a blocking issue for all connectors running on the connect framework 
> with data containing cyclic references. The AvroConverter cannot support 
> cyclic schemas until the underlying ConnectSchema and SchemaBuilder do.
> To reproduce the stack-overflow (Confluent-3.0.0):
> Produce some cyclic data:
> {code}
> bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test 
> --property value.schema='{"type":"record", 
> "name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}'
> {"value":1,"next":null} 
> {"value":1,"next":{"list":{"value":2,"next":null}}}
> {code}
> Then try to consume it with connect:
> {code:title=connect-console-sink.properties}
> name=local-console-sink 
> connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector 
> tasks.max=1 
> topics=test
> {code}
> {code}
> ./bin/connect-standalone 
> ./etc/schema-registry/connect-avro-standalone.properties 
> connect-console-sink.properties  
> … start up logging … 
> java.lang.StackOverflowError 
>  at org.apache.avro.JsonProperties.getJsonProp(JsonProperties.java:54) 
>  at org.apache.avro.JsonProperties.getProp(JsonProperties.java:45) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1055) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3910) Cyclic schema support in ConnectSchema and SchemaBuilder

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364187#comment-15364187
 ] 

ASF GitHub Bot commented on KAFKA-3910:
---

Github user johnhofman closed the pull request at:

https://github.com/apache/kafka/pull/1582


> Cyclic schema support in ConnectSchema and SchemaBuilder
> 
>
> Key: KAFKA-3910
> URL: https://issues.apache.org/jira/browse/KAFKA-3910
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.0.0
>Reporter: John Hofman
>Assignee: Ewen Cheslack-Postava
>Priority: Blocker
>
> Cyclic schemas are not supported by ConnectSchema or SchemaBuilder. 
> Subsequently the AvroConverter (confluentinc/schema-registry) hits a stack 
> overflow when converting a cyclic avro schema, e.g:
> {code}
> {"type":"record", 
> "name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}
> {code}
> This is a blocking issue for all connectors running on the connect framework 
> with data containing cyclic references. The AvroConverter cannot support 
> cyclic schemas until the underlying ConnectSchema and SchemaBuilder do.
> To reproduce the stack-overflow (Confluent-3.0.0):
> Produce some cyclic data:
> {code}
> bin/kafka-avro-console-producer --broker-list localhost:9092 --topic test 
> --property value.schema='{"type":"record", 
> "name":"list","fields":[{"name":"value","type":"int"},{"name":"next","type":["null","list"]}]}'
> {"value":1,"next":null} 
> {"value":1,"next":{"list":{"value":2,"next":null}}}
> {code}
> Then try to consume it with connect:
> {code:title=connect-console-sink.properties}
> name=local-console-sink 
> connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector 
> tasks.max=1 
> topics=test
> {code}
> {code}
> ./bin/connect-standalone 
> ./etc/schema-registry/connect-avro-standalone.properties 
> connect-console-sink.properties  
> … start up logging … 
> java.lang.StackOverflowError 
>  at org.apache.avro.JsonProperties.getJsonProp(JsonProperties.java:54) 
>  at org.apache.avro.JsonProperties.getProp(JsonProperties.java:45) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1055) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1103) 
>  at io.confluent.connect.avro.AvroData.toConnectSchema(AvroData.java:1137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1582: KAFKA-3910: Cyclic schema support in ConnectSchema...

2016-07-06 Thread johnhofman
GitHub user johnhofman reopened a pull request:

https://github.com/apache/kafka/pull/1582

KAFKA-3910: Cyclic schema support in ConnectSchema and SchemaBuilder

This feature uses a FutureSchema as a placeholder to be resolved later. 
Resolution is attempted whenever a ConnectSchema is constructed: it attempts to 
resolve all its children (fields, keySchema, or valueSchema) and recurses until 
the end of the tree. 

A FutureSchema is resolved when it finds a parent schema that matches its 
name, and optional flag. If a FutureSchema is accessed before being resolved, 
it will throw a DataException.

The SchemaBuilder constructs a FutureSchema if a field is added with only a 
type name.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/johnhofman/kafka cyclic

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1582.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1582


commit d932db4544cc2d20a46bf873fbd92f2c09450276
Author: John Hofman 
Date:   2016-06-30T20:04:19Z

Add FutureSchema to support cyclic schemas

commit 7d95c487d4a5bf3cb751deb98425e5123abb461b
Author: John Hofman 
Date:   2016-07-04T07:32:47Z

Fix resolution failure test

commit 09f1b47c238ff10f58681e44816f2ba39ed95166
Author: John Hofman 
Date:   2016-07-04T09:53:58Z

Move cyclic comparison resolution to FutureSchema

commit c1c632b51f80d81c29cb66a35c5aed867ad869e7
Author: John Hofman 
Date:   2016-07-04T11:26:35Z

Clean up unused tokens, minor spelling fixes




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1582: KAFKA-3910: Cyclic schema support in ConnectSchema...

2016-07-06 Thread johnhofman
Github user johnhofman closed the pull request at:

https://github.com/apache/kafka/pull/1582


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3919) Broker fails to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-07-06 Thread Andy Coates (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364141#comment-15364141
 ] 

Andy Coates commented on KAFKA-3919:


[~junrao] My understanding was that the offset index looks at the offset of the 
first record in the compressed set, not the last.  Having checked the code this 
does seem to be the case. (Code below from LogSegment.scala, recover()):

{code}
val startOffset =
entry.message.compressionCodec match {
  case NoCompressionCodec =>
entry.offset
  case _ =>
ByteBufferMessageSet.deepIterator(entry.message).next().offset
  }
{code}

> Broker fails to start after ungraceful shutdown due to non-monotonically 
> incrementing offsets in logs
> --
>
> Key: KAFKA-3919
> URL: https://issues.apache.org/jira/browse/KAFKA-3919
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.9.0.1
>Reporter: Andy Coates
>
> Hi All,
> I encountered an issue with Kafka following a power outage that saw a 
> proportion of our cluster disappear. When the power came back on, several 
> brokers halted on startup with the error:
> {noformat}
>   Fatal error during KafkaServerStartable startup. Prepare to shutdown”
>   kafka.common.InvalidOffsetException: Attempt to append an offset 
> (1239742691) to position 35728 no larger than the last offset appended 
> (1239742822) to 
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>   at 
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
>   at kafka.log.LogSegment.recover(LogSegment.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
>   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at kafka.log.Log.loadSegments(Log.scala:160)
>   at kafka.log.Log.(Log.scala:90)
>   at 
> kafka.log.LogManager$$anonfun$loadLogs$2$$anonfun$3$$anonfun$apply$10$$anonfun$apply$1.apply$mcV$sp(LogManager.scala:150)
>   at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:60)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The only way to recover the brokers was to delete the log files that 
> contained non monotonically incrementing offsets.
> We've spent some time digging through the logs and I feel I may have worked 
> out the sequence of events leading to this issue, (though this is based on 
> some assumptions I've made about the way Kafka is working, which may be 
> wrong).
> First off, we have unclean leadership elections disabled. (We did later enable 
> them to help get around some other issues we were having, but this was 
> several hours after this issue manifested.) We're producing to the topic 
> with gzip compression and acks=1.
> We looked through the data logs that were causing the brokers to not start. 
> What we found was that the initial part of the log has monotonically increasing 
> offsets, where each compressed batch normally contained one or two records. 
> Then there is a batch that contains many records, whose first record has an 
> offset below the previous batch and whose last record has an offset above the 
> previous batch. Following on from this there continues a period of large 
> batches, with monotonically increasing offsets, and then the log returns to 
> batches with one or two records.
> Our working assumption here is that the period before the offset dip, with 
> the small batches, is pre-outage normal operation. The period of larger 
> batches is from just after the outage, where producers have a backlog to 
> process when the partition becomes available, and then things return to 
> normal batch sizes again once the backlog clears.
> We

Re: KStream: KTable-KTable leftJoin with key only on RHS of join generates null in joined table

2016-07-06 Thread Philippe Derome
thanks, I understand that it's not modelled the same way as database joins.

Let's take the example of an inner join of a very small population set (NBA
rookies or US senators) with a larger table (data on zip codes). Let's assume
we want to identify the crime rate of zip codes where current senators live
or the median income of zip codes where NBA rookies currently live. These
small elite population samples will likely never live in the 50% poorest
zip codes in the US (although exceptionally some might), and NBA rookies will
not live far from their team's home base (so not in Maine, Alaska, Hawaii, or
North Dakota), so many zip codes will not match and are expected never to
match. So I don't see how the keys representing such zip codes will become
eventually consistent.

One can imagine an application that makes use of census data (with many
zip codes) and is interested in many such statistics for several such small
"elite" populations; the irrelevant zip codes with null records then
find their way into the data pipeline multiple times.

I cannot think of a more egregious example than one table with billions of
keys and the other with only a handful that would match, but I'd assume that
such use cases could arise naturally.

It seems to me that a {key, null} record should be output to represent a record
deletion in the resulting table, but not a near miss on data selection.
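
For concreteness, a rough sketch of the kind of join I have in mind; the topic
names, value types, and joiner below are made up for illustration and are not
from a real application:

{code}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

KStreamBuilder builder = new KStreamBuilder();

// Small table: senators keyed by zip code (few keys).
KTable<String, String> tA = builder.table(Serdes.String(), Serdes.String(), "senators-by-zip");
// Large table: crime rate keyed by zip code (many keys).
KTable<String, Long> tB = builder.table(Serdes.String(), Serdes.Long(), "crime-rate-by-zip");

// An update in tB for a zip code that never appears in tA still produces a
// {zip-code, null} record in tC's changelog, which is what this thread is about.
KTable<String, String> tC = tA.leftJoin(tB,
    (senator, crimeRate) -> senator + ":" + crimeRate);

tC.to(Serdes.String(), Serdes.String(), "senators-with-crime-rate");
{code}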

On Tue, Jul 5, 2016 at 12:44 AM, Guozhang Wang  wrote:

> Hello,
>
> The KTable join semantics are not exactly the same as those of an RDBMS. You
> can find detailed semantics in the web docs (search for Joining Streams):
>
>
> http://docs.confluent.io/3.0.0/streams/developer-guide.html#kafka-streams-dsl
>
> In a nutshell, the joiner will be triggered only if both / left / either of
> the joining streams has the matching record with the key of the incoming
> received record (so the input values of the joiner could not be null / can
> be null for only the other value / can be null on either values, but not
> both), and otherwise a pair of {join-key, null} is output. We made this
> design deliberately just to make sure that "table-table joins are
> eventually consistent". This gives a kind of resilience to late arrival of
> records that a late arrival in either stream can "update" the join result.
>
>
> Guozhang
>
> On Mon, Jul 4, 2016 at 6:10 PM, Philippe Derome 
> wrote:
>
> > Same happens for regular join, keys that appear only in one stream will
> > make it to output KTable tC with a null for either input stream. I guess
> > it's related to Kafka-3911 Enforce ktable Materialization or umbrella
> JIRA
> > 3909, Queryable state for Kafka Streams?
> >
> > On Mon, Jul 4, 2016 at 8:45 PM, Philippe Derome 
> > wrote:
> >
> > > If we have two streams A and B for which we associate tables tA and tB,
> > > then create a table tC as ta.leftJoin(tB, ) and then
> > we
> > > have a key kB in stream B but never made it to tA nor tC, do we need to
> > > inject a pair (k,v) of (kB, null) into resulting change log for tC ?
> > >
> > > It sounds like it is definitely necessary if key kB is present in table
> > tC
> > > but if not, why add it?
> > >
> > > I have an example that reproduces this and would like to know if it is
> > > considered normal, sub-optimal, or a defect. I don't view it as normal
> > for
> > > time being, particularly considering stream A as having very few keys
> > and B
> > > as having many, which could lead to an unnecessary large change log for
> > C.
> > >
> > > Phil
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: NPE using GenericAvroSerde to deserialize

2016-07-06 Thread Philippe Derome
Please ignore this until I can be more specific about my code usage (I will
try to recover the same code I used). In the meantime, I managed to
serialize into the output topic using a JsonSerde, which was straightforward.

On Wed, Jul 6, 2016 at 5:26 AM, Michael Noll  wrote:

> Phil,
>
> > I then specify a Serde (new GenericAvroSerde) as value
> > deserializer when outputting to topic via table.to method.
>
> I suppose that was a typo, and you actually meant "as a value
> *serializer*", right?
>
>
>
> On Tue, Jul 5, 2016 at 11:55 PM, Philippe Derome 
> wrote:
>
> > This is possibly more of a Confluent question as GenericAvroSerde is in
> > Confluent example code base.
> >
> > I have a small Stream application, which creates a KTable with String key
> > and GenericRecord value.
> >
> > I then specify a Serde (new GenericAvroSerde) as value
> > deserializer when outputting to topic via table.to method.
> >
> > I get a NPE on deserializing with this.schemaRegistry being null within
> > AbstractKafkaAvroSerializer#serializeImpl.
> >
> > Could it be simply that GenericRecords are more of an intermediate class
> > and are not meant to be serialised?
> >
> > I'd like to stream the values on topic as GenericRecord. I thought it
> > should work. Alternatively, guidance on using SpecificAvroSerde would be
> > very helpful.
> >
>
>
>
> --
> Best regards,
> Michael Noll
>
>
>
> Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
> Download Apache Kafka and Confluent Platform: www.confluent.io/download
>


[jira] [Commented] (KAFKA-3758) KStream job fails to recover after Kafka broker stopped

2016-07-06 Thread Strong Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364085#comment-15364085
 ] 

Strong Liu commented on KAFKA-3758:
---

I ran into the same problem with the 
_io.confluent.examples.streams.AnomalyDetectionLambdaExample_.

Some details to reproduce:

* change stream threads to 3
* start two processes 

> KStream job fails to recover after Kafka broker stopped
> ---
>
> Key: KAFKA-3758
> URL: https://issues.apache.org/jira/browse/KAFKA-3758
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Greg Fodor
>Assignee: Guozhang Wang
> Attachments: muon.log.1.gz
>
>
> We've been doing some testing of a fairly complex KStreams job and under load 
> it seems the job fails to rebalance + recover if we shut down one of the 
> kafka brokers. The test we were running had a 3-node kafka cluster where each 
> topic had at least a replication factor of 2, and we terminated one of the 
> nodes.
> Attached is the full log, the root exception seems to be contention on the 
> lock on the state directory. The job continues to try to recover but throws 
> errors relating to locks over and over. Restarting the job itself resolves 
> the problem.
>  1702 org.apache.kafka.streams.errors.ProcessorStateException: Error while 
> creating the state manager
>  1703 at 
> org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71)
>  1704 at 
> org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86)
>  1705 at 
> org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550)
>  1706 at 
> org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577)
>  1707 at 
> org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68)
>  1708 at 
> org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123)
>  1709 at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222)
>  1710 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232)
>  1711 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227)
>  1712 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  1713 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  1714 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182)
>  1715 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  1716 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  1717 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436)
>  1718 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422)
>  1719 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
>  1720 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
>  1721 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
>  1722 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
>  1723 at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
>  1724 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
>  1725 at 
> org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
>  1726 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
>  1727 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>  1728 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>  1729 at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>  1730 at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensu

Re: NPE using GenericAvroSerde to deserialize

2016-07-06 Thread Michael Noll
Phil,

> I then specify a Serde (new GenericAvroSerde) as value
> deserializer when outputting to topic via table.to method.

I suppose that was a typo, and you actually meant "as a value
*serializer*", right?



On Tue, Jul 5, 2016 at 11:55 PM, Philippe Derome  wrote:

> This is possibly more of a Confluent question as GenericAvroSerde is in
> Confluent example code base.
>
> I have a small Stream application, which creates a KTable with String key
> and GenericRecord value.
>
> I then specify a Serde (new GenericAvroSerde) as value
> deserializer when outputting to topic via table.to method.
>
> I get a NPE on deserializing with this.schemaRegistry being null within
> AbstractKafkaAvroSerializer#serializeImpl.
>
> Could it be simply that GenericRecords are more of an intermediate class
> and are not meant to be serialised?
>
> I'd like to stream the values on topic as GenericRecord. I thought it
> should work. Alternatively, guidance on using SpecificAvroSerde would be
> very helpful.
>



-- 
Best regards,
Michael Noll



Michael G. Noll | Product Manager | Confluent | +1 650.453.5860
Download Apache Kafka and Confluent Platform: www.confluent.io/download


Re: Kafka Streams - production use

2016-07-06 Thread Michael Noll
Dirk,

we included the note "be careful when using Kafka Streams in production"
because Kafka Streams as shipped in Kafka 0.10.0.0 is the first-ever
release of Kafka Streams.  In practice, users are running Streams
applications in a variety of stages -- some are doing pilots or
evaluations, some are already running it in production.

The 0.10.x release line will see bug fixes and improvements (as you'd
expect) but may also bring bigger new functionality such as Queryable State
[1].

Hope this helps,
Michael



[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-67%3A+Queryable+state+for+Kafka+Streams






On Mon, Jul 4, 2016 at 11:14 AM, Dirk Jeßberger 
wrote:

> Hello,
>
> According to the FAQ (http://docs.confluent.io/3.0.0/streams/faq.html)
> it is not recommended to use Kafka Streams in production. When do you
> expect that it is ready for production use? Within the version 0.10.x.x
> or with later versions?
>
> Best regards
> Dirk Jeßberger
>
> --
> Dirk Jeßberger
> Senior Software Developer
> Homepages & Search Products Development
> 1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 | 76135
> Karlsruhe | Germany
> E-Mail: dirk.jessber...@1und1.de  |
> Web: www.1und1.de 
> Amtsgericht Montabaur, HRB 5452
>
> Geschäftsführer: Frank Einhellinger, Thomas Ludwig, Jan Oetjen
> Member of United Internet
>


[jira] [Assigned] (KAFKA-3856) Move inner classes accessible only functions in TopologyBuilder out of public APIs

2016-07-06 Thread Jeyhun Karimov (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeyhun Karimov reassigned KAFKA-3856:
-

Assignee: Jeyhun Karimov

> Move inner classes accessible only functions in TopologyBuilder out of public 
> APIs
> --
>
> Key: KAFKA-3856
> URL: https://issues.apache.org/jira/browse/KAFKA-3856
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Jeyhun Karimov
>  Labels: api
>
> In {{TopologyBuilder}} there are a couple of public functions that are 
> actually only used in the internal classes such as StreamThread and 
> StreamPartitionAssignor, and some accessible only in high-level DSL inner 
> classes, examples include {{addInternalTopic}}, {{sourceGroups}} and 
> {{copartitionGroups}}, etc. But they are still listed in Javadocs since this 
> class is part of public APIs.
> We should think about moving them out of the public API. Unfortunately 
> there is no "friend" access mode as in C++, so we need to think of another 
> way.
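
For illustration only, one common alternative to C++-style "friend" access is
package-private visibility: internal collaborators in the same package can
still call the methods, while they drop out of the public API surface and the
generated Javadoc. The class and method names below are made up, not the real
Kafka Streams classes:

{noformat}
// Hypothetical sketch -- illustrative names, not the actual Kafka Streams classes.
package org.example.streams;

import java.util.Collections;
import java.util.Set;

public class TopologyBuilderLike {
    // Package-private: callable by StreamThreadLike below (same package),
    // but not part of the public API or the Javadoc.
    Set<String> internalSourceGroups() {
        return Collections.singleton("source-group-1");
    }
}

class StreamThreadLike {
    void useBuilder(TopologyBuilderLike builder) {
        // Internal code can still reach the package-private method.
        System.out.println(builder.internalSourceGroups());
    }
}
{noformat}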



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3836) KStreamReduce and KTableReduce should not pass nulls to Deserializers

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15364023#comment-15364023
 ] 

ASF GitHub Bot commented on KAFKA-3836:
---

GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/1591

KAFKA-3836: KStreamReduce and KTableReduce should not pass nulls to 
Deserializers

Minor changes to add null checks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-3836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1591.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1591


commit ac421c30f698153a5733ffb71e677633d1edbe9c
Author: Jeyhun Karimov 
Date:   2016-07-06T09:05:13Z

KAFKA-3836 null checks v2




> KStreamReduce and KTableReduce should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Assignee: Jeyhun Karimov
>Priority: Trivial
>  Labels: architecture
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.
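
As a sketch of the kind of guard being proposed (a hypothetical helper, not the
actual KStreamReduce/KTableReduce code):

{noformat}
// Hypothetical sketch: skip the (de)serializer entirely for null values so that
// Serializer/Deserializer implementations never have to handle null input.
import org.apache.kafka.common.serialization.Deserializer;
import org.apache.kafka.common.serialization.Serializer;

final class NullSafeSerdeCalls {

    static <T> byte[] serializeOrNull(Serializer<T> serializer, String topic, T value) {
        return value == null ? null : serializer.serialize(topic, value);
    }

    static <T> T deserializeOrNull(Deserializer<T> deserializer, String topic, byte[] bytes) {
        return bytes == null ? null : deserializer.deserialize(topic, bytes);
    }
}
{noformat}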



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1591: KAFKA-3836: KStreamReduce and KTableReduce should ...

2016-07-06 Thread jeyhunkarimov
GitHub user jeyhunkarimov opened a pull request:

https://github.com/apache/kafka/pull/1591

KAFKA-3836: KStreamReduce and KTableReduce should not pass nulls to 
Deserializers

Minor changes to add null checks.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jeyhunkarimov/kafka KAFKA-3836

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1591.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1591


commit ac421c30f698153a5733ffb71e677633d1edbe9c
Author: Jeyhun Karimov 
Date:   2016-07-06T09:05:13Z

KAFKA-3836 null checks v2




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [jira] [Commented] (KAFKA-3919) Broker faills to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-07-06 Thread Andrew Coates
[~junrao] I thought that the indexer took the offset of the first record in
a compressed set. Looking at `LogSegment.recover` in the 0.9.0.1 code base,
that does indeed seem to be the case.

I haven't dumped the offsets again, but I can if you still need it?

On Wed, 6 Jul 2016, 09:04 Andrew Coates,  wrote:

> [~junrao] will double check
>
> On Tue, 5 Jul 2016, 18:53 Jun Rao (JIRA),  wrote:
>
>>
>> [
>> https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362865#comment-15362865
>> ]
>>
>> Jun Rao commented on KAFKA-3919:
>> 
>>
>> [~BigAndy], thanks for the investigation and the additional information.
>>
>> Let me first explain how log reconciliation normally works. Each broker
>> maintains the last committed offset for each partition and stores that
>> information in a checkpoint file replication-offset-checkpoint. A message
>> is only considered committed if it's received by all in-sync replicas. The
>> leader advances the last committed offset and propagates it to the
>> followers. So, the follower's last committed offset is always <= the
>> leader's. When a replica becomes a leader, it won't do any truncation to
>> its log and will instead try to commit all messages in its local log. When
>> a replica becomes a follower, it will first truncate its log to the last
>> committed offset stored in its local checkpoint file and then start
>> replicating from that offset. If unclean leader election is disabled, after
>> truncation, the follower's last offset should always be <= the leader's
>> last offset.
>>
>> Another thing we do is that if a broker is shut down forcefully, on
>> startup, we will do log recovery to remove any corrupted messages. In your
>> case, it seems what happens is that when the new leader (2011) comes up,
>> its log is actually corrupted and therefore it has to truncate some
>> messages. This potentially will remove messages that have been previously
>> committed. Then, when broker 2012 comes up and becomes the follower, it's
>> possible that after the follower truncates its log to its local last
>> committed offset, that offset is actually larger than the last recovered
>> offset in the leader. If no new messages have been appended to the new
>> leader, the follower will realize that its offset is out of range and will
>> truncate its log again to the leader's last offset. However, in this case,
>> it seems some new messages have been published to the new leader and the
>> follower's offset is actually in range. It's just that the follower's
>> offset may now point to a completely new set of messages. In this case if
>> the follower's offset points to the middle of a compressed message set, the
>> follower will get the whole compressed message set and append it to its
>> local log. Currently, the follower only ensures that the last
>> offset in a compressed message set is larger than the last offset in the
>> log, but not the first offset. This seems to be the situation that you are
>> in.
>>
>> There is still one thing not very clear to me. When building indexes
>> during log recovery, we actually only add index entries at the boundary of
>> compressed message set. So as long as the last offset of each compressed
>> set keeps increasing, we won't hit the InvalidOffsetException in the
>> description. Could you check whether 1239742691 is the last offset of a
>> compressed set and if so whether there is a case that the last offset of a
>> compressed set is out of order in the log?
>>
>> > Broker faills to start after ungraceful shutdown due to
>> non-monotonically incrementing offsets in logs
>> >
>> --
>> >
>> > Key: KAFKA-3919
>> > URL: https://issues.apache.org/jira/browse/KAFKA-3919
>> > Project: Kafka
>> >  Issue Type: Bug
>> >  Components: core
>> >Affects Versions: 0.9.0.1
>> >Reporter: Andy Coates
>> >
>> > Hi All,
>> > I encountered an issue with Kafka following a power outage that saw a
>> proportion of our cluster disappear. When the power came back on several
>> brokers halted on start up with the error:
>> > {noformat}
>> >   Fatal error during KafkaServerStartable startup. Prepare to
>> shutdown”
>> >   kafka.common.InvalidOffsetException: Attempt to append an offset
>> (1239742691) to position 35728 no larger than the last offset appended
>> (1239742822) to
>> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
>> >   at
>> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
>> >   at
>> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>> >   at
>> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
>> >   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>> >   

[jira] [Updated] (KAFKA-3930) IPv6 address can't used as ObjectName

2016-07-06 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3930:
---
Fix Version/s: 0.10.1.0

> IPv6 address can't used as ObjectName
> -
>
> Key: KAFKA-3930
> URL: https://issues.apache.org/jira/browse/KAFKA-3930
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.9.0.0, 0.9.0.1, 0.10.0.0
>Reporter: wateray
>Priority: Minor
> Fix For: 0.10.1.0
>
>
> When using IPv6 to start the broker, the server.log outputs this error:
> ===
> [2016-05-25 15:45:56,120] WARN Error processing 
> kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=console-consumer-25184,brokerHost=fe80::92e2:baff:fe92:62f,brokerPort=3392
>  (com.yammer.metrics.reporting.JmxReporter)
> javax.management.MalformedObjectNameException: Invalid character ':' in value 
> part of property
> at javax.management.ObjectName.construct(ObjectName.java:618)
> at javax.management.ObjectName.(ObjectName.java:1382)
> at 
> com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
> at 
> com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
> at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
> at com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
> at kafka.metrics.KafkaMetricsGroup$class.newMeter(KafkaMetricsGroup.scala:80)
> at kafka.server.FetcherStats.newMeter(AbstractFetcherThread.scala:264)
> at kafka.server.FetcherStats.(AbstractFetcherThread.scala:269)
> at kafka.server.AbstractFetcherThread.(AbstractFetcherThread.scala:55)
> at kafka.consumer.ConsumerFetcherThread.(ConsumerFetcherThread.scala:38)
> ..
> ==
> In the  AbstractFetcherThread.scala line 264:
> class FetcherStats(metricId: ClientIdAndBroker) extends KafkaMetricsGroup {
>   val tags = Map("clientId" -> metricId.clientId,
> "brokerHost" -> metricId.brokerHost,
> "brokerPort" -> metricId.brokerPort.toString)
> When brokerHost is an IPv6 address, it contains ':', which can't be used as part of an ObjectName.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3930) IPv6 address can't used as ObjectName

2016-07-06 Thread wateray (JIRA)
wateray created KAFKA-3930:
--

 Summary: IPv6 address can't used as ObjectName
 Key: KAFKA-3930
 URL: https://issues.apache.org/jira/browse/KAFKA-3930
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.10.0.0, 0.9.0.1, 0.9.0.0
Reporter: wateray
Priority: Minor


When using IPv6 to start the broker, the server.log outputs this error:
===
[2016-05-25 15:45:56,120] WARN Error processing 
kafka.server:type=FetcherStats,name=RequestsPerSec,clientId=console-consumer-25184,brokerHost=fe80::92e2:baff:fe92:62f,brokerPort=3392
 (com.yammer.metrics.reporting.JmxReporter)
javax.management.MalformedObjectNameException: Invalid character ':' in value 
part of property
at javax.management.ObjectName.construct(ObjectName.java:618)
at javax.management.ObjectName.(ObjectName.java:1382)
at com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
at 
com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
at com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
at com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
at kafka.metrics.KafkaMetricsGroup$class.newMeter(KafkaMetricsGroup.scala:80)
at kafka.server.FetcherStats.newMeter(AbstractFetcherThread.scala:264)
at kafka.server.FetcherStats.(AbstractFetcherThread.scala:269)
at kafka.server.AbstractFetcherThread.(AbstractFetcherThread.scala:55)
at kafka.consumer.ConsumerFetcherThread.(ConsumerFetcherThread.scala:38)
..
==
In the  AbstractFetcherThread.scala line 264:
class FetcherStats(metricId: ClientIdAndBroker) extends KafkaMetricsGroup {
  val tags = Map("clientId" -> metricId.clientId,
"brokerHost" -> metricId.brokerHost,
"brokerPort" -> metricId.brokerPort.toString)

When brokerHost is an IPv6 address, it contains ':', which can't be used as part of an ObjectName.
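
For reference, the failure is reproducible with javax.management.ObjectName
directly, and quoting the property value is one possible way to make a host
containing ':' acceptable (this is a sketch of the JMX behaviour only, not a
statement about how Kafka should fix its metric tags):

{noformat}
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

public class Ipv6ObjectNameSketch {
    public static void main(String[] args) throws MalformedObjectNameException {
        String host = "fe80::92e2:baff:fe92:62f"; // IPv6 address from the report

        // An unquoted ':' in a property value is rejected, as in the WARN above.
        try {
            new ObjectName("kafka.server:type=FetcherStats,brokerHost=" + host);
        } catch (MalformedObjectNameException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // ObjectName.quote() escapes the value so the name becomes valid.
        ObjectName quoted = new ObjectName(
                "kafka.server:type=FetcherStats,brokerHost=" + ObjectName.quote(host));
        System.out.println("accepted: " + quoted);
    }
}
{noformat}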






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3836) KStreamReduce and KTableReduce should not pass nulls to Deserializers

2016-07-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363945#comment-15363945
 ] 

ASF GitHub Bot commented on KAFKA-3836:
---

Github user jeyhunkarimov closed the pull request at:

https://github.com/apache/kafka/pull/1585


> KStreamReduce and KTableReduce should not pass nulls to Deserializers
> -
>
> Key: KAFKA-3836
> URL: https://issues.apache.org/jira/browse/KAFKA-3836
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.0.0
>Reporter: Avi Flax
>Assignee: Jeyhun Karimov
>Priority: Trivial
>  Labels: architecture
>
> As per [this 
> discussion|http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3ccahwhrru29jw4jgvhsijwbvlzb3bc6qz6pbh9tqcfbcorjk4...@mail.gmail.com%3e]
>  these classes currently pass null values along to Deserializers, so those 
> Deserializers need to handle null inputs and pass them through without 
> throwing. It would be better for these classes to simply not call the 
> Deserializers in this case; this would reduce the burden of implementers of 
> {{Deserializer}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [jira] [Commented] (KAFKA-3919) Broker faills to start after ungraceful shutdown due to non-monotonically incrementing offsets in logs

2016-07-06 Thread Andrew Coates
[~junrao] will double check

On Tue, 5 Jul 2016, 18:53 Jun Rao (JIRA),  wrote:

>
> [
> https://issues.apache.org/jira/browse/KAFKA-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362865#comment-15362865
> ]
>
> Jun Rao commented on KAFKA-3919:
> 
>
> [~BigAndy], thanks for the investigation and the additional information.
>
> Let me first explain how log reconciliation normally works. Each broker
> maintains the last committed offset for each partition and stores that
> information in a checkpoint file replication-offset-checkpoint. A message
> is only considered committed if it's received by all in-sync replicas. The
> leader advances the last committed offset and propagates it to the
> followers. So, the follower's last committed offset is always <= the
> leader's. When a replica becomes a leader, it won't do any truncation to
> its log and will instead try to commit all messages in its local log. When
> a replica becomes a follower, it will first truncate its log to the last
> committed offset stored in its local checkpoint file and then start
> replicating from that offset. If unclean leader election is disabled, after
> truncation, the follower's last offset should always be <= the leader's
> last offset.
>
> Another thing we do is that if a broker is shut down forcefully, on
> startup, we will do log recovery to remove any corrupted messages. In your
> case, it seems what happens is that when the new leader (2011) comes up,
> its log is actually corrupted and therefore it has to truncate some
> messages. This potentially will remove messages that have been previously
> committed. Then, when broker 2012 comes up and becomes the follower, it's
> possible that after the follower truncates its log to its local last
> committed offset, that offset is actually larger than the last recovered
> offset in the leader. If no new messages have been appended to the new
> leader, the follower will realize that its offset is out of range and will
> truncate its log again to the leader's last offset. However, in this case,
> it seems some new messages have been published to the new leader and the
> follower's offset is actually in range. It's just that the follower's
> offset may now point to a completely new set of messages. In this case if
> the follower's offset points to the middle of a compressed message set, the
> follower will get the whole compressed message set and append it to its
> local log. Currently, the follower only ensures that the last
> offset in a compressed message set is larger than the last offset in the
> log, but not the first offset. This seems to be the situation that you are
> in.
>
> There is still one thing not very clear to me. When building indexes
> during log recovery, we actually only add index entries at the boundary of
> compressed message set. So as long as the last offset of each compressed
> set keeps increasing, we won't hit the InvalidOffsetException in the
> description. Could you check whether 1239742691 is the last offset of a
> compressed set and if so whether there is a case that the last offset of a
> compressed set is out of order in the log?
>
> > Broker faills to start after ungraceful shutdown due to
> non-monotonically incrementing offsets in logs
> >
> --
> >
> > Key: KAFKA-3919
> > URL: https://issues.apache.org/jira/browse/KAFKA-3919
> > Project: Kafka
> >  Issue Type: Bug
> >  Components: core
> >Affects Versions: 0.9.0.1
> >Reporter: Andy Coates
> >
> > Hi All,
> > I encountered an issue with Kafka following a power outage that saw a
> proportion of our cluster disappear. When the power came back on several
> brokers halted on start up with the error:
> > {noformat}
> >   Fatal error during KafkaServerStartable startup. Prepare to
> shutdown”
> >   kafka.common.InvalidOffsetException: Attempt to append an offset
> (1239742691) to position 35728 no larger than the last offset appended
> (1239742822) to
> /data3/kafka/mt_xp_its_music_main_itsevent-20/001239444214.index.
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply$mcV$sp(OffsetIndex.scala:207)
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> >   at
> kafka.log.OffsetIndex$$anonfun$append$1.apply(OffsetIndex.scala:197)
> >   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> >   at kafka.log.OffsetIndex.append(OffsetIndex.scala:197)
> >   at kafka.log.LogSegment.recover(LogSegment.scala:188)
> >   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:188)
> >   at kafka.log.Log$$anonfun$loadSegments$4.apply(Log.scala:160)
> >   at
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> >   at
> scala.c

[GitHub] kafka pull request #1569: Kafka 3836: KStreamReduce and KTableReduce should ...

2016-07-06 Thread jeyhunkarimov
Github user jeyhunkarimov closed the pull request at:

https://github.com/apache/kafka/pull/1569


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request #1585: KAFKA-3836: KStreamReduce and KTableReduce should ...

2016-07-06 Thread jeyhunkarimov
Github user jeyhunkarimov closed the pull request at:

https://github.com/apache/kafka/pull/1585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3480) Autogenerate metrics documentation

2016-07-06 Thread James Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15363943#comment-15363943
 ] 

James Cheng commented on KAFKA-3480:


PR is here: https://github.com/apache/kafka/pull/1202

> Autogenerate metrics documentation
> --
>
> Key: KAFKA-3480
> URL: https://issues.apache.org/jira/browse/KAFKA-3480
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: James Cheng
> Attachments: Screen Shot 2016-04-07 at 6.52.19 PM.png, 
> sample_metrics.html
>
>
> Metrics documentation is done manually, which means it's hard to keep it 
> current. It would be nice to have automatic generation of this documentation 
> as we have for configs.
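
Purely as an illustration of the idea (this is not the mechanism used in the
PR): if the metric names and descriptions are available programmatically, the
HTML table can be emitted from that metadata, the same way the config
documentation is generated. The metric names and descriptions below are
example entries only:

{noformat}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch of generating a metrics doc table from metadata
// instead of maintaining it by hand; not the approach of the linked PR.
public class MetricsDocSketch {
    public static void main(String[] args) {
        Map<String, String> metrics = new LinkedHashMap<>();
        metrics.put("records-lag-max", "Maximum record lag observed by the consumer (example entry).");
        metrics.put("fetch-rate", "Fetch requests per second (example entry).");

        StringBuilder html = new StringBuilder("<table>\n");
        html.append("<tr><th>Metric</th><th>Description</th></tr>\n");
        for (Map.Entry<String, String> m : metrics.entrySet()) {
            html.append("<tr><td>").append(m.getKey())
                .append("</td><td>").append(m.getValue()).append("</td></tr>\n");
        }
        html.append("</table>");
        System.out.println(html);
    }
}
{noformat}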



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request #1586: Kafka-3836: KStreamReduce and KTableReduce should ...

2016-07-06 Thread jeyhunkarimov
Github user jeyhunkarimov closed the pull request at:

https://github.com/apache/kafka/pull/1586


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3480) Autogenerate metrics documentation

2016-07-06 Thread James Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Cheng updated KAFKA-3480:
---
Reviewer: Jason Gustafson
  Status: Patch Available  (was: In Progress)

[~hachikuji], this is ready for review.

[~ewencp], [~guozhang], this touches the JMX metrics portions of Kafka Connect 
and Kafka Streams. Feel free to comment if you want.



> Autogenerate metrics documentation
> --
>
> Key: KAFKA-3480
> URL: https://issues.apache.org/jira/browse/KAFKA-3480
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jason Gustafson
>Assignee: James Cheng
> Attachments: Screen Shot 2016-04-07 at 6.52.19 PM.png, 
> sample_metrics.html
>
>
> Metrics documentation is done manually, which means it's hard to keep it 
> current. It would be nice to have automatic generation of this documentation 
> as we have for configs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)