Re: [DISCUSS] KIP-58 - Make Log Compaction Point Configurable

2016-05-18 Thread Gwen Shapira
Oops :)

The docs are definitely not doing the feature any favors, but I didn't mean
to imply the feature is thoughtless.

Here's the thing I'm not getting: You are trading off disk space for IO
efficiency. That's reasonable. But why not allow users to specify space in
bytes?

Basically tell the LogCompacter: Once I have X bytes of dirty data (or post
KIP-58, X bytes of data that needs cleaning), please compact it to the best
of your ability (which in steady state will be into almost nothing).

Since we know how big the compaction buffer is and how Kafka uses it, we
can exactly calculate how much space we are wasting vs. how much IO we are
going to do per unit of time. The size of a single segment or compaction
buffer (whichever is bigger) can be a good default value for
min.dirty.bytes. We can even evaluate and re-evaluate it based on the
amount of free space on the disk. Heck, we can automate those tunings
(lower min.dirty.bytes to trigger compaction and free space if we are close
to running out of space).

We can do the same capacity planning with percentages but it requires more
information to know the results, information that can only be acquired
after you reach steady state.
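
As a back-of-the-envelope illustration of that calculation (a sketch only, with
made-up numbers; min.dirty.bytes is the hypothetical config proposed above, not an
existing one, and the IO model is simplified to "re-read and rewrite the compacted
log on each pass"):

    // Rough arithmetic for the bytes-based tradeoff: cleaner IO per unit time vs. space wasted.
    public class DirtyBytesBackOfEnvelope {
        public static void main(String[] args) {
            long minDirtyBytes = 512L * 1024 * 1024;            // hypothetical min.dirty.bytes: 512 MB
            long compactedLogBytes = 4L * 1024 * 1024 * 1024;   // assumed steady-state compacted size: 4 GB
            long updateRateBytesPerSec = 10L * 1024 * 1024;     // assumed incoming update rate: 10 MB/s

            double secondsBetweenCleanings = (double) minDirtyBytes / updateRateBytesPerSec; // ~51 s
            long ioPerCleaning = 2 * compactedLogBytes + minDirtyBytes;  // read old log + dirty tail, write new log
            double cleanerIoBytesPerSec = ioPerCleaning / secondsBetweenCleanings;           // ~170 MB/s
            long maxWastedBytes = minDirtyBytes;                // waste is bounded by the absolute threshold

            System.out.printf("clean every %.0fs, ~%.0f MB/s cleaner IO, at most %d MB wasted%n",
                    secondsBetweenCleanings, cleanerIoBytesPerSec / (1024 * 1024), maxWastedBytes / (1024 * 1024));
        }
    }

Both numbers fall straight out of min.dirty.bytes and the known log size, without
having to wait and learn the steady-state dirty ratio first.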

It is a bit obvious, so I'm guessing the idea was considered and dismissed.
I just can't see why.
If only there were KIPs back then, so I could look at rejected
alternatives...

Gwen



On Wed, May 18, 2016 at 9:54 PM, Jay Kreps  wrote:

> So in summary we never considered this a mechanism to give the consumer
> time to consume prior to compaction, just a mechanism to control space
> wastage. It sort of accidentally gives you that but it's super hard to
> reason about it as an SLA since it is relative to the log size rather than
> absolute.
>
> -Jay
>
> On Wed, May 18, 2016 at 9:50 PM, Jay Kreps  wrote:
>
> > The sad part is I actually did think pretty hard about how to configure
> > that stuff so I guess *I* think the config makes sense! Clearly trying to
> > prevent my being shot :-)
> >
> > I agree the name could be improved and the documentation is quite
> > spartan--no guidance at all on how to set it or what it trades off. A bit
> > shameful.
> >
> > The thinking was this. One approach to cleaning would be to just do it
> > continually with the idea that, hey, you can't take that I/O with
> you--once
> > you've budgeted N MB/sec of background I/O for compaction some of the
> time,
> > you might as well just use that budget all the time. But this leads to
> > seemingly silly behavior where you are doing big ass compactions all the
> > time to free up just a few bytes and we thought it would freak people
> out.
> > Plus arguably Kafka usage isn't all in steady state so this wastage would
> > come out of the budget for other bursty stuff.
> >
> >  So when should compaction kick in? Well what are you trading off? The
> > tradeoff here is how much space to waste on disk versus how much I/O to
> use
> > in cleaning. In general we can't say exactly how much space a compaction
> > will free up--during a phase of all "inserts" compaction may free up no
> > space at all. You just have to do the compaction and hope for the best.
> But
> > in general for most compacted topics they should soon reach a "steady
> > state" where they aren't growing or growing very slowly, so most writes
> are
> > updates (if they keep growing rapidly indefinitely then you are going to
> > run out of space--so safe to assume they do reach steady state). In this
> > steady state the ratio of uncompacted log to total log is effectively the
> > utilization (wasted space percentage). So if you set it to 50% your data
> is
> > about half duplicates. By tolerating more uncleaned log you get more bang
> > for your compaction I/O buck but more space wastage. This seemed like a
> > reasonable way to think about it because maybe you know your compacted
> data
> > size (roughly) so you can reason about whether using, say, twice that
> space
> > is okay.
> >
> > Maybe we should just change the name to something about target
> utilization
> > even though that isn't strictly true except in steady state?
> >
> > -Jay
> >
> >
> > On Wed, May 18, 2016 at 7:59 PM, Gwen Shapira  wrote:
> >
> >> Interesting!
> >>
> >> This needs to be double checked by someone with more experience, but
> >> reading the code, it looks like "log.cleaner.min.cleanable.ratio"
> >> controls *just* the second property, and I'm not even convinced about
> >> that.
> >>
> >> Few facts:
> >>
> >> 1. Each cleaner thread cleans one log at a time. It always goes for
> >> the log with the largest percentage of non-compacted bytes. If you
> >> just created a new partition, wrote 1G and switched to a new segment,
> >> it is very likely that this will be the next log to compact.
> >> Explaining the behavior Eric and Jay complained about. I expected it
> >> to be rare.
> >>
> >> 2. If the dirtiest log has less than 50% dirty bytes (or whatever
> >> 

Re: [DISCUSS] KIP-58 - Make Log Compaction Point Configurable

2016-05-18 Thread Jay Kreps
So in summary we never considered this a mechanism to give the consumer
time to consume prior to compaction, just a mechanism to control space
wastage. It sort of accidentally gives you that but it's super hard to
reason about it as an SLA since it is relative to the log size rather than
absolute.

-Jay

On Wed, May 18, 2016 at 9:50 PM, Jay Kreps  wrote:

> The sad part is I actually did think pretty hard about how to configure
> that stuff so I guess *I* think the config makes sense! Clearly trying to
> prevent my being shot :-)
>
> I agree the name could be improved and the documentation is quite
> spartan--no guidance at all on how to set it or what it trades off. A bit
> shameful.
>
> The thinking was this. One approach to cleaning would be to just do it
> continually with the idea that, hey, you can't take that I/O with you--once
> you've budgeted N MB/sec of background I/O for compaction some of the time,
> you might as well just use that budget all the time. But this leads to
> seemingly silly behavior where you are doing big ass compactions all the
> time to free up just a few bytes and we thought it would freak people out.
> Plus arguably Kafka usage isn't all in steady state so this wastage would
> come out of the budget for other bursty stuff.
>
>  So when should compaction kick in? Well what are you trading off? The
> tradeoff here is how much space to waste on disk versus how much I/O to use
> in cleaning. In general we can't say exactly how much space a compaction
> will free up--during a phase of all "inserts" compaction may free up no
> space at all. You just have to do the compaction and hope for the best. But
> in general for most compacted topics they should soon reach a "steady
> state" where they aren't growing or growing very slowly, so most writes are
> updates (if they keep growing rapidly indefinitely then you are going to
> run out of space--so safe to assume they do reach steady state). In this
> steady state the ratio of uncompacted log to total log is effectively the
> utilization (wasted space percentage). So if you set it to 50% your data is
> about half duplicates. By tolerating more uncleaned log you get more bang
> for your compaction I/O buck but more space wastage. This seemed like a
> reasonable way to think about it because maybe you know your compacted data
> size (roughly) so you can reason about whether using, say, twice that space
> is okay.
>
> Maybe we should just change the name to something about target utilization
> even though that isn't strictly true except in steady state?
>
> -Jay
>
>
> On Wed, May 18, 2016 at 7:59 PM, Gwen Shapira  wrote:
>
>> Interesting!
>>
>> This needs to be double checked by someone with more experience, but
>> reading the code, it looks like "log.cleaner.min.cleanable.ratio"
>> controls *just* the second property, and I'm not even convinced about
>> that.
>>
>> Few facts:
>>
>> 1. Each cleaner thread cleans one log at a time. It always goes for
>> the log with the largest percentage of non-compacted bytes. If you
>> just created a new partition, wrote 1G and switched to a new segment,
>> it is very likely that this will be the next log to compact.
>> Explaining the behavior Eric and Jay complained about. I expected it
>> to be rare.
>>
>> 2. If the dirtiest log has less than 50% dirty bytes (or whatever
>> min.cleanable is), it will be skipped, knowing that others have even
> >> lower dirty ratio.
>>
>> 3. If we do decide to clean a log, we will clean the whole damn thing,
>> leaving only the active segment. Contrary to my expectations, it does
>> not leave any dirty byte behind. So *at most* you will have a single
>> clean segment. Again, explaining why Jay, James and Eric are unhappy.
>>
> >> 4. What it does guarantee (kinda? at least I think it tries?) is to
>> always clean a large "chunk" of data at once, hopefully minimizing
>> churn (cleaning small bits off the same log over and over) and
>> minimizing IO. It does have the nice mathematical property of
>> guaranteeing double the amount of time between cleanings (except it
>> doesn't really, because who knows the size of the compacted region).
>>
>> 5. Whoever wrote the docs should be shot :)
>>
>> so, in conclusion:
>> In my mind, min.cleanable.dirty.ratio is terrible, it is misleading,
>> difficult to understand, and IMO doesn't even do what it should do.
>> I would like to consider the possibility of
>> min.cleanable.dirty.bytes, which should give good control over # of IO
>> operations (since the size of compaction buffer is known).
>>
>> In the context of this KIP, the interaction with cleanable ratio and
>> cleanable bytes will be similar, and it looks like it was already done
>> correctly in the PR, so no worries ("the ratio's definition will be
>> expanded to become the ratio of "compactable" to compactable plus
>> compacted message sizes. Where compactable includes log segments that
>> are neither the active segment nor those 

Re: [DISCUSS] KIP-58 - Make Log Compaction Point Configurable

2016-05-18 Thread Jay Kreps
The sad part is I actually did think pretty hard about how to configure
that stuff so I guess *I* think the config makes sense! Clearly trying to
prevent my being shot :-)

I agree the name could be improved and the documentation is quite
spartan--no guidance at all on how to set it or what it trades off. A bit
shameful.

The thinking was this. One approach to cleaning would be to just do it
continually with the idea that, hey, you can't take that I/O with you--once
you've budgeted N MB/sec of background I/O for compaction some of the time,
you might as well just use that budget all the time. But this leads to
seemingly silly behavior where you are doing big ass compactions all the
time to free up just a few bytes and we thought it would freak people out.
Plus arguably Kafka usage isn't all in steady state so this wastage would
come out of the budget for other bursty stuff.

 So when should compaction kick in? Well what are you trading off? The
tradeoff here is how much space to waste on disk versus how much I/O to use
in cleaning. In general we can't say exactly how much space a compaction
will free up--during a phase of all "inserts" compaction may free up no
space at all. You just have to do the compaction and hope for the best. But
in general for most compacted topics they should soon reach a "steady
state" where they aren't growing or growing very slowly, so most writes are
updates (if they keep growing rapidly indefinitely then you are going to
run out of space--so safe to assume they do reach steady state). In this
steady state the ratio of uncompacted log to total log is effectively the
utilization (wasted space percentage). So if you set it to 50% your data is
about half duplicates. By tolerating more uncleaned log you get more bang
for your compaction I/O buck but more space wastage. This seemed like a
reasonable way to think about it because maybe you know your compacted data
size (roughly) so you can reason about whether using, say, twice that space
is okay.
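
To make the steady-state reading concrete (made-up numbers, and only a sketch of the
ratio definition described above, not the exact cleaner accounting):

    double minCleanableRatio = 0.5;                    // log.cleaner.min.cleanable.ratio
    long compactedBytes = 10L * 1024 * 1024 * 1024;    // assume ~10 GB of fully compacted data
    // Cleaning becomes eligible once dirty / (dirty + clean) >= ratio,
    // i.e. dirty >= clean * ratio / (1 - ratio); at 0.5 that means dirty == clean.
    long dirtyAtTrigger = (long) (compactedBytes * minCleanableRatio / (1 - minCleanableRatio));
    long totalAtTrigger = compactedBytes + dirtyAtTrigger;   // ~20 GB on disk, about half duplicates

So "set it to 50%" reads as "tolerate roughly one byte of uncleaned data per byte of
compacted data" once the topic has reached steady state.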

Maybe we should just change the name to something about target utilization
even though that isn't strictly true except in steady state?

-Jay


On Wed, May 18, 2016 at 7:59 PM, Gwen Shapira  wrote:

> Interesting!
>
> This needs to be double checked by someone with more experience, but
> reading the code, it looks like "log.cleaner.min.cleanable.ratio"
> controls *just* the second property, and I'm not even convinced about
> that.
>
> Few facts:
>
> 1. Each cleaner thread cleans one log at a time. It always goes for
> the log with the largest percentage of non-compacted bytes. If you
> just created a new partition, wrote 1G and switched to a new segment,
> it is very likely that this will be the next log to compact.
> Explaining the behavior Eric and Jay complained about. I expected it
> to be rare.
>
> 2. If the dirtiest log has less than 50% dirty bytes (or whatever
> min.cleanable is), it will be skipped, knowing that others have even
> lower dirty ratio.
>
> 3. If we do decide to clean a log, we will clean the whole damn thing,
> leaving only the active segment. Contrary to my expectations, it does
> not leave any dirty byte behind. So *at most* you will have a single
> clean segment. Again, explaining why Jay, James and Eric are unhappy.
>
> 4. What it does guarantee (kinda? at least I think it tries?) is to
> always clean a large "chunk" of data at once, hopefully minimizing
> churn (cleaning small bits off the same log over and over) and
> minimizing IO. It does have the nice mathematical property of
> guaranteeing double the amount of time between cleanings (except it
> doesn't really, because who knows the size of the compacted region).
>
> 5. Whoever wrote the docs should be shot :)
>
> so, in conclusion:
> In my mind, min.cleanable.dirty.ratio is terrible, it is misleading,
> difficult to understand, and IMO doesn't even do what it should do.
> I would like to consider the possibility of
> min.cleanable.dirty.bytes, which should give good control over # of IO
> operations (since the size of compaction buffer is known).
>
> In the context of this KIP, the interaction with cleanable ratio and
> cleanable bytes will be similar, and it looks like it was already done
> correctly in the PR, so no worries ("the ratio's definition will be
> expanded to become the ratio of "compactable" to compactable plus
> compacted message sizes. Where compactable includes log segments that
> are neither the active segment nor those prohibited from being
> compacted because they contain messages that do not satisfy all the
> new lag constraints"
>
> I may open a new KIP to handle the cleanable ratio. Please don't let
> my confusion detract from this KIP.
>
> Gwen
>
> On Wed, May 18, 2016 at 3:41 PM, Ben Stopford  wrote:
> > Generally, this seems like a sensible proposal to me.
> >
> > Regarding (1): time and message count seem sensible. I can’t think of a
> specific use case for bytes but it seems like there 

[jira] [Commented] (KAFKA-3511) Provide built-in aggregators sum() and avg() in Kafka Streams DSL

2016-05-18 Thread Jay Kreps (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290440#comment-15290440
 ] 

Jay Kreps commented on KAFKA-3511:
--

It just seems really weird to bake one or two aggregates into the framework 
when there will be many of them and others will not be invoked the same way. As 
a user I wonder why these aggregates are invoked differently than others?

But further, how common is it that you actually ever want to do a sum() or 
count() on its own outside of example code? Wouldn't your sum or count 
virtually always be one field in a richer record?

> Provide built-in aggregators sum() and avg() in Kafka Streams DSL
> -
>
> Key: KAFKA-3511
> URL: https://issues.apache.org/jira/browse/KAFKA-3511
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Ishita Mandhan
>  Labels: api
> Fix For: 0.10.1.0
>
>
> Currently we only have one built-in aggregate function count() in the Kafka 
> Streams DSL, but we want to add more aggregation functions like sum() and 
> avg().



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-58 - Make Log Compaction Point Configurable

2016-05-18 Thread Gwen Shapira
Interesting!

This needs to be double checked by someone with more experience, but
reading the code, it looks like "log.cleaner.min.cleanable.ratio"
controls *just* the second property, and I'm not even convinced about
that.

Few facts:

1. Each cleaner thread cleans one log at a time. It always goes for
the log with the largest percentage of non-compacted bytes. If you
just created a new partition, wrote 1G and switched to a new segment,
it is very likely that this will be the next log to compact.
Explaining the behavior Eric and Jay complained about. I expected it
to be rare.

2. If the dirtiest log has less than 50% dirty bytes (or whatever
min.cleanable is), it will be skipped, knowing that others have even
lower dirty ratio.

3. If we do decide to clean a log, we will clean the whole damn thing,
leaving only the active segment. Contrary to my expectations, it does
not leave any dirty byte behind. So *at most* you will have a single
clean segment. Again, explaining why Jay, James and Eric are unhappy.

4. What it does guarantee (kinda? at least I think it tries?) is to
always clean a large "chunk" of data at once, hopefully minimizing
churn (cleaning small bits off the same log over and over) and
minimizing IO. It does have the nice mathematical property of
guaranteeing double the amount of time between cleanings (except it
doesn't really, because who knows the size of the compacted region).

5. Whoever wrote the docs should be shot :)
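
Taken together, facts 1-3 amount to roughly the following selection logic (an
illustrative sketch only -- the actual cleaner code is Scala, in core's kafka.log
package, and CandidateLog here is a stand-in type):

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    class CleanerSelectionSketch {
        static class CandidateLog {
            final String name;
            final long dirtyBytes;
            final long totalBytes;
            CandidateLog(String name, long dirtyBytes, long totalBytes) {
                this.name = name; this.dirtyBytes = dirtyBytes; this.totalBytes = totalBytes;
            }
            double dirtyRatio() { return (double) dirtyBytes / totalBytes; }
        }

        // Pick the log with the largest dirty ratio; skip even that one if it is still
        // below min.cleanable. If it is selected, the cleaner then compacts everything
        // up to (but not including) the active segment, per fact 3.
        static Optional<CandidateLog> pickLogToClean(List<CandidateLog> logs, double minCleanableRatio) {
            return logs.stream()
                       .max(Comparator.comparingDouble(CandidateLog::dirtyRatio))
                       .filter(dirtiest -> dirtiest.dirtyRatio() >= minCleanableRatio);
        }
    }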

so, in conclusion:
In my mind, min.cleanable.dirty.ratio is terrible, it is misleading,
difficult to understand, and IMO doesn't even do what it should do.
I would like to consider the possibility of
min.cleanable.dirty.bytes, which should give good control over # of IO
operations (since the size of compaction buffer is known).

In the context of this KIP, the interaction with cleanable ratio and
cleanable bytes will be similar, and it looks like it was already done
correctly in the PR, so no worries ("the ratio's definition will be
expanded to become the ratio of "compactable" to compactable plus
compacted message sizes. Where compactable includes log segments that
are neither the active segment nor those prohibited from being
compacted because they contain messages that do not satisfy all the
new lag constraints"

I may open a new KIP to handle the cleanable ratio. Please don't let
my confusion detract from this KIP.

Gwen

On Wed, May 18, 2016 at 3:41 PM, Ben Stopford  wrote:
> Generally, this seems like a sensible proposal to me.
>
> Regarding (1): time and message count seem sensible. I can’t think of a 
> specific use case for bytes but it seems like there could be one.
>
> Regarding (2):
> The setting log.cleaner.min.cleanable.ratio currently seems to have two uses. 
> It controls which messages will not be compacted, but it also provides a 
> fractional bound on how many logs are cleaned (and hence work done) in each 
> round. This new proposal seems aimed at the first use, but not the second.
>
> The second case better suits a fractional setting like the one we have now. 
> Using a fractional value means the amount of data cleaned scales in 
> proportion to the data stored in the log. If we were to replace this with an 
> absolute value it would create proportionally more cleaning work as the log 
> grew in size.
>
> So, if I understand this correctly, I think there is an argument for having 
> both.
>
>
>> On 17 May 2016, at 19:43, Gwen Shapira  wrote:
>>
>>  and Spark's implementation is another good reason to allow compaction 
>> lag.
>>
>> I'm convinced :)
>>
>> We need to decide:
>>
>> 1) Do we need just .ms config, or anything else? consumer lag is
>> measured (and monitored) in messages, so if we need this feature to
>> somehow work in tandem with consumer lag monitoring, I think we need
>> .messages too.
>>
>> 2) Does this new configuration allows us to get rid of cleaner.ratio config?
>>
>> Gwen
>>
>>
>> On Tue, May 17, 2016 at 9:43 AM, Eric Wasserman
>>  wrote:
>>> James,
>>>
>>> Your pictures do an excellent job of illustrating my point.
>>>
>>> My mention of the additional "10's of minutes to hours" refers to how far 
>>> after the original target checkpoint (T1 in your diagram) one may need to go 
>>> to get to a checkpoint where all partitions of all topics are in the 
>>> uncompacted region of their respective logs. In terms of your diagram: the 
>>> T3 transaction could have been written 10's of minutes to hours after T1 as 
>>> that was how much time it took all readers to get to T1.
>>>
 You would not have to start over from the beginning in order to read to T3.
>>>
>>> While I agree this is technically true, in practice it could be very 
>>> onerous to actually do it. For example, we use the Kafka consumer that is 
>>> part of the Spark Streaming library to read table topics. It accepts a 
>>> range of offsets to read for each partition. Say we originally target 
>>> 

[jira] [Commented] (KAFKA-3335) Kafka Connect hangs in shutdown hook

2016-05-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290333#comment-15290333
 ] 

Shikhar Bhushan commented on KAFKA-3335:


Currently {{start()}} looks like

 {code:java}
public void start() {
try {
log.info("Kafka Connect starting");
Runtime.getRuntime().addShutdownHook(shutdownHook);

herder.start();
rest.start(herder);

log.info("Kafka Connect started");
} finally {
startLatch.countDown();
}
}
 {code}

Would it make sense to only add the shutdown hook at the end of this method?
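
For concreteness, the reordering being suggested would look something like the
following (a sketch of the idea, not a tested patch):

 {code:java}
public void start() {
    try {
        log.info("Kafka Connect starting");

        herder.start();
        rest.start(herder);

        // Register the hook only after startup has succeeded, so a failure above
        // cannot leave the hook blocked waiting on a start that never completes.
        Runtime.getRuntime().addShutdownHook(shutdownHook);

        log.info("Kafka Connect started");
    } finally {
        startLatch.countDown();
    }
}
 {code}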

> Kafka Connect hangs in shutdown hook
> 
>
> Key: KAFKA-3335
> URL: https://issues.apache.org/jira/browse/KAFKA-3335
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Ben Kirwin
>
> The `Connect` class can run into issues during start, such as:
> {noformat}
> Exception in thread "main" org.apache.kafka.connect.errors.ConnectException: 
> Could not look up partition metadata for offset backing store topic in 
> allotted period. This could indicate a connectivity issue, unavailable topic 
> partitions, or if this is your first use of the topic it may have taken too 
> long to create.
> at 
> org.apache.kafka.connect.util.KafkaBasedLog.start(KafkaBasedLog.java:130)
> at 
> org.apache.kafka.connect.storage.KafkaOffsetBackingStore.start(KafkaOffsetBackingStore.java:85)
> at org.apache.kafka.connect.runtime.Worker.start(Worker.java:108)
> at org.apache.kafka.connect.runtime.Connect.start(Connect.java:56)
> at 
> org.apache.kafka.connect.cli.ConnectDistributed.main(ConnectDistributed.java:62)
> {noformat}
> This exception halts the startup process. It also triggers the shutdown 
> hook... which blocks waiting for the service to start up before calling stop. 
> This causes the process to hang forever.
> There's a few things that could be done here, but it would be nice to bound 
> the amount of time the process spends trying to exit gracefully.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2935) Remove vestigial CLUSTER_CONFIG in WorkerConfig

2016-05-18 Thread Shikhar Bhushan (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290294#comment-15290294
 ] 

Shikhar Bhushan commented on KAFKA-2935:


I could not find any documentation reference to this config, so I hope the above 
patch is GTG.

p.s. Can I please be added as a contributor for JIRA :)

> Remove vestigial CLUSTER_CONFIG in WorkerConfig 
> 
>
> Key: KAFKA-2935
> URL: https://issues.apache.org/jira/browse/KAFKA-2935
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> This config isn't used anywhere anymore. Its previous reason for existence is 
> now handled by DistributedConfig.GROUP_ID_CONFIG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2935) Remove vestigial CLUSTER_CONFIG in WorkerConfig

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290286#comment-15290286
 ] 

ASF GitHub Bot commented on KAFKA-2935:
---

GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1404

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-2935

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1404


commit 49d5c7694b40c3aaed27f2ad2fb5d0b072be1f27
Author: shikhar 
Date:   2016-05-19T02:07:29Z

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG




> Remove vestigial CLUSTER_CONFIG in WorkerConfig 
> 
>
> Key: KAFKA-2935
> URL: https://issues.apache.org/jira/browse/KAFKA-2935
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 0.9.0.0
>Reporter: Ewen Cheslack-Postava
>Assignee: Ewen Cheslack-Postava
>
> This config isn't used anywhere anymore. Its previous reason for existence is 
> now handled by DistributedConfig.GROUP_ID_CONFIG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-2935: Remove vestigial WorkerConfig.CLUS...

2016-05-18 Thread shikhar
GitHub user shikhar opened a pull request:

https://github.com/apache/kafka/pull/1404

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shikhar/kafka kafka-2935

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1404


commit 49d5c7694b40c3aaed27f2ad2fb5d0b072be1f27
Author: shikhar 
Date:   2016-05-19T02:07:29Z

KAFKA-2935: Remove vestigial WorkerConfig.CLUSTER_CONFIG




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Jenkins build is back to normal : kafka-trunk-jdk7 #1305

2016-05-18 Thread Apache Jenkins Server
See 



Question regarding enhancement for Apache Kafka

2016-05-18 Thread Janagan Sivagnanasundaram
Hello,

I am Janagan Sivagnanasundaram from Sri Lanka, pursuing final-year
undergraduate studies in software engineering at the University of Colombo
School of Computing, Sri Lanka. As part of the fulfillment of the degree,
we have to do a final-year software engineering project with 3 members. For
our project we thought of doing an enhancement to the current Apache Kafka
framework: changing Apache Kafka from topic-based to content-based
subscription, which would provide many features.

Our idea for the proposed enhancement is to leave Kafka as it is, without
modifying it, and build a middleware layer on top of the Kafka framework
which takes a content-based subscription as input and passes it to Kafka
as a set of topics, so that Kafka processes it as usual (i.e., the content
is divided into a set of topics).

Is the above approach possible to implement? Or is there a better
alternative way to solve the problem?

More importantly, our whole project should not exceed a period of 6-8 months;
we have to finish the project before December 2016.

It would be great if you could propose a suitable way to implement such a
thing; I hope that implementing it would be a great contribution to the
Kafka community.

Thank you

Regards,
S.Janagan


[jira] [Commented] (KAFKA-3722) PlaintextChannelBuilder should not use ChannelBuilders.createPrincipalBuilder(configs) for creating instance of PrincipalBuilder

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290233#comment-15290233
 ] 

ASF GitHub Bot commented on KAFKA-3722:
---

GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/1403

KAFKA-3722 : The PlaintextChannelBuilder should always use the 
DefaultPrincipalBuilder



Consider this scenario :
1) We have a Kafka Broker running on PlainText and SSL port simultaneously.

2) We try to plug in a custom principal builder using the config 
"principal.builder.class" for the request coming over the SSL port.

3) The ChannelBuilders.createPrincipalBuilder(configs) first checks if a 
config "principal.builder.class" is specified in the passed in configs and 
tries to use that even when it is building the instance of PrincipalBuilder for 
the PlainText port, when that custom principal class is only meant for the SSL port.

IMO, having a DefaultPrincipalBuilder for PlainText port should be fine.
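
For reference, the scenario boils down to a broker configured along these lines
(the listener ports and the custom builder class name are made up for illustration;
this is just a java.util.Properties fragment, not a full broker config):

    Properties brokerProps = new Properties();
    brokerProps.put("listeners", "PLAINTEXT://:9092,SSL://:9093");
    // Hypothetical custom builder written with SSL sessions in mind; today this same
    // "principal.builder.class" setting is also consulted when the PlainText channel
    // builds its PrincipalBuilder, which is what this change avoids.
    brokerProps.put("principal.builder.class", "com.example.CustomSslPrincipalBuilder");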


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka KAFKA-3722

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1403


commit e5227268140b2dbd8d0828449ee398a75549c8be
Author: MayureshGharat 
Date:   2016-05-19T01:19:37Z

The PlaintextChannelBuilder should always use the DefaultPrincipalBuilder




> PlaintextChannelBuilder should not use 
> ChannelBuilders.createPrincipalBuilder(configs) for creating instance of 
> PrincipalBuilder
> 
>
> Key: KAFKA-3722
> URL: https://issues.apache.org/jira/browse/KAFKA-3722
> Project: Kafka
>  Issue Type: Bug
>Reporter: Mayuresh Gharat
>Assignee: Mayuresh Gharat
>
> Consider this scenario :
> 1) We have a Kafka Broker running on  PlainText and SSL port simultaneously.
> 2)  We try to plug in a custom principal builder using the config 
> "principal.builder.class" for the request coming over the SSL port.
> 3) The ChannelBuilders.createPrincipalBuilder(configs) first checks if a 
> config "principal.builder.class" is specified in the passed in configs and 
> tries to use that even when it is building the instance of PrincipalBuilder 
> for the PlainText port, when that custom principal class is only meant for 
> the SSL port.
> IMO, having a DefaultPrincipalBuilder for the PlainText port should be fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3722 : The PlaintextChannelBuilder shoul...

2016-05-18 Thread MayureshGharat
GitHub user MayureshGharat opened a pull request:

https://github.com/apache/kafka/pull/1403

KAFKA-3722 : The PlaintextChannelBuilder should always use the 
DefaultPrincipalBuilder



Consider this scenario :
1) We have a Kafka Broker running on PlainText and SSL port simultaneously.

2) We try to plug in a custom principal builder using the config 
"principal.builder.class" for the request coming over the SSL port.

3) The ChannelBuilders.createPrincipalBuilder(configs) first checks if a 
config "principal.builder.class" is specified in the passed in configs and 
tries to use that even when it is building the instance of PrincipalBuilder for 
the PlainText port, when that custom principal class is only meant for the SSL port.

IMO, having a DefaultPrincipalBuilder for PlainText port should be fine.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MayureshGharat/kafka KAFKA-3722

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1403.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1403


commit e5227268140b2dbd8d0828449ee398a75549c8be
Author: MayureshGharat 
Date:   2016-05-19T01:19:37Z

The PlaintextChannelBuilder should always use the DefaultPrincipalBuilder




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-3718) propagate all KafkaConfig __consumer_offsets configs to OffsetConfig instantiation

2016-05-18 Thread Onur Karaman (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Onur Karaman updated KAFKA-3718:

Fix Version/s: 0.10.0.0

> propagate all KafkaConfig __consumer_offsets configs to OffsetConfig 
> instantiation
> --
>
> Key: KAFKA-3718
> URL: https://issues.apache.org/jira/browse/KAFKA-3718
> Project: Kafka
>  Issue Type: Bug
>Reporter: Onur Karaman
>Assignee: Onur Karaman
> Fix For: 0.10.0.0
>
>
> Kafka has two configurable compression codecs: the one used by the client 
> (source codec) and the one finally used when storing into the log (target 
> codec). The target codec defaults to KafkaConfig.compressionType and can be 
> dynamically configured through zookeeper.
> The GroupCoordinator appends group membership information into the 
> __consumer_offsets topic by:
> 1. making a message with group membership information
> 2. making a MessageSet with the single message compressed with the source 
> codec
> 3. doing a log.append on the MessageSet
> Without this patch, KafkaConfig.offsetsTopicCompressionCodec doesn't get 
> propagated to OffsetConfig instantiation, so GroupMetadataManager uses a 
> source codec of NoCompressionCodec when making the MessageSet. Let's say we 
> have enough group information such that the message formed exceeds 
> KafkaConfig.messageMaxBytes before compression but would fall below the 
> threshold after compression using our source codec. Even if we had 
> dynamically configured __consumer_offsets with our favorite compression 
> codec, the log.append will throw RecordTooLargeException during 
> analyzeAndValidateMessageSet since the message was unexpectedly uncompressed 
> instead of having been compressed with the source codec defined by 
> KafkaConfig.offsetsTopicCompressionCodec.
> NOTE: even after this issue is resolved, preliminary tests show that LinkedIn 
> will still hit RecordTooLargeException with large groups that consume many 
> topics (like MirrorMakers with wildcard consumption of .*) since fully 
> expanded subscription and assignment state for each member is put into a 
> single record. But this is a first step in the right direction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: MINOR: Catch Throwable in CommitSourceTask()

2016-05-18 Thread Ishiihara
GitHub user Ishiihara opened a pull request:

https://github.com/apache/kafka/pull/1402

MINOR: Catch Throwable in CommitSourceTask()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ishiihara/kafka source-task-commit-record

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1402.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1402


commit cf0eb5028e3d7733fd1ec84ca3725d232c07ffc1
Author: Liquan Pei 
Date:   2016-05-19T00:34:39Z

Catch Throwable in CommitSourceTask()




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3664) When subscription set changes on new consumer, the partitions may be removed without offset being committed.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290179#comment-15290179
 ] 

ASF GitHub Bot commented on KAFKA-3664:
---

Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/1363


> When subscription set changes on new consumer, the partitions may be removed 
> without offset being committed.
> 
>
> Key: KAFKA-3664
> URL: https://issues.apache.org/jira/browse/KAFKA-3664
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Vahid Hashemian
>
> When users are using group management, if they call consumer.subscribe() to 
> change the subscription, the removed subscriptions will be immediately 
> removed and their offsets will not be committed. Also the revoked partitions 
> passed to the ConsumerRebalanceListener.onPartitionsRevoked() will not 
> include those partitions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3664) When subscription set changes on new consumer, the partitions may be removed without offset being committed.

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290181#comment-15290181
 ] 

ASF GitHub Bot commented on KAFKA-3664:
---

GitHub user vahidhashemian reopened a pull request:

https://github.com/apache/kafka/pull/1363

KAFKA-3664 (WIP): Commit offset of unsubscribed partitions of the new 
consumer on a subscription change

When users are using group management, if they call consumer.subscribe() to 
change the subscription, the removed subscriptions will be immediately removed 
and their offsets will not be committed.
Also the revoked partitions passed to the 
ConsumerRebalanceListener.onPartitionsRevoked() will not include those 
partitions.

This pull request fixes this issue by following the design suggested by 
@becketqin 
[here](https://issues.apache.org/jira/browse/KAFKA-3664?focusedCommentId=15274470=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15274470).
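
A minimal illustration of the reported sequence (topic names, properties and the poll
timeout are made up, and the fragment assumes the usual consumer imports and a
configured Properties object):

    final KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    ConsumerRebalanceListener listener = new ConsumerRebalanceListener() {
        @Override
        public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
            consumer.commitSync();   // never sees topic-a's partitions on the switch below
        }
        @Override
        public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
    };
    consumer.subscribe(Collections.singletonList("topic-a"), listener);
    consumer.poll(1000);
    // Changing the subscription drops topic-a immediately: its offsets are not
    // committed and onPartitionsRevoked() is not invoked with its partitions.
    consumer.subscribe(Collections.singletonList("topic-b"), listener);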

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3664

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1363


commit 68e6b5507e6c587b035655c5c010bef5c282c7f7
Author: Vahid Hashemian 
Date:   2016-05-13T00:23:27Z

KAFKA-3664: Commit offset of unsubscribed partitions of the new consumer on 
a subscription change

When users are using group management, if they call consumer.subscribe() to 
change the subscription, the removed subscriptions are immediately removed and 
their offsets are not committed.
This pull request fixes this issue.




> When subscription set changes on new consumer, the partitions may be removed 
> without offset being committed.
> 
>
> Key: KAFKA-3664
> URL: https://issues.apache.org/jira/browse/KAFKA-3664
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jiangjie Qin
>Assignee: Vahid Hashemian
>
> When users are using group management, if they call consumer.subscribe() to 
> change the subscription, the removed subscriptions will be immediately 
> removed and their offsets will not be committed. Also the revoked partitions 
> passed to the ConsumerRebalanceListener.onPartitionsRevoked() will not 
> include those partitions. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3664 (WIP): Commit offset of unsubscribe...

2016-05-18 Thread vahidhashemian
Github user vahidhashemian closed the pull request at:

https://github.com/apache/kafka/pull/1363


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] kafka pull request: KAFKA-3664 (WIP): Commit offset of unsubscribe...

2016-05-18 Thread vahidhashemian
GitHub user vahidhashemian reopened a pull request:

https://github.com/apache/kafka/pull/1363

KAFKA-3664 (WIP): Commit offset of unsubscribed partitions of the new 
consumer on a subscription change

When users are using group management, if they call consumer.subscribe() to 
change the subscription, the removed subscriptions will be immediately removed 
and their offsets will not be committed.
Also the revoked partitions passed to the 
ConsumerRebalanceListener.onPartitionsRevoked() will not include those 
partitions.

This pull request fixes this issue by following the design suggested by 
@becketqin 
[here](https://issues.apache.org/jira/browse/KAFKA-3664?focusedCommentId=15274470=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15274470).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vahidhashemian/kafka KAFKA-3664

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/1363.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1363


commit 68e6b5507e6c587b035655c5c010bef5c282c7f7
Author: Vahid Hashemian 
Date:   2016-05-13T00:23:27Z

KAFKA-3664: Commit offset of unsubscribed partitions of the new consumer on 
a subscription change

When users are using group management, if they call consumer.subscribe() to 
change the subscription, the removed subscriptions are immediately removed and 
their offsets are not committed.
This pull request fixes this issue.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Build failed in Jenkins: kafka-0.10.0-jdk7 #100

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3717; Support building aggregate javadoc for all project modules

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H10 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/0.10.0^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/0.10.0^{commit} # timeout=10
Checking out Revision 4f3966cecef674ddda72acccb3e14b4bf7bcf1b9 
(refs/remotes/origin/0.10.0)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4f3966cecef674ddda72acccb3e14b4bf7bcf1b9
 > git rev-list 42e96b8b90b959d2d4e2823715ef48be60b75994 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson97306954291601484.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 25.234 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson5736475799231483449.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean UP-TO-DATE
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-0.10.0-jdk7:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
org.gradle.api.internal.changedetection.state.FileCollectionSnapshotImpl cannot 
be cast to 
org.gradle.api.internal.changedetection.state.OutputFilesCollectionSnapshotter$OutputFilesSnapshot

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 25.751 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


Build failed in Jenkins: kafka-trunk-jdk8 #640

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[ismael] KAFKA-3717; Support building aggregate javadoc for all project modules

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-us1 (Ubuntu ubuntu ubuntu-us golang-ppa) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 9791401a778fee5a165373b923bbe0ece70defbe 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 9791401a778fee5a165373b923bbe0ece70defbe
 > git rev-list 5b5f3ed0f29faa803e894fc3ee2797b3bfe5766a # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson7791129701545488756.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 21.795 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson4244189730624904500.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 231
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean UP-TO-DATE
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Failed to capture snapshot of input files for task 'compileJava' during 
up-to-date check.
> Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin 
> (

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 15.836 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66


Build failed in Jenkins: kafka-trunk-jdk7 #1304

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Bump system test ducktape dependency to 0.5.1

--
[...truncated 3305 lines...]

kafka.log.LogTest > testAsyncDelete PASSED

kafka.log.LogTest > testReadOutOfRange PASSED

kafka.log.LogTest > testAppendWithOutOfOrderOffsetsThrowsException PASSED

kafka.log.LogTest > testReadAtLogGap PASSED

kafka.log.LogTest > testTimeBasedLogRoll PASSED

kafka.log.LogTest > testLoadEmptyLog PASSED

kafka.log.LogTest > testMessageSetSizeCheck PASSED

kafka.log.LogTest > testIndexResizingAtTruncation PASSED

kafka.log.LogTest > testCompactedTopicConstraints PASSED

kafka.log.LogTest > testThatGarbageCollectingSegmentsDoesntChangeOffset PASSED

kafka.log.LogTest > testAppendAndReadWithSequentialOffsets PASSED

kafka.log.LogTest > testDeleteOldSegmentsMethod PASSED

kafka.log.LogTest > testParseTopicPartitionNameForNull PASSED

kafka.log.LogTest > testAppendAndReadWithNonSequentialOffsets PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingSeparator PASSED

kafka.log.LogTest > testCorruptIndexRebuild PASSED

kafka.log.LogTest > testBogusIndexSegmentsAreRemoved PASSED

kafka.log.LogTest > testCompressedMessages PASSED

kafka.log.LogTest > testAppendMessageWithNullPayload PASSED

kafka.log.LogTest > testCorruptLog PASSED

kafka.log.LogTest > testLogRecoversToCorrectOffset PASSED

kafka.log.LogTest > testReopenThenTruncate PASSED

kafka.log.LogTest > testParseTopicPartitionNameForMissingPartition PASSED

kafka.log.LogTest > testParseTopicPartitionNameForEmptyName PASSED

kafka.log.LogTest > testOpenDeletesObsoleteFiles PASSED

kafka.log.LogTest > testSizeBasedLogRoll PASSED

kafka.log.LogTest > testTimeBasedLogRollJitter PASSED

kafka.log.LogTest > testParseTopicPartitionName PASSED

kafka.log.LogTest > testTruncateTo PASSED

kafka.log.LogTest > testCleanShutdownFile PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[0] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[1] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[2] PASSED

kafka.log.LogCleanerIntegrationTest > cleanerTest[3] PASSED

kafka.log.CleanerTest > testBuildOffsetMap PASSED

kafka.log.CleanerTest > testBuildOffsetMapFakeLarge PASSED

kafka.log.CleanerTest > testSegmentGrouping PASSED

kafka.log.CleanerTest > testCleanSegmentsWithAbort PASSED

kafka.log.CleanerTest > testSegmentGroupingWithSparseOffsets PASSED

kafka.log.CleanerTest > testRecoveryAfterCrash PASSED

kafka.log.CleanerTest > testLogToClean PASSED

kafka.log.CleanerTest > testCleaningWithDeletes PASSED

kafka.log.CleanerTest > testCleanSegments PASSED

kafka.log.CleanerTest > testCleaningWithUnkeyedMessages PASSED

kafka.controller.ControllerFailoverTest > testMetadataUpdate PASSED

kafka.javaapi.consumer.ZookeeperConsumerConnectorTest > testBasic PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytesWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > 
testIteratorIsConsistentWithCompression PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testIteratorIsConsistent PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEqualsWithCompression 
PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testWrittenEqualsRead PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testEquals PASSED

kafka.javaapi.message.ByteBufferMessageSetTest > testSizeInBytes PASSED

kafka.network.SocketServerTest > testClientDisconnectionUpdatesRequestMetrics 
PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIp PASSED

kafka.network.SocketServerTest > 
testBrokerSendAfterChannelClosedUpdatesRequestMetrics PASSED

kafka.network.SocketServerTest > simpleRequest PASSED

kafka.network.SocketServerTest > testSessionPrincipal PASSED

kafka.network.SocketServerTest > testMaxConnectionsPerIpOverrides PASSED

kafka.network.SocketServerTest > testSocketsCloseOnShutdown PASSED

kafka.network.SocketServerTest > testSslSocketServer PASSED

kafka.network.SocketServerTest > tooBigRequestIsRejected PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testIsrAfterBrokerShutDownAndJoinsBack PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopicWithCollision 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAliveBrokerListWithNoTopics 
PASSED

kafka.integration.SaslSslTopicMetadataTest > testAutoCreateTopic PASSED

kafka.integration.SaslSslTopicMetadataTest > testGetAllTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterNewBrokerStartup PASSED

kafka.integration.SaslSslTopicMetadataTest > testBasicTopicMetadata PASSED

kafka.integration.SaslSslTopicMetadataTest > 
testAliveBrokersListWithNoTopicsAfterABrokerShutdown PASSED

kafka.integration.PrimitiveApiTest > testMultiProduce PASSED

kafka.integration.PrimitiveApiTest > testDefaultEncoderProducerAndFetch PASSED

kafka.integration.PrimitiveApiTest > 

[jira] [Updated] (KAFKA-3717) Support building aggregate javadoc for all project modules

2016-05-18 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3717:
---
   Resolution: Fixed
Fix Version/s: 0.10.0.0
   Status: Resolved  (was: Patch Available)

Issue resolved by pull request 1398
[https://github.com/apache/kafka/pull/1398]

> Support building aggregate javadoc for all project modules
> --
>
> Key: KAFKA-3717
> URL: https://issues.apache.org/jira/browse/KAFKA-3717
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Ismael Juma
> Fix For: 0.10.0.0
>
>
> If you run "./gradlew javadoc", you will only get JavaDoc for the High Level 
> Consumer. All the new clients are missing.
> See here: http://home.apache.org/~gwenshap/0.10.0.0-rc5/javadoc/
> I suggest fixing in 0.10.0 branch and in trunk, not rolling a new release 
> candidate, but updating our docs site.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3717) Support building aggregate javadoc for all project modules

2016-05-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290102#comment-15290102
 ] 

ASF GitHub Bot commented on KAFKA-3717:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1398


> Support building aggregate javadoc for all project modules
> --
>
> Key: KAFKA-3717
> URL: https://issues.apache.org/jira/browse/KAFKA-3717
> Project: Kafka
>  Issue Type: Bug
>Reporter: Gwen Shapira
>Assignee: Ismael Juma
>
> If you run "./gradlew javadoc", you will only get JavaDoc for the High Level 
> Consumer. All the new clients are missing.
> See here: http://home.apache.org/~gwenshap/0.10.0.0-rc5/javadoc/
> I suggest fixing in 0.10.0 branch and in trunk, not rolling a new release 
> candidate, but updating our docs site.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] kafka pull request: KAFKA-3717; Support building aggregate javadoc...

2016-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1398


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-3728) inconsistent behavior of Consumer.poll() when assigned vs subscribed

2016-05-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15290084#comment-15290084
 ] 

Ismael Juma commented on KAFKA-3728:


cc [~hachikuji]

> inconsistent behavior of Consumer.poll() when assigned vs subscribed
> 
>
> Key: KAFKA-3728
> URL: https://issues.apache.org/jira/browse/KAFKA-3728
> Project: Kafka
>  Issue Type: Bug
>Reporter: Edoardo Comar
>
> A consumer that is manually assigned a topic-partition is able to consume 
> messages that a consumer that subscribes to the topic can not.
> To reproduce : take the test 
> EndToEndAuthorizationTest.testProduceConsume 
> (eg the SaslSslEndToEndAuthorizationTest implementation)
>  
> it passes ( = messages are consumed) 
> if the consumer is assigned the single topic-partition
>   consumers.head.assign(List(tp).asJava)
> but fails 
> if the consumer subscribes to the topic - changing the line to :
>   consumers.head.subscribe(List(topic).asJava)
> The failure when subscribed shows this error about synchronization:
>  org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: 
> Messages are rejected since there are fewer in-sync replicas than required.
>   at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:455)
> The test passes in both cases (subscribe and assign) with the setting
>   this.serverConfig.setProperty(KafkaConfig.MinInSyncReplicasProp, "1")



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3730) Problem when updating from 0.8.2 to 0.9.0

2016-05-18 Thread Clint Hillerman (JIRA)
Clint Hillerman created KAFKA-3730:
--

 Summary: Problem when updating from 0.8.2 to 0.9.0
 Key: KAFKA-3730
 URL: https://issues.apache.org/jira/browse/KAFKA-3730
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.9.0.0, 0.8.2.1
 Environment: SUSE SLE 10.3 64bit
Reporter: Clint Hillerman
Priority: Critical


Hello,

I'm having trouble upgrading a 3 node kafka cluster from 0.8.2.1 to 0.9.0.0. I 
have followed the steps in the upgrade guide here:

http://kafka.apache.org/documentation.html

Also, my zookeepers are on the same box as kafka. Each node is both a zookeeper 
and a broker.

Here's what I did:

On each box, one at a time, I:
- stopped kafka.
- replaced the code with the new version. I just removed the old kafka dir and 
untarred the new 0.9.0.0 version into its place. Note: the data dir is in a 
different location and was not deleted.
- copied the server.properties file from the 0.8.2.1 version to the 0.9.0.0 
config dir.
- added the "inter.broker.protocol.version=0.8.2.X" line to the 
server.properties in 0.9.0.0's config dir.
- restarted kafka

After I completed that process on all 3 broker/zookeeper boxes, I switched the 
version to 0.9.0.0 in the server.properties on one broker and restarted kafka.
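
For reference, the two broker-side lines involved in this kind of rolling 
upgrade look roughly like the following (values are illustrative; the exact 
sequence is described in the upgrade section of the docs):

# step 1: every broker runs the 0.9.0.0 code but still speaks the old protocol
inter.broker.protocol.version=0.8.2.X

# step 2: once all brokers are on 0.9.0.0, bump the protocol one broker at a time
inter.broker.protocol.version=0.9.0.0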

This caused an error in my server.log. About one every few seconds:

[2016-05-18 15:00:27,956] WARN [ReplicaFetcherThread-0-12], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@45597bba. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading field 'topic': java.nio.BufferUnderflowException 
(kafka.server.ReplicaFetcherThread)

A few other things I tried:

Restarting the zookeepers. Their status was also correct when I ran "server 
mapr-zookeeper" qstatus.

I tried the same process with 9.1 and got this error instead:
[2016-05-18 14:07:15,545] WARN [ReplicaFetcherThread-0-12], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@484ad173. Possible cause: 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'responses': Error reading array of size 1078124, only 176 bytes available 
(kafka.server.ReplicaFetcherThread)

Restarting everything at once (all broker and zookeeper processes).

Please let me know if I should provide more information or if I posted this in 
the wrong location. I'm not sure this is the right place to report bugs like 
this; if there is a forum or somewhere more appropriate, please point me in 
that direction.

Thanks,
cmhillerman




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] KIP-58 - Make Log Compaction Point Configurable

2016-05-18 Thread Ben Stopford
Generally, this seems like a sensible proposal to me. 

Regarding (1): time and message count seem sensible. I can’t think of a 
specific use case for bytes but it seems like there could be one.  

Regarding (2): 
The setting log.cleaner.min.cleanable.ratio currently seems to have two uses. 
It controls which messages will not be compacted, but it also provides a 
fractional bound on how many logs are cleaned (and hence work done) in each 
round. This new proposal seems aimed at the first use, but not the second. 

The second case better suits a fractional setting like the one we have now. 
Using a fractional value means the amount of data cleaned scales in proportion 
to the data stored in the log. If we were to replace this with an absolute 
value it would create proportionally more cleaning work as the log grew in 
size. 

So, if I understand this correctly, I think there is an argument for having 
both. 
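
To make the distinction concrete, here is a toy sketch of the two trigger rules 
being compared. This is illustration only, not Kafka's cleaner code, and the 
absolute setting (minDirtyBytes) is the hypothetical one from earlier in the 
thread:

import java.util.Arrays;

// Toy comparison of a ratio-based cleaning trigger (what
// log.cleaner.min.cleanable.ratio expresses) and a hypothetical absolute
// "min dirty bytes" trigger. Illustration only, not Kafka code.
public class CompactionTriggerSketch {

    // ratio rule: clean once dirtyBytes / totalBytes >= minCleanableRatio
    static boolean cleanableByRatio(long dirtyBytes, long totalBytes, double minCleanableRatio) {
        return totalBytes > 0 && (double) dirtyBytes / totalBytes >= minCleanableRatio;
    }

    // absolute rule: clean once dirtyBytes >= minDirtyBytes, regardless of log size
    static boolean cleanableByBytes(long dirtyBytes, long minDirtyBytes) {
        return dirtyBytes >= minDirtyBytes;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        for (long totalGb : Arrays.asList(10L, 100L)) {
            long dirty = 1 * gb;  // 1 GB of dirty data in both cases
            System.out.printf("log=%dGB dirty=1GB ratio-rule=%b bytes-rule=%b%n",
                    totalGb,
                    cleanableByRatio(dirty, totalGb * gb, 0.5),  // waits for 5 GB / 50 GB of dirty data
                    cleanableByBytes(dirty, 1 * gb));            // fires now, re-copying the whole log
        }
    }
}

With a 0.5 ratio the dirty threshold grows with the log (5 GB for a 10 GB log, 
50 GB for a 100 GB log), so cleaning I/O stays roughly proportional to the new 
data written; with a fixed 1 GB threshold the cleaner re-copies the log after 
every 1 GB of updates, so the work per written byte grows with the log size.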


> On 17 May 2016, at 19:43, Gwen Shapira  wrote:
> 
>  and Spark's implementation is another good reason to allow compaction 
> lag.
> 
> I'm convinced :)
> 
> We need to decide:
> 
> 1) Do we need just .ms config, or anything else? consumer lag is
> measured (and monitored) in messages, so if we need this feature to
> somehow work in tandem with consumer lag monitoring, I think we need
> .messages too.
> 
> 2) Does this new configuration allows us to get rid of cleaner.ratio config?
> 
> Gwen
> 
> 
> On Tue, May 17, 2016 at 9:43 AM, Eric Wasserman
>  wrote:
>> James,
>> 
>> Your pictures do an excellent job of illustrating my point.
>> 
>> My mention of the additional "10's of minutes to hours" refers to how far 
>> after the original target checkpoint (T1 in your diagram) one may need to go 
>> to get to a checkpoint where all partitions of all topics are in the 
>> uncompacted region of their respective logs. In terms of your diagram: the 
>> T3 transaction could have been written 10's of minutes to hours after T1 as 
>> that was how much time it took all readers to get to T1.
>> 
>>> You would not have to start over from the beginning in order to read to T3.
>> 
>> While I agree this is technically true, in practice it could be very onerous 
>> to actually do it. For example, we use the Kafka consumer that is part of 
>> the Spark Streaming library to read table topics. It accepts a range of 
>> offsets to read for each partition. Say we originally target ranges from 
>> offset 0 to the offset of T1 for each topic+partition. There really is no 
>> way to have the library arrive at T1 and then "keep going" to T3. What is 
>> worse, given Spark's design, if you lost a worker during your calculations 
>> you would be in a rather sticky position. Spark achieves resiliency not by 
>> data redundancy but by keeping track of how to reproduce the transformations 
>> leading to a state. In the face of a lost worker, Spark would try to re-read 
>> that portion of the data on the lost worker from Kafka. However, in the 
>> interim compaction may have moved past the reproducible checkpoint (T3) 
>> rendering the data inconsistent. At best the entire calculation would need 
>> to start over targeting some later transaction checkpoint.
>> 
>> Needless to say with the proposed feature everything is quite simple. As 
>> long as we set the compaction lag large enough we can be assured that T1 
>> will remain in the uncompacted region an thereby be reproducible. Thus 
>> reading from 0 to the offsets in T1 will be sufficient for the duration of 
>> the calculation.
>> 
>> Eric
>> 
>> 



Build failed in Jenkins: kafka-0.10.0-jdk7 #99

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Bump system test ducktape dependency to 0.5.1

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-4 (docker Ubuntu ubuntu4 ubuntu yahoo-not-h2) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/0.10.0^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/0.10.0^{commit} # timeout=10
Checking out Revision 42e96b8b90b959d2d4e2823715ef48be60b75994 
(refs/remotes/origin/0.10.0)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 42e96b8b90b959d2d4e2823715ef48be60b75994
 > git rev-list e02a0dd6afa1a51bde4502ad4e733031bb13f6c3 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson1836432578132902169.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 31.419 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-0.10.0-jdk7] $ /bin/bash -xe /tmp/hudson6615949742403246033.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 230
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-0.10.0-jdk7:clients:compileJavaNote: Some input files use unchecked or 
unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

:kafka-0.10.0-jdk7:clients:processResources UP-TO-DATE
:kafka-0.10.0-jdk7:clients:classes
:kafka-0.10.0-jdk7:clients:determineCommitId UP-TO-DATE
:kafka-0.10.0-jdk7:clients:createVersionFile
:kafka-0.10.0-jdk7:clients:jar
:kafka-0.10.0-jdk7:core:compileJava UP-TO-DATE
:kafka-0.10.0-jdk7:core:compileScala
:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP) {

  

Build failed in Jenkins: kafka-trunk-jdk8 #639

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[me] MINOR: Bump system test ducktape dependency to 0.5.1

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-2 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision 5b5f3ed0f29faa803e894fc3ee2797b3bfe5766a 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5b5f3ed0f29faa803e894fc3ee2797b3bfe5766a
 > git rev-list c36cc60f73ca1fe956fb8792bc538b1fdebb712d # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson1365589102038145601.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 21.021 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson5701959755039289363.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 230
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean
:log4j-appender:clean
:streams:clean
:tools:clean
:connect:api:clean
:connect:file:clean
:connect:json:clean
:connect:runtime:clean
:streams:examples:clean
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJavawarning: [options] bootstrap class path 
not set in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:kafka-trunk-jdk8:clients:processResources UP-TO-DATE
:kafka-trunk-jdk8:clients:classes
:kafka-trunk-jdk8:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk8:clients:createVersionFile
:kafka-trunk-jdk8:clients:jar
:kafka-trunk-jdk8:core:compileJava UP-TO-DATE
:kafka-trunk-jdk8:core:compileScalaJava HotSpot(TM) 64-Bit Server VM warning: 
ignoring option MaxPermSize=512m; support was removed in 8.0

:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 expireTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP) {
 

[GitHub] kafka pull request: MINOR: Bump system test ducktape dependency to...

2016-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1397


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (KAFKA-3729) Auto-configure non-default SerDes passed alongside the topology builder

2016-05-18 Thread Fred Patton (JIRA)
Fred Patton created KAFKA-3729:
--

 Summary:  Auto-configure non-default SerDes passed alongside the 
topology builder
 Key: KAFKA-3729
 URL: https://issues.apache.org/jira/browse/KAFKA-3729
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Fred Patton
Assignee: Guozhang Wang


From Guozhang Wang:
"Only default serdes provided through configs are auto-configured today. But we 
could auto-configure other serdes passed alongside the topology builder as 
well."



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [COMMERCIAL] Re: download - 0.10.0.0 RC6

2016-05-18 Thread Matthias J. Sax
It's for different Scala versions, i.e., 2.10 and 2.11, respectively.

-Matthias


On 05/18/2016 07:28 PM, Ramanan, Buvana (Nokia - US) wrote:
> Ian,
> 
> Thanks a lot for the prompt response.
> 
> What is the difference between the following?
> 
> 1) kafka-0.10.0.0-src.tgz
> 2) kafka_2.10-0.10.0.0.tgz
> 3) kafka_2.11-0.10.0.0.tgz
> 
> I suppose the 1st one is the source and the 2nd & 3rd are binaries... 
> I am not able to figure out the difference between 2nd & 3rd, please clarify
> 
> -Buvana
> 
> 
> -Original Message-
> From: Ian Wrigley [mailto:i...@confluent.io] 
> Sent: Wednesday, May 18, 2016 1:10 PM
> To: dev@kafka.apache.org
> Cc: Users
> Subject: [COMMERCIAL] Re: download - 0.10.0.0 RC6
> 
> Hi
> 
> From Gwen’s announcement of RC6 last night:
> 
>> * Release artifacts to be voted upon (source and binary):
>> http://home.apache.org/~gwenshap/0.10.0.0-rc6/
> 
> Regards
> 
> Ian.
> 
> ---
> Ian Wrigley
> Director, Education Services
> Confluent, Inc
> Cell: (323) 819-4075
> 
>> On May 18, 2016, at 1:04 PM, Ramanan, Buvana (Nokia - US) 
>>  wrote:
>>
>> Hello,
>>
>> A naïve question. How do I download 0.10.0 RC6 (or RC5) version?
>>
>> I clicked on "Download zip" under 0.10.0 trunk @ github, downloaded 
>> kafka-0.10.0.zip today, compiled & started the server. Mbeans shows the 
>> version to be 0.10.0-SNAPSHOT and am not sure where / how to download RC6.
>>
>> Please guide.
>>
>> Thanks,
>> Buvana
>>
> 





[jira] [Commented] (KAFKA-3726) Enable cold storage option

2016-05-18 Thread Ben Stopford (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289608#comment-15289608
 ] 

Ben Stopford commented on KAFKA-3726:
-

The standard approach to this sort of problem would be to use a Kafka Connect 
connector to move data to HDFS or S3 etc. Would this not suffice?
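
For example, a sink connector configuration for that approach is only a few 
lines. The property names below follow the Confluent HDFS connector quick 
start and are meant purely as an illustration:

name=cold-storage-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=topic-to-archive
hdfs.url=hdfs://namenode:9000
flush.size=100000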

> Enable cold storage option
> --
>
> Key: KAFKA-3726
> URL: https://issues.apache.org/jira/browse/KAFKA-3726
> Project: Kafka
>  Issue Type: Wish
>Reporter: Radoslaw Gruchalski
> Attachments: kafka-cold-storage.txt
>
>
> This JIRA builds up on the cold storage article I have published on Medium. 
> The copy of the article attached here.
> The need for cold storage or an "indefinite" log seems to be quite often 
> discussed on the user mailing list.
> The cold storage idea would give the operator the opportunity to keep the 
> raw Kafka offset files in third-party storage and to retrieve the data back 
> for re-consumption.
> The two possible options for enabling such functionality are, from the 
> article:
> First approach: if Kafka provided a notification mechanism and could trigger 
> a program when a segment file is to be discarded, it would become feasible to 
> provide a standard method of moving data to cold storage in reaction to those 
> events. Once the program finishes backing the segments up, it could tell 
> Kafka “it is now safe to delete these segments”.
> The second option is to provide an additional value for the 
> log.cleanup.policy setting, call it cold-storage. In case of this value, 
> Kafka would move the segment files — which otherwise would be deleted — to 
> another destination on the server. They can be picked up from there and moved 
> to the cold storage.
> Both have their limitations. The former is simply a mechanism exposed to 
> allow the operator to build up the tooling necessary to enable this. Events could 
> be published in a manner similar to Mesos Event Bus 
> (https://mesosphere.github.io/marathon/docs/event-bus.html) or Kafka itself 
> could provide a control topic on which such info would be published. The 
> outcome is, the operator can subscribe to the event bus and get notified 
> about, at least, two events:
> - log segment is complete and can be backed up
> - partition leader changed
> These two, together with an option to keep the log segment safe from 
> compaction for a certain amount of time, would be sufficient to reliably 
> implement cold storage.
> The latter option, the {{log.cleanup.policy}} setting, would be a more complete 
> feature but it is also much more difficult to implement. All brokers would 
> have to keep a backup of the data in the cold storage, significantly increasing 
> the size requirements; also, de-duplication of the replicated data would be 
> left completely to the operator.
> In any case, the thing to stay away from is having Kafka deal with the 
> physical aspect of moving the data to and back from the cold storage. This is 
> not Kafka's task. The intent is to provide a method for reliable cold storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk7 #1303

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] MINOR: Use `Record` instead of `ByteBufferMessageSet` in

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on H11 (docker Ubuntu ubuntu yahoo-not-h2) in workspace 

 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision c36cc60f73ca1fe956fb8792bc538b1fdebb712d 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c36cc60f73ca1fe956fb8792bc538b1fdebb712d
 > git rev-list 2bd7b64506a2a7ecef562f5b7db8a34e28d4e957 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson5097543723442676016.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 21.145 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
[kafka-trunk-jdk7] $ /bin/bash -xe /tmp/hudson6810386993164390522.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 230
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean
:log4j-appender:clean
:streams:clean
:tools:clean
:connect:api:clean
:connect:file:clean
:connect:json:clean
:connect:runtime:clean
:streams:examples:clean
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk7:clients:compileJava
:jar_core_2_10 FAILED

FAILURE: Build failed with an exception.

* What went wrong:
Failed to capture snapshot of input files for task 'compileJava' during 
up-to-date check.
> Could not add entry 
> '/home/jenkins/.gradle/caches/modules-2/files-2.1/net.jpountz.lz4/lz4/1.3.0/c708bb2590c0652a642236ef45d9f99ff842a2ce/lz4-1.3.0.jar'
>  to cache fileHashes.bin 
> (

* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug 
option to get more log output.

BUILD FAILED

Total time: 21.392 secs
Build step 'Execute shell' marked build as failure
Recording test results
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK_1_7U51_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk-1.7u51


[jira] [Commented] (KAFKA-3396) Unauthorized topics are returned to the user

2016-05-18 Thread Edoardo Comar (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289551#comment-15289551
 ] 

Edoardo Comar commented on KAFKA-3396:
--

I'm held back by 
https://issues.apache.org/jira/browse/KAFKA-3727
and
https://issues.apache.org/jira/browse/KAFKA-3728


> Unauthorized topics are returned to the user
> 
>
> Key: KAFKA-3396
> URL: https://issues.apache.org/jira/browse/KAFKA-3396
> Project: Kafka
>  Issue Type: Bug
>Reporter: Grant Henke
>Assignee: Edoardo Comar
>
> Kafka's clients and protocol exposes unauthorized topics to the end user. 
> This is often considered a security hole. To some, the topic name is 
> considered sensitive information. Those that do not consider the name 
> sensitive, still consider it more information that allows a user to try and 
> circumvent security.  Instead, if a user does not have access to the topic, 
> the servers should act as if the topic does not exist. 
> To solve this some of the changes could include:
>   - The broker should not return a TOPIC_AUTHORIZATION(29) error for 
> requests (metadata, produce, fetch, etc) that include a topic that the user 
> does not have DESCRIBE access to.
>   - A user should not receive a TopicAuthorizationException when they do 
> not have DESCRIBE access to a topic or the cluster.
>  - The client should not maintain and expose a list of unauthorized 
> topics in org.apache.kafka.common.Cluster. 
> Other changes may be required that are not listed here. Further analysis is 
> needed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: kafka-trunk-jdk8 #638

2016-05-18 Thread Apache Jenkins Server
See 

Changes:

[wangguoz] MINOR: Use `Record` instead of `ByteBufferMessageSet` in

--
Started by an SCM change
[EnvInject] - Loading node environment variables.
Building remotely on ubuntu-5 (docker Ubuntu ubuntu5 ubuntu yahoo-not-h2) in 
workspace 
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url 
 > https://git-wip-us.apache.org/repos/asf/kafka.git # timeout=10
Fetching upstream changes from https://git-wip-us.apache.org/repos/asf/kafka.git
 > git --version # timeout=10
 > git -c core.askpass=true fetch --tags --progress 
 > https://git-wip-us.apache.org/repos/asf/kafka.git 
 > +refs/heads/*:refs/remotes/origin/*
 > git rev-parse refs/remotes/origin/trunk^{commit} # timeout=10
 > git rev-parse refs/remotes/origin/origin/trunk^{commit} # timeout=10
Checking out Revision c36cc60f73ca1fe956fb8792bc538b1fdebb712d 
(refs/remotes/origin/trunk)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c36cc60f73ca1fe956fb8792bc538b1fdebb712d
 > git rev-list 2bd7b64506a2a7ecef562f5b7db8a34e28d4e957 # timeout=10
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson4064571805656987692.sh
+ 
/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2/bin/gradle
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
http://gradle.org/docs/2.4-rc-2/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
:downloadWrapper

BUILD SUCCESSFUL

Total time: 22.207 secs
Setting 
GRADLE_2_4_RC_2_HOME=/home/jenkins/jenkins-slave/tools/hudson.plugins.gradle.GradleInstallation/Gradle_2.4-rc-2
Setting 
JDK1_8_0_66_HOME=/home/jenkins/jenkins-slave/tools/hudson.model.JDK/jdk1.8.0_66
[kafka-trunk-jdk8] $ /bin/bash -xe /tmp/hudson9155046199242089977.sh
+ export GRADLE_OPTS=-Xmx1024m
+ GRADLE_OPTS=-Xmx1024m
+ ./gradlew -Dorg.gradle.project.maxParallelForks=1 clean jarAll testAll
To honour the JVM settings for this build a new JVM will be forked. Please 
consider using the daemon: 
https://docs.gradle.org/2.13/userguide/gradle_daemon.html.
Building project 'core' with Scala version 2.10.6
Build file ': 
line 230
useAnt has been deprecated and is scheduled to be removed in Gradle 3.0. The 
Ant-Based Scala compiler is deprecated, please see 
https://docs.gradle.org/current/userguide/scala_plugin.html.
:clean UP-TO-DATE
:clients:clean
:connect:clean UP-TO-DATE
:core:clean
:examples:clean UP-TO-DATE
:log4j-appender:clean UP-TO-DATE
:streams:clean UP-TO-DATE
:tools:clean UP-TO-DATE
:connect:api:clean UP-TO-DATE
:connect:file:clean UP-TO-DATE
:connect:json:clean UP-TO-DATE
:connect:runtime:clean UP-TO-DATE
:streams:examples:clean UP-TO-DATE
:jar_core_2_10
Building project 'core' with Scala version 2.10.6
:kafka-trunk-jdk8:clients:compileJavawarning: [options] bootstrap class path 
not set in conjunction with -source 1.7
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
1 warning

:kafka-trunk-jdk8:clients:processResources UP-TO-DATE
:kafka-trunk-jdk8:clients:classes
:kafka-trunk-jdk8:clients:determineCommitId UP-TO-DATE
:kafka-trunk-jdk8:clients:createVersionFile
:kafka-trunk-jdk8:clients:jar
:kafka-trunk-jdk8:core:compileJava UP-TO-DATE
:kafka-trunk-jdk8:core:compileScalaJava HotSpot(TM) 64-Bit Server VM warning: 
ignoring option MaxPermSize=512m; support was removed in 8.0

:79:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.

org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP
 ^
:36:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 commitTimestamp: Long = 
org.apache.kafka.common.requests.OffsetCommitRequest.DEFAULT_TIMESTAMP,

  ^
:37:
 value DEFAULT_TIMESTAMP in object OffsetCommitRequest is deprecated: see 
corresponding Javadoc for more information.
 

[GitHub] kafka pull request: MINOR: Use `Record` instead of `ByteBufferMess...

2016-05-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/1357


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


RE: [COMMERCIAL] Re: download - 0.10.0.0 RC6

2016-05-18 Thread Ramanan, Buvana (Nokia - US)
Ian,

Thanks a lot for the prompt response.

What is the difference between the following?

1) kafka-0.10.0.0-src.tgz
2) kafka_2.10-0.10.0.0.tgz
3) kafka_2.11-0.10.0.0.tgz

I suppose the 1st one is the source and the 2nd & 3rd are binaries... 
I am not able to figure out the difference between 2nd & 3rd, please clarify

-Buvana


-Original Message-
From: Ian Wrigley [mailto:i...@confluent.io] 
Sent: Wednesday, May 18, 2016 1:10 PM
To: dev@kafka.apache.org
Cc: Users
Subject: [COMMERCIAL] Re: download - 0.10.0.0 RC6

Hi

From Gwen’s announcement of RC6 last night:

> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/0.10.0.0-rc6/

Regards

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
Cell: (323) 819-4075

> On May 18, 2016, at 1:04 PM, Ramanan, Buvana (Nokia - US) 
>  wrote:
> 
> Hello,
> 
> A naïve question. How do I download 0.10.0 RC6 (or RC5) version?
> 
> I clicked on "Download zip" under 0.10.0 trunk @ github, downloaded 
> kafka-0.10.0.zip today, compiled & started the server. Mbeans shows the 
> version to be 0.10.0-SNAPSHOT and am not sure where / how to download RC6.
> 
> Please guide.
> 
> Thanks,
> Buvana
> 



[jira] [Updated] (KAFKA-3727) Consumer.poll() stuck in loop on non-existent topic manually assigned

2016-05-18 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3727:
-
Description: 
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different/inconsistent 
between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

The "subscribed" consumer will return an empty collection
The "assigned" consumer will *loop forever* - this feels a bug to me.

sample snippet to reproduce:
{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever ! 
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)


  was:
Inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs 
subscribed

The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

sample snippet to reproduce:
{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever ! 
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)



> Consumer.poll() stuck in loop on non-existent topic manually assigned
> -
>
> Key: KAFKA-3727
> URL: https://issues.apache.org/jira/browse/KAFKA-3727
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Edoardo Comar
>
> The behavior of a consumer on poll() for a non-existing topic is surprisingly 
> different/inconsistent 
> between a consumer that subscribed to the topic and one that had the 
> topic-partition manually assigned.
> The "subscribed" consumer will return an empty collection
> The "assigned" consumer will *loop forever* - this feels a bug to me.
> sample snippet to reproduce:
> {quote}
> KafkaConsumer assignKc = new KafkaConsumer<>(props1);
> KafkaConsumer subsKc = new KafkaConsumer<>(props2);
> List tps = new ArrayList<>();
> tps.add(new 

[jira] [Updated] (KAFKA-3727) inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs subscribed

2016-05-18 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3727:
-
Description: 
Inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs 
subscribed

The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

sample snippet to reproduce:
{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever ! 
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)


  was:
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

sample snippet to reproduce:
{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever ! 
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)



> inconsistent behavior of Consumer.poll() on non-existent topic when assigned 
> vs subscribed
> --
>
> Key: KAFKA-3727
> URL: https://issues.apache.org/jira/browse/KAFKA-3727
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Edoardo Comar
>
> Inconsistent behavior of Consumer.poll() on non-existent topic when assigned 
> vs subscribed
> The behavior of a consumer on poll() for a non-existing topic is surprisingly 
> different between a consumer that subscribed to the topic and one that had 
> the topic-partition manually assigned.
> the "subscribed" consumer will return an empty collection
> the "assigned" consumer will *loop forever*.
> the latter behavior feels a bug to me.
> sample snippet to reproduce:
> {quote}
> KafkaConsumer assignKc = new KafkaConsumer<>(props1);
> 

[jira] [Updated] (KAFKA-3727) Consumer.poll() stuck in loop on non-existent topic manually assigned

2016-05-18 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3727:
-
Summary: Consumer.poll() stuck in loop on non-existent topic manually 
assigned  (was: inconsistent behavior of Consumer.poll() on non-existent topic 
when assigned vs subscribed)

> Consumer.poll() stuck in loop on non-existent topic manually assigned
> -
>
> Key: KAFKA-3727
> URL: https://issues.apache.org/jira/browse/KAFKA-3727
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Edoardo Comar
>
> Inconsistent behavior of Consumer.poll() on non-existent topic when assigned 
> vs subscribed
> The behavior of a consumer on poll() for a non-existing topic is surprisingly 
> different between a consumer that subscribed to the topic and one that had 
> the topic-partition manually assigned.
> the "subscribed" consumer will return an empty collection
> the "assigned" consumer will *loop forever*.
> the latter behavior feels a bug to me.
> sample snippet to reproduce:
> {quote}
> KafkaConsumer assignKc = new KafkaConsumer<>(props1);
> KafkaConsumer subsKc = new KafkaConsumer<>(props2);
> List tps = new ArrayList<>();
> tps.add(new TopicPartition("topic-not-exists", 0));
> assignKc.assign(tps);
> subsKc.subscribe(Arrays.asList("topic-not-exists"));
> System.out.println("* subscribe k consumer ");
> ConsumerRecords crs2 = subsKc.poll(1000L); 
> print("subscribeKc", crs2); // returns empty
> System.out.println("* assign k consumer ");
> ConsumerRecords crs1 = assignKc.poll(1000L); 
>// will loop forever ! 
> print("assignKc", crs1);
> {quote}
> the logs for the "assigned" consumer show:
> [2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
> Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
> (org.apache.kafka.clients.Metadata)
> [2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
> fetching offset, wait for metadata refresh 
> (org.apache.kafka.clients.consumer.internals.Fetcher)
> [2016-05-18 17:33:10,010] DEBUG Sending metadata request 
> {topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
> [2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
> id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
> (org.apache.kafka.clients.NetworkClient)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3727) inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs subscribed

2016-05-18 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3727:
-
Description: 
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

sample snippet to reproduce:
{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever ! 
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)


  was:
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever !
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)



> inconsistent behavior of Consumer.poll() on non-existent topic when assigned 
> vs subscribed
> --
>
> Key: KAFKA-3727
> URL: https://issues.apache.org/jira/browse/KAFKA-3727
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Edoardo Comar
>
> The behavior of a consumer on poll() for a non-existing topic is surprisingly 
> different between a consumer that subscribed to the topic and one that had 
> the topic-partition manually assigned.
> the "subscribed" consumer will return an empty collection
> the "assigned" consumer will *loop forever*.
> the latter behavior feels a bug to me.
> sample snippet to reproduce:
> {quote}
> KafkaConsumer assignKc = new KafkaConsumer<>(props1);
> KafkaConsumer subsKc = new KafkaConsumer<>(props2);
> List tps = new ArrayList<>();
> tps.add(new TopicPartition("topic-not-exists", 0));
> assignKc.assign(tps);
>

[jira] [Updated] (KAFKA-3727) inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs subscribed

2016-05-18 Thread Edoardo Comar (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar updated KAFKA-3727:
-
Description: 
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior feels a bug to me.

{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever !
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)


  was:
The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior 

{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever !
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)



> inconsistent behavior of Consumer.poll() on non-existent topic when assigned 
> vs subscribed
> --
>
> Key: KAFKA-3727
> URL: https://issues.apache.org/jira/browse/KAFKA-3727
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Reporter: Edoardo Comar
>
> The behavior of a consumer on poll() for a non-existing topic is surprisingly 
> different between a consumer that subscribed to the topic and one that had 
> the topic-partition manually assigned.
> the "subscribed" consumer will return an empty collection
> the "assigned" consumer will *loop forever*.
> the latter behavior feels like a bug to me.
> {quote}
> KafkaConsumer assignKc = new KafkaConsumer<>(props1);
> KafkaConsumer subsKc = new KafkaConsumer<>(props2);
> List tps = new ArrayList<>();
> tps.add(new TopicPartition("topic-not-exists", 0));
> assignKc.assign(tps);
> subsKc.subscribe(Arrays.asList("topic-not-exists"));
> 

Re: download - 0.10.0.0 RC6

2016-05-18 Thread Ian Wrigley
Hi

From Gwen’s announcement of RC6 last night:

> * Release artifacts to be voted upon (source and binary):
> http://home.apache.org/~gwenshap/0.10.0.0-rc6/

Regards

Ian.

---
Ian Wrigley
Director, Education Services
Confluent, Inc
Cell: (323) 819-4075

> On May 18, 2016, at 1:04 PM, Ramanan, Buvana (Nokia - US) 
>  wrote:
> 
> Hello,
> 
> A naïve question. How do I download the 0.10.0 RC6 (or RC5) version?
> 
> I clicked on "Download zip" under 0.10.0 trunk @ github, downloaded 
> kafka-0.10.0.zip today, compiled & started the server. The MBeans show the 
> version to be 0.10.0-SNAPSHOT, and I am not sure where/how to download RC6.
> 
> Please guide.
> 
> Thanks,
> Buvana
> 



[jira] [Created] (KAFKA-3727) inconsistent behavior of Consumer.poll() on non-existent topic when assigned vs subscribed

2016-05-18 Thread Edoardo Comar (JIRA)
Edoardo Comar created KAFKA-3727:


 Summary: inconsistent behavior of Consumer.poll() on non-existent 
topic when assigned vs subscribed
 Key: KAFKA-3727
 URL: https://issues.apache.org/jira/browse/KAFKA-3727
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Edoardo Comar


The behavior of a consumer on poll() for a non-existing topic is surprisingly 
different between a consumer that subscribed to the topic and one that had the 
topic-partition manually assigned.

the "subscribed" consumer will return an empty collection
the "assigned" consumer will *loop forever*.
the latter behavior 

{quote}
KafkaConsumer assignKc = new KafkaConsumer<>(props1);
KafkaConsumer subsKc = new KafkaConsumer<>(props2);
List tps = new ArrayList<>();
tps.add(new TopicPartition("topic-not-exists", 0));
assignKc.assign(tps);

subsKc.subscribe(Arrays.asList("topic-not-exists"));

System.out.println("* subscribe k consumer ");
ConsumerRecords crs2 = subsKc.poll(1000L); 
print("subscribeKc", crs2); // returns empty

System.out.println("* assign k consumer ");
ConsumerRecords crs1 = assignKc.poll(1000L); 
   // will loop forever !
print("assignKc", crs1);
{quote}

the logs for the "assigned" consumer show:
[2016-05-18 17:33:09,907] DEBUG Updated cluster metadata version 8 to 
Cluster(nodes = [192.168.10.18:9093 (id: 0 rack: null)], partitions = []) 
(org.apache.kafka.clients.Metadata)
[2016-05-18 17:33:09,908] DEBUG Partition topic-not-exists-0 is unknown for 
fetching offset, wait for metadata refresh 
(org.apache.kafka.clients.consumer.internals.Fetcher)
[2016-05-18 17:33:10,010] DEBUG Sending metadata request 
{topics=[topic-not-exists]} to node 0 (org.apache.kafka.clients.NetworkClient)
[2016-05-18 17:33:10,011] WARN Error while fetching metadata with correlation 
id 9 : {topic-not-exists=UNKNOWN_TOPIC_OR_PARTITION} 
(org.apache.kafka.clients.NetworkClient)




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


download - 0.10.0.0 RC6

2016-05-18 Thread Ramanan, Buvana (Nokia - US)
Hello,

A naïve question. How do I download the 0.10.0 RC6 (or RC5) version?

I clicked on "Download zip" under 0.10.0 trunk @ github, downloaded 
kafka-0.10.0.zip today, compiled & started the server. The MBeans show the version 
to be 0.10.0-SNAPSHOT, and I am not sure where/how to download RC6.

Please guide.

Thanks,
Buvana



[jira] [Commented] (KAFKA-3637) Add method that checks if streams are initialised

2016-05-18 Thread Thomas Szymanski (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15289271#comment-15289271
 ] 

Thomas Szymanski commented on KAFKA-3637:
-

Hi, I would like to take this one to get started with the Kafka project.

I started to write the production code, but I am having an issue with the 
replacement of the Thread.sleep() occurrences in the tests.
TestUtils is a Scala class and waitUntilTrue takes a () => Boolean, so it seems 
this requires instantiating a Function0 to pass as a 
parameter every time in Java code.

Even if a kind of wrapper for this is added in IntegrationTestUtils, I am not 
sure that would be the right way to go to keep the tests readable.
Can someone give a hint about it?
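
As a rough illustration of the kind of Java-side wrapper mentioned above (a 
sketch only; the class, interface and method names are made up and are not 
existing Kafka test code), one option is to re-implement the polling loop in 
plain Java instead of adapting Java callers to Scala's () => Boolean:
{quote}
import java.util.concurrent.TimeUnit;

public final class JavaTestCondition {

    /** A Java-friendly stand-in for Scala's () => Boolean. */
    public interface Condition {
        boolean holds();
    }

    private JavaTestCondition() {}

    public static void waitUntilTrue(Condition condition, String message, long maxWaitMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.holds()) {
                return;
            }
            TimeUnit.MILLISECONDS.sleep(100);   // small back-off between checks
        }
        throw new AssertionError(message);
    }
}

// Usage from a test (Java 7 style, anonymous class instead of a lambda):
//
// JavaTestCondition.waitUntilTrue(new JavaTestCondition.Condition() {
//     @Override
//     public boolean holds() {
//         return streamsStarted.get();   // hypothetical flag set by the test
//     }
// }, "streams did not start in time", 30000L);
{quote}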

> Add method that checks if streams are initialised
> -
>
> Key: KAFKA-3637
> URL: https://issues.apache.org/jira/browse/KAFKA-3637
> Project: Kafka
>  Issue Type: Sub-task
>  Components: streams
>Affects Versions: 0.10.1.0
>Reporter: Eno Thereska
>Assignee: Liquan Pei
>  Labels: newbie
> Fix For: 0.10.1.0
>
>
> Currently when streams are initialised and started with streams.start(), 
> there is no way for the caller to know if the initialisation procedure 
> (including starting tasks) is complete or not. Hence, the caller is forced to 
> guess for how long to wait. It would be good to have a way to return the 
> state of the streams to the caller.
> One option would be to follow a similar approach in Kafka Server 
> (BrokerStates.scala).
> As part of this change, we must remove the Thread.sleep() call in the Kafka 
> Streams integration tests and substitute it with TestUtils.waitUntilTrue().
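
For illustration, a sketch of what the BrokerStates-style approach described 
above could look like (the enum values and method names here are made up, not 
the eventual Kafka Streams API):
{quote}
public class StreamsStateSketch {

    public enum State { CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING }

    private volatile State state = State.CREATED;

    public State state() {
        return state;
    }

    // Called internally as the instance moves through its lifecycle.
    void transitionTo(State newState) {
        this.state = newState;
    }

    // What a test or caller could do instead of Thread.sleep():
    public boolean waitForRunning(long maxWaitMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (System.currentTimeMillis() < deadline) {
            if (state() == State.RUNNING) {
                return true;
            }
            Thread.sleep(100);
        }
        return false;
    }
}
{quote}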



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3726) Enable cold storage option

2016-05-18 Thread Radoslaw Gruchalski (JIRA)
Radoslaw Gruchalski created KAFKA-3726:
--

 Summary: Enable cold storage option
 Key: KAFKA-3726
 URL: https://issues.apache.org/jira/browse/KAFKA-3726
 Project: Kafka
  Issue Type: Wish
Reporter: Radoslaw Gruchalski


This JIRA builds on the cold storage article I have published on Medium. A 
copy of the article is attached here.

The need for cold storage or an "indefinite" log seems to be discussed quite 
often on the user mailing list.

The cold storage idea would give the operator the ability to keep the raw 
Kafka offset files in third-party storage and to retrieve the data back for 
re-consumption.

The two possible options for enabling such functionality, from the article, are:

First approach: if Kafka provided a notification mechanism and could trigger a 
program when a segment file is to be discarded, it would become feasible to 
provide a standard method of moving data to cold storage in reaction to those 
events. Once the program finishes backing the segments up, it could tell Kafka 
“it is now safe to delete these segments”.

The second option is to provide an additional value for the log.cleanup.policy 
setting, call it cold-storage. With this value, Kafka would move the segment 
files that would otherwise be deleted to another destination on the server. 
They can be picked up from there and moved to cold storage.

Both have their limitations. The former is simply a mechanism exposed to allow 
the operator to build the tooling necessary to enable this. Events could 
be published in a manner similar to the Mesos Event Bus 
(https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself 
could provide a control topic on which such info would be published. The 
outcome is that the operator can subscribe to the event bus and get notified 
about, at least, two events:

- a log segment is complete and can be backed up
- the partition leader changed

These two, together with an option to keep the log segment safe from compaction 
for a certain amount of time, would be sufficient to reliably implement cold 
storage.

The latter option, the {{log.cleanup.policy}} setting, would be a more complete 
feature, but it is also much more difficult to implement. All brokers would 
have to keep a backup of the data in cold storage, significantly increasing 
the size requirements; also, de-duplication of the replicated data would be 
left completely to the operator.

In any case, the thing to stay away from is having Kafka deal with the 
physical aspect of moving the data to and from cold storage. This is not 
Kafka's task. The intent is to provide a method for reliable cold storage.
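
As a rough illustration of the operator-side tooling the first approach implies 
(a sketch only; the notification mechanism does not exist, and the paths and 
base offset below are assumptions), the backup step itself could be as simple 
as copying a completed segment's files aside before acknowledging the deletion:
{quote}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class SegmentArchiver {

    private final Path logDir;        // e.g. /var/kafka-logs/my-topic-0
    private final Path coldStorage;   // e.g. an NFS mount or a staging dir for an upload

    public SegmentArchiver(Path logDir, Path coldStorage) {
        this.logDir = logDir;
        this.coldStorage = coldStorage;
    }

    /** Copy every file belonging to one segment (.log, .index, ...) to cold storage. */
    public void archiveSegment(long baseOffset) throws IOException {
        String prefix = String.format("%020d", baseOffset);   // Kafka's zero-padded segment naming
        Files.createDirectories(coldStorage);
        try (Stream<Path> files = Files.list(logDir)) {
            files.filter(p -> p.getFileName().toString().startsWith(prefix))
                 .forEach(p -> copy(p, coldStorage.resolve(p.getFileName())));
        }
        // At this point the tool would acknowledge back to Kafka ("it is now safe
        // to delete these segments"), via whatever channel the proposal defines.
    }

    private void copy(Path from, Path to) {
        try {
            Files.copy(from, to, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new RuntimeException("failed to archive " + from, e);
        }
    }

    public static void main(String[] args) throws IOException {
        new SegmentArchiver(Paths.get("/var/kafka-logs/my-topic-0"),
                            Paths.get("/mnt/cold-storage/my-topic-0"))
                .archiveSegment(0L);
    }
}
{quote}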



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (KAFKA-3726) Enable cold storage option

2016-05-18 Thread Radoslaw Gruchalski (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Radoslaw Gruchalski updated KAFKA-3726:
---
Attachment: kafka-cold-storage.txt

The text version of the article mentioned in the description.

> Enable cold storage option
> --
>
> Key: KAFKA-3726
> URL: https://issues.apache.org/jira/browse/KAFKA-3726
> Project: Kafka
>  Issue Type: Wish
>Reporter: Radoslaw Gruchalski
> Attachments: kafka-cold-storage.txt
>
>
> This JIRA builds on the cold storage article I have published on Medium. A 
> copy of the article is attached here.
> The need for cold storage or an "indefinite" log seems to be discussed quite 
> often on the user mailing list.
> The cold storage idea would give the operator the ability to keep the raw 
> Kafka offset files in third-party storage and to retrieve the data back for 
> re-consumption.
> The two possible options for enabling such functionality, from the article, 
> are:
> First approach: if Kafka provided a notification mechanism and could trigger 
> a program when a segment file is to be discarded, it would become feasible to 
> provide a standard method of moving data to cold storage in reaction to those 
> events. Once the program finishes backing the segments up, it could tell 
> Kafka “it is now safe to delete these segments”.
> The second option is to provide an additional value for the 
> log.cleanup.policy setting, call it cold-storage. With this value, Kafka 
> would move the segment files that would otherwise be deleted to another 
> destination on the server. They can be picked up from there and moved to 
> cold storage.
> Both have their limitations. The former is simply a mechanism exposed to 
> allow the operator to build the tooling necessary to enable this. Events 
> could be published in a manner similar to the Mesos Event Bus 
> (https://mesosphere.github.io/marathon/docs/event-bus.html), or Kafka itself 
> could provide a control topic on which such info would be published. The 
> outcome is that the operator can subscribe to the event bus and get notified 
> about, at least, two events:
> - a log segment is complete and can be backed up
> - the partition leader changed
> These two, together with an option to keep the log segment safe from 
> compaction for a certain amount of time, would be sufficient to reliably 
> implement cold storage.
> The latter option, the {{log.cleanup.policy}} setting, would be a more 
> complete feature, but it is also much more difficult to implement. All 
> brokers would have to keep a backup of the data in cold storage, 
> significantly increasing the size requirements; also, de-duplication of the 
> replicated data would be left completely to the operator.
> In any case, the thing to stay away from is having Kafka deal with the 
> physical aspect of moving the data to and from cold storage. This is not 
> Kafka's task. The intent is to provide a method for reliable cold storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Perf producer/consumers for compacted topics

2016-05-18 Thread Tom Crayford
Hi,

I'm interested in benchmarking the impact of compaction on producers and
consumers and on long-term cluster stability. That's not *quite* the impact of
it on the server side, but it certainly plays into it. For example, I'd
like to be able to answer: "In configuration X, if we write N messages into
a compacted topic with a certain key range, a certain number of deletes,
etc., and *then* replay that into a consumer that does nothing, how long does
that consumer take?" What happens if we're continually running that
compaction process, along with restarting consumers once every hour or two,
*and* producing a lot of messages? What happens to performance on compacted
topics with different disk configurations (e.g. magnetic vs. SSD, RAID vs. JBOD)?

I certainly welcome some topic/partition specific compaction metrics, and
would be willing to contribute there.

Thanks

Tom
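
For concreteness, here is a minimal sketch of the kind of keyed workload 
generator being discussed (not an existing perf tool; the topic name, key 
cardinality and delete ratio are made-up parameters). It writes records over a 
bounded key space and emits a null-value tombstone for a configurable fraction 
of sends:

import java.util.Properties;
import java.util.Random;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CompactedTopicLoadGenerator {

    public static void main(String[] args) {
        String topic = "compaction-perf-test";   // assumed pre-created with cleanup.policy=compact
        long numRecords = 1000000L;
        int keyCardinality = 10000;              // how "wide" the key range is
        double deleteRatio = 0.05;               // fraction of sends that are tombstones

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Random random = new Random(42);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (long i = 0; i < numRecords; i++) {
                String key = "key-" + random.nextInt(keyCardinality);
                // A null value is a tombstone: compaction will eventually drop the key.
                String value = random.nextDouble() < deleteRatio ? null : "value-" + i;
                producer.send(new ProducerRecord<>(topic, key, value));
            }
            producer.flush();
        }
    }
}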

On Wed, May 18, 2016 at 1:32 PM, Manikumar Reddy 
wrote:

> Hi,
>
> There is a kafka.tools.TestLogCleaning tool, which is used to stress test
> the compaction feature.
> This tool validates the correctness of the compaction process, and it can be
> improved for perf testing.
>
> I think you want to benchmark the server-side compaction process. Currently
> we have a few compaction-related metrics. We may need to add a few more
> topic-specific metrics for better analysis.
>
> log compaction related JMX metrics:
> kafka.log:type=LogCleaner,name=cleaner-recopy-percent
> kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
> kafka.log:type=LogCleaner,name=max-clean-time-secs
> kafka.log:type=LogCleanerManager,name=max-dirty-percent
>
> Manikumar
>
> On Tue, May 17, 2016 at 8:45 PM, Tom Crayford 
> wrote:
>
> > Hi there,
> >
> > As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been
> > doing extensive benchmarking of Kafka. In our case this is to help give
> > customers a good idea of the performance of our various configurations.
> For
> > this we orchestrate the Kafka `producer-perf.sh` and `consumer-perf.sh`
> > across multiple machines, which was relatively easy to do and very
> > successful (recently leading to a doc change and a good lesson about
> 0.10).
> >
> > However, we're finding one thing missing from the current
> producer/consumer
> > perf tests, which is that there's no good perf testing on compacted
> topics.
> > Some folk will undoubtedly use compacted topics, so it would be extremely
> > helpful (I think) for the community to have benchmarks that test
> > performance on compacted topics. We're interested in working on this and
> > contributing it upstream, but are pretty unsure what such a test should
> > look like. One straw proposal is to adapt the existing producer/consumer
> > perf tests to work on a compacted topic, likely with an additional flag
> on
> > the producer that lets you choose how wide a key range to emit, if it
> > should emit deletes (and how often to do so) and so on. Is there anything
> > more we could or should do there?
> >
> > We're happy writing the code here, and want to continue contributing
> back,
> > I'd just love a hand thinking about what perf tests for compacted topics
> > should look like.
> >
> > Thanks
> >
> > Tom Crayford
> > Heroku Kafka
> >
>


Re: Perf producer/consumers for compacted topics

2016-05-18 Thread Manikumar Reddy
Hi,

There is a kafka.tools.TestLogCleaning tool, which is used to stress test
the compaction feature.
This tool validates the correctness of the compaction process, and it can be
improved for perf testing.

I think you want to benchmark the server-side compaction process. Currently we
have a few compaction-related metrics. We may need to add a few more
topic-specific metrics for better analysis.

log compaction related JMX metrics:
kafka.log:type=LogCleaner,name=cleaner-recopy-percent
kafka.log:type=LogCleaner,name=max-buffer-utilization-percent
kafka.log:type=LogCleaner,name=max-clean-time-secs
kafka.log:type=LogCleanerManager,name=max-dirty-percent

Manikumar
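
For anyone wanting to sample those metrics without a full monitoring stack, a 
small sketch using plain JMX (no Kafka tooling assumed; the JMX URL and port 
are whatever the broker exposes) can dump the LogCleaner MBean attributes:

import java.util.Set;

import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogCleanerMetricsDump {

    public static void main(String[] args) throws Exception {
        JMXServiceURL url =
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // One pattern matches both the LogCleaner and LogCleanerManager metrics above.
            Set<ObjectName> names =
                    conn.queryNames(new ObjectName("kafka.log:type=LogCleaner*,name=*"), null);
            for (ObjectName name : names) {
                System.out.println(name);
                for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                    try {
                        System.out.println("  " + attr.getName() + " = "
                                + conn.getAttribute(name, attr.getName()));
                    } catch (Exception e) {
                        // some attributes may not be readable; skip them
                    }
                }
            }
        }
    }
}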

On Tue, May 17, 2016 at 8:45 PM, Tom Crayford  wrote:

> Hi there,
>
> As noted in the 0.10.0.0-RC4 release thread, we (Heroku Kafka) have been
> doing extensive benchmarking of Kafka. In our case this is to help give
> customers a good idea of the performance of our various configurations. For
> this we orchestrate the Kafka `producer-perf.sh` and `consumer-perf.sh`
> across multiple machines, which was relatively easy to do and very
> successful (recently leading to a doc change and a good lesson about 0.10).
>
> However, we're finding one thing missing from the current producer/consumer
> perf tests, which is that there's no good perf testing on compacted topics.
> Some folk will undoubtedly use compacted topics, so it would be extremely
> helpful (I think) for the community to have benchmarks that test
> performance on compacted topics. We're interested in working on this and
> contributing it upstream, but are pretty unsure what such a test should
> look like. One straw proposal is to adapt the existing producer/consumer
> perf tests to work on a compacted topic, likely with an additional flag on
> the producer that lets you choose how wide a key range to emit, if it
> should emit deletes (and how often to do so) and so on. Is there anything
> more we could or should do there?
>
> We're happy writing the code here, and want to continue contributing back,
> I'd just love a hand thinking about what perf tests for compacted topics
> should look like.
>
> Thanks
>
> Tom Crayford
> Heroku Kafka
>


[jira] [Commented] (KAFKA-3725) Update documentation with regards to XFS

2016-05-18 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288623#comment-15288623
 ] 

Ismael Juma commented on KAFKA-3725:


[~toddpalino], can you please check our current recommendation with regards to 
the filesystem and see if it should be updated with more recent information? 
Thanks!

> Update documentation with regards to XFS
> 
>
> Key: KAFKA-3725
> URL: https://issues.apache.org/jira/browse/KAFKA-3725
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
>
> Our documentation currently states that only Ext4 has been tried (by 
> LinkedIn):
> "Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS 
> supposedly handle locking during fsync better. We have only tried Ext4, 
> though."
> http://kafka.apache.org/documentation.html#ext4
> I think this is no longer true, so we should update the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (KAFKA-3725) Update documentation with regards to XFS

2016-05-18 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-3725:
--

 Summary: Update documentation with regards to XFS
 Key: KAFKA-3725
 URL: https://issues.apache.org/jira/browse/KAFKA-3725
 Project: Kafka
  Issue Type: Improvement
Reporter: Ismael Juma


Our documentation currently states that only Ext4 has been tried (by LinkedIn):

"Ext4 may or may not be the best filesystem for Kafka. Filesystems like XFS 
supposedly handle locking during fsync better. We have only tried Ext4, though."
http://kafka.apache.org/documentation.html#ext4

I think this is no longer true, so we should update the documentation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Issue Comment Deleted] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-05-18 Thread Balazs Kossovics (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Kossovics updated KAFKA-3719:

Comment: was deleted

(was: [~gwenshap] Oh, you are right. Actually it was a 
[bug|https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/344] in 
gitlab-ci-multi-runner, who created services with invalid hostname.)

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Priority: Trivial
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.
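
A quick sketch to reproduce what the reporter describes (the underscored 
address is made up; note the issue was later resolved as Not A Bug, since the 
CI runner was generating invalid host names):
{quote}
import org.apache.kafka.common.utils.Utils;

public class HostPortCheck {
    public static void main(String[] args) {
        String address = "kafka_broker_1:9092";
        // Prints whatever the current HOST_PORT_PATTERN extracts for an
        // underscored host name.
        System.out.println("host = " + Utils.getHost(address));
        System.out.println("port = " + Utils.getPort(address));
    }
}
{quote}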



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-05-18 Thread Balazs Kossovics (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Kossovics resolved KAFKA-3719.
-
Resolution: Not A Bug

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Priority: Trivial
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-05-18 Thread Balazs Kossovics (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288597#comment-15288597
 ] 

Balazs Kossovics commented on KAFKA-3719:
-

Oh, you are right. Actually it was a 
[bug|https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/344] in 
gitlab-ci-multi-runner, which created services with an invalid hostname.


> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Priority: Trivial
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-3719) Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too narrow

2016-05-18 Thread Balazs Kossovics (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15288595#comment-15288595
 ] 

Balazs Kossovics commented on KAFKA-3719:
-

[~gwenshap] Oh, you are right. Actually it was a 
[bug|https://gitlab.com/gitlab-org/gitlab-ci-multi-runner/issues/344] in 
gitlab-ci-multi-runner, which created services with an invalid hostname.

> Pattern regex org.apache.kafka.common.utils.Utils.HOST_PORT_PATTERN is too 
> narrow
> -
>
> Key: KAFKA-3719
> URL: https://issues.apache.org/jira/browse/KAFKA-3719
> Project: Kafka
>  Issue Type: Bug
>Reporter: Balazs Kossovics
>Priority: Trivial
>
> In our continuous integration environment the Kafka brokers run on hosts 
> containing underscores in their names. The current regex incorrectly splits 
> these names into host and port parts.
> I could submit a pull request if someone confirms that this is indeed a bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)