Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-27 Thread Michael Pearce
So on mutability, and on having headers just at the message level, it seems we're all agreed then.

On Radai's comments.

1) agreed - kip updated.

2) Now I totally get the nasty code this would create, as shown by your 
example. Obviously we want an API where the most common interactions are 
supported without boilerplate code.

I'm not sure what we should do here. As per the previous discussion, the 
compromise was to support multiple values for a key at the protocol level. I 
believe this was Gwen's request, and the compromise was accepted.

Now, Guava's Multimap interface has "Collection<V> get(K key)"; there is no 
single-value "V get(K key)" styled method.

The Apache Commons Collections variant is the same.


Two ideas I had to address this:

A) Add two additional methods, something like:

/**
 * Returns the first header for a key if present, else returns null.
 */
Header first(K key)

/**
 * Replaces an existing header whose key and value equal those of the old header.
 */
replace(Header oldHeader, Header newHeader)


B) Change the previous agreement so that we no longer support multiple values 
for a key (i.e. drop the multimap semantics); if someone wants a collection as 
their value, they should encode it in their value byte[]. (A rough sketch of 
both options follows.)
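
Only Header, first(), replace(), get() and set() come from this thread; the interface names and everything else below are a hypothetical sketch, not the final KIP-82 API.

// Hypothetical sketches only -- not the final KIP-82 API.
interface Header {
    String key();
    byte[] value();
}

// Option A: keep multi-value (multimap) semantics, add convenience helpers
// for the common single-value case.
interface HeadersOptionA extends Iterable<Header> {
    HeadersOptionA add(Header header);                           // append; multiple values per key allowed
    Iterable<Header> headers(String key);                        // all headers for a key
    Header first(String key);                                    // first header for the key, or null if absent
    HeadersOptionA replace(Header oldHeader, Header newHeader);  // swap a header whose key and value match oldHeader
}

// Option B: single value per key (plain map semantics); a caller who wants a
// collection of values encodes it inside the value byte[].
interface HeadersOptionB extends Iterable<Header> {
    HeadersOptionB set(String key, byte[] value);                // overwrites any existing value for the key
    byte[] get(String key);                                      // the single value, or null if absent
    HeadersOptionB remove(String key);
}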

My personal opinion is that option B is a lot cleaner and will make for a 
cleaner interface, but it does mean going back on a previous discussion 
agreement. Would everyone be happy with that? If not, Radai, is option A OK 
with you? Or any other ideas?

Cheers
Mike





From: Jason Gustafson 
Sent: Tuesday, February 28, 2017 1:38 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-82 - Add Record Headers

If I understand correctly, the suggestion is to let headers be mutable on
the producer side basically until after they've passed through the
interceptors. That sounds like a reasonable compromise to me.
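
As a rough illustration of that idea, here is a hypothetical Java sketch (the class and method names are made up, not the actual KIP-82 implementation): the headers stay mutable until the producer closes them once the interceptors have run.

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "mutable until sent" headers; names are illustrative only.
final class MutableUntilClosedHeaders {

    static final class Header {                 // minimal key/value pair assumed by this sketch
        final String key;
        final byte[] value;
        Header(String key, byte[] value) { this.key = key; this.value = value; }
    }

    private final List<Header> headers = new ArrayList<>();
    private boolean closed = false;

    MutableUntilClosedHeaders add(Header header) {
        if (closed)
            throw new IllegalStateException("Headers are read-only once the record has been handed off for sending");
        headers.add(header);
        return this;
    }

    // Invoked by the producer internals after the interceptors have run and the
    // record is queued, so a record that is sent twice can no longer be mutated.
    void close() {
        closed = true;
    }
}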

@Becket

3. It might be useful to have headers at MessageSet level as well so we can
> avoid decompression in some cases. But given this KIP is already
> complicated, I would rather leave this out of the scope and address that
> later when needed, e.g. after having batch level interceptors.


Yeah, I had the same thought. I was considering factoring the map of header
names to generated integer ids into the message set and only using the
integer ids in the individual messages. It's a bit complex though, so I
agree it's probably best left out. I guess for now if users have a lot of
headers, they should just enable compression.

-Jason


On Mon, Feb 27, 2017 at 1:16 PM, radai  wrote:

> a few comments on the KIP as it is now:
>
> 1. Instead of add(Header) + add(Iterable) I suggest we use add +
> addAll. This is more in line with how Java collections work and may
> therefore be more intuitive.
>
> 2. common user code dealing with headers will want get("someKey") /
> set("someKey"), or equivalent. code using multiple headers under the same
> key will be rare, and code iterating over all headers would be even rarer
> (probably only for kafka-connect equivalent use cases, really). as the API
> looks right now, the most common and trivial cases will be gnarly:
>get("someKey") -->
> record.headers().headers("someKey").iterator().next().value(). and this is
> before I start talking about how nulls/empties are handled.
>replace("someKey") -->
> record.headers().remove(record.headers().headers("someKey"));
> record.headers().append(new Header("someKey", value));
>
> this is why I think we should start with get()/set(), which have single-value
> map semantics (so set overwrites), then add getAll() (multi-map), append(),
> etc. on top. Make the common case pretty.
>
> On Sun, Feb 26, 2017 at 2:01 AM, Michael Pearce 
> wrote:
>
> > Hi Becket,
> >
> > On 1)
> >
> > Yes, truly we wanted mutable headers also. Alas, we couldn't think of a
> > solution that would address Jason's point that, once a record is sent, it
> > shouldn't be possible to mutate it, for cases where you send the same
> > record twice.
> >
> > Thank you so much for your solution; I think this will work very nicely :)
> >
> > Agreed, we only need to do the mutable-to-immutable conversion.
> >
> > I think your solution with a ".close()", taken from elsewhere in the Kafka
> > protocol where mutability exists, is a great solution and a happy middle
> > ground.
> >
> > @Jason, do you agree this resolves your concerns if we had mutable headers?
> >
> >
> > On 2)
> > Agreed, this was only added as I couldn't think of a solution that would
> > address Jason's concern, but I really didn't want to force end users to
> > constantly write ugly boilerplate code. If we agree on your solution for 1,
> > I'm very happy to remove these.
> >
> > On 3)
> > I also would like to keep the scope of this KIP limited to message headers
> > for now, else we run the risk of not getting even these delivered for the
> > next release, and we're almost there on getting this KIP to a state
> > everyone is happy with. As you note, we can address that later if there's
> > the need.
> >
> >
> > Ill leave 

[jira] [Resolved] (KAFKA-1569) Tool for performance and correctness of transactions end-to-end

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1569.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> Tool for performance and correctness of transactions end-to-end
> ---
>
> Key: KAFKA-1569
> URL: https://issues.apache.org/jira/browse/KAFKA-1569
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Raul Castro Fernandez
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1569.patch, KAFKA-1569.patch
>
>
> A producer tool that creates an input file, reads it, and sends it to the 
> brokers according to some transaction configuration; and a consumer tool that 
> reads data from brokers with transaction boundaries and writes it to a file.





[jira] [Resolved] (KAFKA-1604) System Test for Transaction Management

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1604.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> System Test for Transaction Management
> --
>
> Key: KAFKA-1604
> URL: https://issues.apache.org/jira/browse/KAFKA-1604
> Project: Kafka
>  Issue Type: Test
>Reporter: Dong Lin
>Assignee: Dong Lin
>  Labels: transactions
> Attachments: KAFKA-1604_2014-08-19_17:31:16.patch, 
> KAFKA-1604_2014-08-19_21:07:35.patch
>
>
> Perform end-to-end transaction management test in the following steps:
> 1) Start Zookeeper.
> 2) Start multiple brokers.
> 3) Create topic.
> 4) Start transaction-aware ProducerPerformance to generate transactional 
> messages to topic.
> 5) Start transaction-aware ConsoleConsumer to read messages from topic.
> 6) Bounce brokers (optional).
> 7) Verify that the same number of messages are sent and received.
> This patch depends on KAFKA-1524, KAFKA-1526 and KAFKA-1601.





[jira] [Resolved] (KAFKA-1601) ConsoleConsumer/SimpleConsumerPerformance should be transaction-aware

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1601.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> ConsoleConsumer/SimpleConsumerPerformance should be transaction-aware
> -
>
> Key: KAFKA-1601
> URL: https://issues.apache.org/jira/browse/KAFKA-1601
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Dong Lin
>Assignee: Dong Lin
>Priority: Minor
>  Labels: transactions
> Attachments: KAFKA-1601_2014-08-19_21:10:12.patch, 
> KAFKA-1601_2014-08-20_08:57:29.patch, KAFKA-1601.patch
>
>
> Implement buffered consumer logic in ConsumerTransactionBuffer class.
> The class takes as input messages from non-transactional consumer (e.g. 
> ConsoleConsumer, SimpleConsumer), recognizes transaction control requests 
> (e.g. commit, abort), and outputs transaction messages when their transaction 
> is committed.
> By default, the class outputs non-transactional messages immediately on input.





[jira] [Resolved] (KAFKA-1541) Add transactional request definitions to clients package

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1541.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

>  Add transactional request definitions to clients package
> -
>
> Key: KAFKA-1541
> URL: https://issues.apache.org/jira/browse/KAFKA-1541
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1541.patch, KAFKA-1541.patch, KAFKA-1541.patch, 
> KAFKA-1541.patch
>
>
> Separate jira for this since KAFKA-1522 only adds definitions to the core 
> package.





[jira] [Resolved] (KAFKA-1526) Producer performance tool should have an option to enable transactions

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1526.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> Producer performance tool should have an option to enable transactions
> --
>
> Key: KAFKA-1526
> URL: https://issues.apache.org/jira/browse/KAFKA-1526
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1526_2014-08-19_10:54:51.patch, KAFKA-1526.patch
>
>
> If this flag is enabled the producer could start/commit/abort transactions 
> randomly - we could add more configs/parameters for more control on 
> transaction boundaries.





[jira] [Resolved] (KAFKA-1527) SimpleConsumer should be transaction-aware

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1527.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> SimpleConsumer should be transaction-aware
> --
>
> Key: KAFKA-1527
> URL: https://issues.apache.org/jira/browse/KAFKA-1527
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1527_2014-08-19_10:39:53.patch, 
> KAFKA-1527_2014-08-19_18:22:26.patch, KAFKA-1527.patch
>
>
> This will help in further integration testing of the transactional producer. 
> This could be implemented in the consumer-iterator level or at a higher level.





[jira] [Resolved] (KAFKA-1525) DumpLogSegments should print transaction IDs

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1525.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> DumpLogSegments should print transaction IDs
> 
>
> Key: KAFKA-1525
> URL: https://issues.apache.org/jira/browse/KAFKA-1525
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Dong Lin
>  Labels: transactions
> Attachments: KAFKA-1525_2014-07-22_16:48:45.patch, 
> KAFKA-1525_2014-08-15_11:49:25.patch, KAFKA-1525.patch
>
>
> This will help in some very basic integration testing of the transactional 
> producer and brokers (i.e., until we have a transactional simple consumer).
> We only need to print the txid's. There is no need to do transactional 
> buffering.





[jira] [Resolved] (KAFKA-1524) Implement transactional producer

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1524.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> Implement transactional producer
> 
>
> Key: KAFKA-1524
> URL: https://issues.apache.org/jira/browse/KAFKA-1524
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Raul Castro Fernandez
>  Labels: transactions
> Attachments: KAFKA-1524_2014-08-18_09:39:34.patch, 
> KAFKA-1524_2014-08-20_09:14:59.patch, KAFKA-1524.patch, KAFKA-1524.patch, 
> KAFKA-1524.patch
>
>
> Implement the basic transactional producer functionality as outlined in 
> https://cwiki.apache.org/confluence/display/KAFKA/Transactional+Messaging+in+Kafka
> The scope of this jira is basic functionality (i.e., to be able to begin and 
> commit or abort a transaction) without the failure scenarios.





[jira] [Resolved] (KAFKA-1523) Implement transaction manager module

2017-02-27 Thread Jason Gustafson (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-1523.

Resolution: Unresolved

This work has been superseded by KIP-98: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-98+-+Exactly+Once+Delivery+and+Transactional+Messaging.

> Implement transaction manager module
> 
>
> Key: KAFKA-1523
> URL: https://issues.apache.org/jira/browse/KAFKA-1523
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Joel Koshy
>Assignee: Dong Lin
>  Labels: transactions
> Attachments: KAFKA-1523_2014-07-17_20:12:55.patch, 
> KAFKA-1523_2014-07-22_16:45:42.patch, KAFKA-1523_2014-08-05_21:25:55.patch, 
> KAFKA-1523_2014-08-08_21:36:52.patch
>
>
> * Entry point for transaction requests
> * Appends transaction control records to the transaction journal
> * Sends transaction control records to data brokers
> * Responsible for expiring transactions
> * Supports fail-over: for which it needs to maintain a transaction HW which 
> is the offset of the BEGIN control record of the earliest pending 
> transaction. It should checkpoint the HW periodically either to ZK/separate 
> topic/offset commit.
> We merge KAFKA-1565 transaction manager failover handling into this JIRA. 
> Transaction manager should guarantee that, once a pre-commit/pre-abort 
> request is acknowledged, commit/abort request will be delivered to partitions 
> involved in the transaction.
> This patch handles the following failover scenarios:
> 1) Transaction manager or its followers fail before txRequest is duplicated 
> on local log and followers.
> Solution: Transaction manager responds to request with error status. The 
> producer keeps trying to commit.
> 2) The txPartition’s leader is not available.
> Solution: Put txRequest on unSentTxRequestQueue. When metadataCache is 
> updated, check and re-send txRequest from unSentTxRequestQueue if possible.
> 3) The txPartition’s leader fails when txRequest is in channel manager.
> Solution: Retrieve all txRequests queued for transmission to this broker and 
> put them on unSentTxRequestQueue.
> 4) Transaction manager does not receive a success response from the txPartition’s 
> leaders within the timeout period.
> Solution: Transaction manager expires the txRequest and re-sends it.
> 5) Transaction manager fails.
> Solution: The new transaction manager reads transactionHW from zookeeper, and 
> sends txRequest starting from the transactionHW.
> This patch does not provide the following features. These will be provided in 
> separate patches.
> 1) Producer offset commit.
> 2) Transaction expiration.





[GitHub] kafka pull request #2606: kafka4811: ReplicaFetchThread may fail to create d...

2017-02-27 Thread amethystic
GitHub user amethystic opened a pull request:

https://github.com/apache/kafka/pull/2606

kafka4811: ReplicaFetchThread may fail to create due to existing metric

Have fetcherThreadMap keyed off brokerId + fetcherId instead of broker + 
fetcherId, but did not consider the case where port is changed.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amethystic/kafka 
kafka4811_ReplicaFetchThread_fail_create

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2606.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2606


commit b153d15f558b642e2c839ac99cf5e06c130f3497
Author: huxi 
Date:   2017-02-28T05:33:52Z

kafka4811: ReplicaFetchThread may fail to create due to existing metric

Have fetcherThreadMap keyed off brokerId + fetcherId instead of broker + 
fetcherId






[jira] [Assigned] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread huxi (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huxi reassigned KAFKA-4811:
---

Assignee: huxi

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>Assignee: huxi
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (KAFKA-4738) Remove generic type of class ClientState

2017-02-27 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887309#comment-15887309
 ] 

Matthias J. Sax commented on KAFKA-4738:


[~sharad.develop] Are you working on this?

> Remove generic type of class ClientState
> 
>
> Key: KAFKA-4738
> URL: https://issues.apache.org/jira/browse/KAFKA-4738
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Matthias J. Sax
>Assignee: Sharad
>Priority: Minor
>  Labels: beginner, newbie
>
> Currently, class 
> {{org.apache.kafka.streams.processor.internals.assignment.ClientState}} 
> uses a generic type. However, within actual Streams code base the type will 
> always be {{TaskId}} (from package {{org.apache.kafka.streams.processor}}).
> Thus, this ticket is about removing the generic type and replace it with 
> {{TaskId}}, to simplify the code base.
> There are some tests, that use {{ClientState}} (what allows for a 
> slightly simplified test setup).  Those tests need to be updated to work 
> properly using {{TaskId}} instead of {{Integer}}.





[jira] [Commented] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread huxi (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887308#comment-15887308
 ] 

huxi commented on KAFKA-4811:
-

[~junrao] what about the port change? Do we also need to consider this 
situation?

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (KAFKA-4706) Unify StreamsKafkaClient instances

2017-02-27 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887304#comment-15887304
 ] 

Matthias J. Sax commented on KAFKA-4706:


[~sharad.develop] You did close your PR. Are you still working on this?

> Unify StreamsKafkaClient instances
> --
>
> Key: KAFKA-4706
> URL: https://issues.apache.org/jira/browse/KAFKA-4706
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Matthias J. Sax
>Assignee: Sharad
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>
> Kafka Streams currently used two instances of {{StreamsKafkaClient}} (one in 
> {{KafkaStreams}} and one in {{InternalTopicManager}}).
> We want to unify both such that only a single instance is used.





[jira] [Commented] (KAFKA-4623) Change Default unclean.leader.election.enabled from True to False

2017-02-27 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887302#comment-15887302
 ] 

Matthias J. Sax commented on KAFKA-4623:


[~sharad.develop]: [~ewencp] left already a comment on your PR. It cannot be 
reviewed if you don't rebase it to latest trunk.

> Change Default unclean.leader.election.enabled from True to False
> -
>
> Key: KAFKA-4623
> URL: https://issues.apache.org/jira/browse/KAFKA-4623
> Project: Kafka
>  Issue Type: Bug
>Reporter: Ben Stopford
>Assignee: Sharad
> Fix For: 0.11.0.0
>
>
> See KIP-106
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-106+-+Change+Default+unclean.leader.election.enabled+from+True+to+False





[jira] [Commented] (KAFKA-4722) Add application.id to StreamThread name

2017-02-27 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887301#comment-15887301
 ] 

Matthias J. Sax commented on KAFKA-4722:


[~sharad.develop] I left already a comment on your PR. It cannot be reviewed if 
you don't rebase it to latest trunk.

> Add application.id to StreamThread name
> ---
>
> Key: KAFKA-4722
> URL: https://issues.apache.org/jira/browse/KAFKA-4722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Steven Schlansker
>Assignee: Sharad
>Priority: Minor
>  Labels: beginner, easyfix, newbie
>
> StreamThread currently sets its name thusly:
> {code}
> super("StreamThread-" + STREAM_THREAD_ID_SEQUENCE.getAndIncrement());
> {code}
> If you have multiple {{KafkaStreams}} instances within a single application, 
> it would help to add the application ID to the {{StreamThread}} name to identify 
> which thread belongs to which {{KafkaStreams}} instance.





Re: [VOTE] KIP-81: Bound Fetch memory usage in the consumer

2017-02-27 Thread Jason Gustafson
Hey Mickael,

The suggestion to add something to Node makes sense. I could imagine for
example adding a flag to indicate that the connection has a higher
"priority," meaning that we can allocate outside of the memory pool if
necessary. That would still be generic even if the only use case is the
consumer coordinator. We might also face a similar problem when the
producer is sending requests to the transaction coordinator for KIP-98.
What do you think?
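
To make that concrete, here is a purely hypothetical sketch (none of these
names are from the actual Node or MemoryPool code): a priority flag on the
connection's node lets the selector allocate coordinator responses on the
heap instead of going through the bounded pool.

import java.nio.ByteBuffer;

// Hypothetical sketch of the "priority connection" idea; not the KIP-81 patch.
final class PriorityConnectionSketch {

    interface MemoryPool {                       // minimal stand-in for the KIP-81 pool
        ByteBuffer tryAllocate(int sizeBytes);   // may return null while the pool is exhausted
    }

    static final class NodeInfo {
        final int id;
        final boolean highPriority;              // e.g. true for the consumer coordinator connection
        NodeInfo(int id, boolean highPriority) { this.id = id; this.highPriority = highPriority; }
    }

    // High-priority connections bypass the bounded pool so coordinator
    // (heartbeat/join/commit) responses are never starved by fetch data.
    static ByteBuffer allocate(NodeInfo node, int size, MemoryPool pool) {
        return node.highPriority ? ByteBuffer.allocate(size) : pool.tryAllocate(size);
    }
}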

Thanks,
Jason

On Mon, Feb 27, 2017 at 9:09 AM, Mickael Maison 
wrote:

> Apologies for the late response.
>
> Thanks Jason for the suggestion. Yes you are right, the Coordinator
> connection is "tagged" with a different id, so we could retrieve it in
> NetworkReceive to make the distinction.
> However, currently the coordinator connection is made different by using:
> Integer.MAX_VALUE - groupCoordinatorResponse.node().id()
> for the Node id.
>
> So to identify Coordinator connections, we'd have to check that the
> NetworkReceive source is a value near Integer.MAX_VALUE which is a bit
> hacky ...
>
> Maybe we could add a constructor to Node that allows passing in a
> sourceId String. That way we could make all the coordinator
> connections explicit (by setting it to "Coordinator-[ID]" for
> example).
> What do you think ?
>
> On Tue, Jan 24, 2017 at 12:58 AM, Jason Gustafson 
> wrote:
> > Good point. The consumer does use a separate connection to the
> coordinator,
> > so perhaps the connection itself could be tagged for normal heap
> allocation?
> >
> > -Jason
> >
> > On Mon, Jan 23, 2017 at 10:26 AM, Onur Karaman <
> onurkaraman.apa...@gmail.com
> >> wrote:
> >
> >> I only did a quick scan but I wanted to point out what I think is an
> >> incorrect assumption in the KIP's caveats:
> >> "
> >> There is a risk using the MemoryPool that, after we fill up the memory
> with
> >> fetch data, we can starve the coordinator's connection
> >> ...
> >> To alleviate this issue, only messages larger than 1Kb will be
> allocated in
> >> the MemoryPool. Smaller messages will be allocated directly on the Heap
> >> like before. This allows group/heartbeat messages to avoid being
> delayed if
> >> the MemoryPool fills up.
> >> "
> >>
> >> So it sounds like there's an incorrect assumption that responses from
> the
> >> coordinator will always be small (< 1Kb as mentioned in the caveat).
> There
> >> are now a handful of request types between clients and the coordinator:
> >> {JoinGroup, SyncGroup, LeaveGroup, Heartbeat, OffsetCommit, OffsetFetch,
> >> ListGroups, DescribeGroups}. While true (at least today) for
> >> HeartbeatResponse and a few others, I don't think we can assume
> >> JoinGroupResponse, SyncGroupResponse, DescribeGroupsResponse, and
> >> OffsetFetchResponse will be small, as they are effectively bounded by
> the
> >> max message size allowed by the broker for the __consumer_offsets topic
> >> which by default is 1MB.
> >>
> >> On Mon, Jan 23, 2017 at 9:46 AM, Mickael Maison <
> mickael.mai...@gmail.com>
> >> wrote:
> >>
> >> > I've updated the KIP to address all the comments raised here and from
> >> > the "DISCUSS" thread.
> >> > See: https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> >> > 81%3A+Bound+Fetch+memory+usage+in+the+consumer
> >> >
> >> > Now, I'd like to restart the vote.
> >> >
> >> > On Tue, Dec 6, 2016 at 9:02 AM, Rajini Sivaram
> >> >  wrote:
> >> > > Hi Mickael,
> >> > >
> >> > > I am +1 on the overall approach of this KIP, but have a couple of
> >> > comments
> >> > > (sorry, should have brought them up on the discuss thread earlier):
> >> > >
> >> > > 1. Perhaps it would be better to do this after KAFKA-4137
> >> > >  is implemented?
> At
> >> > the
> >> > > moment, coordinator shares the same NetworkClient (and hence the
> same
> >> > > Selector) with consumer connections used for fetching records. Since
> >> > > freeing of memory relies on consuming applications invoking poll()
> >> after
> >> > > processing previous records and potentially after committing
> offsets,
> >> it
> >> > > will be good to ensure that coordinator is not blocked for read by
> >> fetch
> >> > > responses. This may be simpler once coordinator has its own
> Selector.
> >> > >
> >> > > 2. The KIP says: *Once messages are returned to the user, messages
> are
> >> > > deleted from the MemoryPool so new messages can be stored.*
> >> > > Can you expand that a bit? I am assuming that partial buffers never
> get
> >> > > freed when some messages are returned to the user since the
> consumer is
> >> > > still holding a reference to the buffer. Would buffers be freed when
> >> > > fetches for all the partitions in a response are parsed, but perhaps
> >> not
> >> > > yet returned to the user (i.e., is the memory freed when a
> reference to
> >> > the
> >> > > response buffer is no longer required)? It will be good to document
> the
> >> > > (approximate) maximum memory requirement for the non-compressed
> case.
> >> 

Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Vahid S Hashemian
+1 on 0.11.0.0.

Can we also include KIP-54 in the list?
The PR for this KIP is ready for review.

Thanks.
--Vahid
 




From:   Ismael Juma 
To: dev@kafka.apache.org
Date:   02/27/2017 07:47 PM
Subject:[DISCUSS] 0.10.3.0/0.11.0.0 release planning
Sent by:isma...@gmail.com



Hi all,

With 0.10.2.0 out of the way, I would like to volunteer to be the release
manager for our next time-based release. See
https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
communication on time-based releases or need a reminder.

I put together a draft release plan with June 2017 as the release month 
(as
previously agreed) and a list of KIPs that have already been voted:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876

I haven't set exact dates for the various stages (feature freeze, code
freeze, etc.) for now as Ewen is going to send out an email with some
suggested tweaks based on his experience as release manager for 0.10.2.0.
We can set the exact dates after that discussion.

As we are starting the process early this time, we should expect the 
number
of KIPs in the plan to grow (so don't worry if your KIP is not there yet),
but it's good to see that we already have 10 (including 2 merged and 2 
with
PR reviews in progress).

Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and KIP-101
(Leader Generation in Replication) require message format changes, which
typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
it makes sense to also include KIP-106 (Unclean leader election should be
false by default) and KIP-118 (Drop support for Java 7). We would also 
take
the chance to remove deprecated code, in that case.

Given the above, how do people feel about 0.11.0.0 as the next Kafka
version? Please share your thoughts.

Thanks,
Ismael






Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Guozhang Wang
+1 for 0.11.0.0

Guozhang

On Mon, Feb 27, 2017 at 8:14 PM, Becket Qin  wrote:

> Hi Ismael,
>
> Thanks for volunteering on the new release.
>
> I think 0.11.0.0 makes a lot of sense given the new big features we are
> intended to include.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Feb 27, 2017 at 7:47 PM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > With 0.10.2.0 out of the way, I would like to volunteer to be the release
> > manager for our next time-based release. See https://cwiki.apache.org/c
> > onfluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
> > communication on time-based releases or need a reminder.
> >
> > I put together a draft release plan with June 2017 as the release month
> (as
> > previously agreed) and a list of KIPs that have already been voted:
> >
> > *https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=68716876
> >  action?pageId=68716876
> > >*
> >
> > I haven't set exact dates for the various stages (feature freeze, code
> > freeze, etc.) for now as Ewen is going to send out an email with some
> > suggested tweaks based on his experience as release manager for 0.10.2.0.
> > We can set the exact dates after that discussion.
> >
> > As we are starting the process early this time, we should expect the
> number
> > of KIPs in the plan to grow (so don't worry if your KIP is not there
> yet),
> > but it's good to see that we already have 10 (including 2 merged and 2
> with
> > PR reviews in progress).
> >
> > Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and
> KIP-101
> > (Leader Generation in Replication) require message format changes, which
> > typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> > it makes sense to also include KIP-106 (Unclean leader election should be
> > false by default) and KIP-118 (Drop support for Java 7). We would also
> take
> > the chance to remove deprecated code, in that case.
> >
> > Given the above, how do people feel about 0.11.0.0 as the next Kafka
> > version? Please share your thoughts.
> >
> > Thanks,
> > Ismael
> >
>



-- 
-- Guozhang


Re: [VOTE] KIP-107: Add purgeDataBefore() API in AdminClient

2017-02-27 Thread Dong Lin
Hi Jun and everyone,

I would like to change the KIP in the following way. Currently, if any
replica is offline, the purge result for a partition will be
NotEnoughReplicasException and its low_watermark will be 0. The motivation
for this approach is that we want to guarantee that the data before
purgedOffset has been deleted on all replicas of this partition if the
purge result indicates success.

But this approach seems too conservative. It should be sufficient in most
cases to just tell the user success and set low_watermark to the minimum
logStartOffset of all live replicas in the PurgeResponse, provided the
logStartOffset of all live replicas has reached purgedOffset. This is because,
for an offline replica to become online and be elected leader, it should have
received at least one FetchResponse from the current leader, which would tell
it to purge beyond purgedOffset. The benefit of this change is that the purge
operation can succeed even when some replica is offline.
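
To make the proposed semantics concrete, here is a small hypothetical sketch
(illustrative only, not broker code) of how the leader could compute the
low_watermark returned in the PurgeResponse under this change:

import java.util.Collection;
import java.util.OptionalLong;

// Hypothetical sketch of the relaxed purge-result semantics described above:
// only live replicas are considered, and the purge is reported complete once
// every live replica's logStartOffset has reached the requested purgedOffset.
final class PurgeResultSketch {

    // Returns the low_watermark to report, or empty if the purge is not yet complete.
    static OptionalLong lowWatermark(Collection<Long> liveReplicaLogStartOffsets, long purgedOffset) {
        if (liveReplicaLogStartOffsets.isEmpty())
            return OptionalLong.empty();                     // no live replicas: result unknown
        long min = Long.MAX_VALUE;
        for (long logStartOffset : liveReplicaLogStartOffsets)
            min = Math.min(min, logStartOffset);
        return min >= purgedOffset ? OptionalLong.of(min) : OptionalLong.empty();
    }
}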

Are you OK with this change? If so, I will go ahead to update the KIP and
implement this behavior.

Thanks,
Dong



On Tue, Jan 17, 2017 at 10:18 AM, Dong Lin  wrote:

> Hey Jun,
>
> Do you have time to review the KIP again or vote for it?
>
> Hey Ewen,
>
> Can you also review the KIP again or vote for it? I have discussed with
> Radai and Becket regarding your concern. We still think putting it in the Admin
> Client seems more intuitive, because there are use cases where an application
> which manages topics or produces data may also want to purge data. It seems
> weird if it needs to create a consumer to do this.
>
> Thanks,
> Dong
>
> On Thu, Jan 12, 2017 at 9:34 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
>> +1 (non-binding)
>>
>> Thanks,
>>
>> Mayuresh
>>
>> On Wed, Jan 11, 2017 at 1:03 PM, Dong Lin  wrote:
>>
>> > Sorry for the duplicated email. It seems that gmail will put the voting
>> > email in this thread if I simply replace DISCUSS with VOTE in the
>> subject.
>> >
>> > On Wed, Jan 11, 2017 at 12:57 PM, Dong Lin  wrote:
>> >
>> > > Hi all,
>> > >
>> > > It seems that there is no further concern with the KIP-107. At this
>> point
>> > > we would like to start the voting process. The KIP can be found at
>> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-107
>> > > %3A+Add+purgeDataBefore%28%29+API+in+AdminClient.
>> > >
>> > > Thanks,
>> > > Dong
>> > >
>> >
>>
>>
>>
>> --
>> -Regards,
>> Mayuresh R. Gharat
>> (862) 250-7125
>>
>
>


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Jeff Widman
+1 for major version bump.

There's a good bit of deprecated code I would like to see removed, especially on
the old consumer side, plus a few settings defaults changed, such as the
MirrorMaker options briefly discussed a few months back. It would be good to
continue to make the new-user experience a lot more streamlined, so people
aren't wondering about all these variations on consumers, CLI scripts, etc.

On Feb 27, 2017 8:14 PM, "Becket Qin"  wrote:

> Hi Ismael,
>
> Thanks for volunteering on the new release.
>
> I think 0.11.0.0 makes a lot of sense given the new big features we are
> intended to include.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Feb 27, 2017 at 7:47 PM, Ismael Juma  wrote:
>
> > Hi all,
> >
> > With 0.10.2.0 out of the way, I would like to volunteer to be the release
> > manager for our next time-based release. See https://cwiki.apache.org/c
> > onfluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
> > communication on time-based releases or need a reminder.
> >
> > I put together a draft release plan with June 2017 as the release month
> (as
> > previously agreed) and a list of KIPs that have already been voted:
> >
> > *https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=68716876
> >  action?pageId=68716876
> > >*
> >
> > I haven't set exact dates for the various stages (feature freeze, code
> > freeze, etc.) for now as Ewen is going to send out an email with some
> > suggested tweaks based on his experience as release manager for 0.10.2.0.
> > We can set the exact dates after that discussion.
> >
> > As we are starting the process early this time, we should expect the
> number
> > of KIPs in the plan to grow (so don't worry if your KIP is not there
> yet),
> > but it's good to see that we already have 10 (including 2 merged and 2
> with
> > PR reviews in progress).
> >
> > Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and
> KIP-101
> > (Leader Generation in Replication) require message format changes, which
> > typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> > it makes sense to also include KIP-106 (Unclean leader election should be
> > false by default) and KIP-118 (Drop support for Java 7). We would also
> take
> > the chance to remove deprecated code, in that case.
> >
> > Given the above, how do people feel about 0.11.0.0 as the next Kafka
> > version? Please share your thoughts.
> >
> > Thanks,
> > Ismael
> >
>


Re: [DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Becket Qin
Hi Ismael,

Thanks for volunteering on the new release.

I think 0.11.0.0 makes a lot of sense given the new big features we are
intended to include.

Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 27, 2017 at 7:47 PM, Ismael Juma  wrote:

> Hi all,
>
> With 0.10.2.0 out of the way, I would like to volunteer to be the release
> manager for our next time-based release. See https://cwiki.apache.org/c
> onfluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
> communication on time-based releases or need a reminder.
>
> I put together a draft release plan with June 2017 as the release month (as
> previously agreed) and a list of KIPs that have already been voted:
>
> *https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876
>  >*
>
> I haven't set exact dates for the various stages (feature freeze, code
> freeze, etc.) for now as Ewen is going to send out an email with some
> suggested tweaks based on his experience as release manager for 0.10.2.0.
> We can set the exact dates after that discussion.
>
> As we are starting the process early this time, we should expect the number
> of KIPs in the plan to grow (so don't worry if your KIP is not there yet),
> but it's good to see that we already have 10 (including 2 merged and 2 with
> PR reviews in progress).
>
> Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and KIP-101
> (Leader Generation in Replication) require message format changes, which
> typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
> it makes sense to also include KIP-106 (Unclean leader election should be
> false by default) and KIP-118 (Drop support for Java 7). We would also take
> the chance to remove deprecated code, in that case.
>
> Given the above, how do people feel about 0.11.0.0 as the next Kafka
> version? Please share your thoughts.
>
> Thanks,
> Ismael
>


Re: [VOTE] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-27 Thread Becket Qin
+1

On Mon, Feb 27, 2017 at 6:38 PM, Ismael Juma  wrote:

> Thanks to everyone who voted and provided feedback. +1 (binding) from me
> too.
>
> The vote has passed with 4 binding votes (Grant, Jason, Guozhang, Ismael)
> and 11 non-binding votes (Bill, Damian, Eno, Edoardo, Mickael, Bharat,
> Onur, Vahid, Colin, Apurva, Tom). There were no 0 or -1 votes.
>
> I have updated the relevant wiki pages.
>
> Ismael
>
> On Thu, Feb 9, 2017 at 3:31 PM, Ismael Juma  wrote:
>
> > Hi everyone,
> >
> > Since everyone in the discuss thread was in favour (10 people responded),
> > I would like to initiate the voting process for KIP-118: Drop Support for
> > Java 7 in Kafka 0.11:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
> >
> > The vote will run for a minimum of 72 hours.
> >
> > Thanks,
> > Ismael
> >
> >
>


[DISCUSS] 0.10.3.0/0.11.0.0 release planning

2017-02-27 Thread Ismael Juma
Hi all,

With 0.10.2.0 out of the way, I would like to volunteer to be the release
manager for our next time-based release. See
https://cwiki.apache.org/confluence/display/KAFKA/Time+Based+Release+Plan if you missed previous
communication on time-based releases or need a reminder.

I put together a draft release plan with June 2017 as the release month (as
previously agreed) and a list of KIPs that have already been voted:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=68716876

I haven't set exact dates for the various stages (feature freeze, code
freeze, etc.) for now as Ewen is going to send out an email with some
suggested tweaks based on his experience as release manager for 0.10.2.0.
We can set the exact dates after that discussion.

As we are starting the process early this time, we should expect the number
of KIPs in the plan to grow (so don't worry if your KIP is not there yet),
but it's good to see that we already have 10 (including 2 merged and 2 with
PR reviews in progress).

Out of the KIPs listed, KIP-98 (Exactly-once and Transactions) and KIP-101
(Leader Generation in Replication) require message format changes, which
typically imply a major version bump (i.e. 0.11.0.0). If we do that, then
it makes sense to also include KIP-106 (Unclean leader election should be
false by default) and KIP-118 (Drop support for Java 7). We would also take
the chance to remove deprecated code, in that case.

Given the above, how do people feel about 0.11.0.0 as the next Kafka
version? Please share your thoughts.

Thanks,
Ismael


[jira] [Updated] (KAFKA-4586) Add purgeDataBefore() API in AdminClient

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4586:
---
Status: Patch Available  (was: Open)

> Add purgeDataBefore() API in AdminClient
> 
>
> Key: KAFKA-4586
> URL: https://issues.apache.org/jira/browse/KAFKA-4586
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Dong Lin
>Assignee: Dong Lin
>
> Please visit 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
>  for motivation etc.





[jira] [Updated] (KAFKA-4586) Add purgeDataBefore() API in AdminClient

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4586:
---
Fix Version/s: 0.10.3.0

> Add purgeDataBefore() API in AdminClient
> 
>
> Key: KAFKA-4586
> URL: https://issues.apache.org/jira/browse/KAFKA-4586
> Project: Kafka
>  Issue Type: New Feature
>Reporter: Dong Lin
>Assignee: Dong Lin
> Fix For: 0.10.3.0
>
>
> Please visit 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-107%3A+Add+purgeDataBefore%28%29+API+in+AdminClient
>  for motivation etc.





[jira] [Updated] (KAFKA-3959) __consumer_offsets wrong number of replicas at startup

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-3959:
---
Resolution: Fixed
  Assignee: Onur Karaman  (was: Ewen Cheslack-Postava)
Status: Resolved  (was: Patch Available)

> __consumer_offsets wrong number of replicas at startup
> --
>
> Key: KAFKA-3959
> URL: https://issues.apache.org/jira/browse/KAFKA-3959
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, offset manager, replication
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1, 0.10.0.2, 0.10.1.0, 
> 0.10.1.1, 0.10.1.2
> Environment: Brokers of 3 kafka nodes running Red Hat Enterprise 
> Linux Server release 7.2 (Maipo)
>Reporter: Alban Hurtaud
>Assignee: Onur Karaman
>Priority: Blocker
>  Labels: needs-kip, reliability
> Fix For: 0.10.3.0
>
>
> When creating a stack of 3 kafka brokers, the consumer starts faster than the 
> kafka nodes, and when it tries to read a topic, only one kafka node is 
> available.
> So the __consumer_offsets is created with a replication factor set to 1 
> (instead of configured 3) :
> offsets.topic.replication.factor=3
> default.replication.factor=3
> min.insync.replicas=2
> Then, when the other kafka nodes come up, exceptions are thrown because the 
> replica count for __consumer_offsets is 1 and min.insync.replicas is 2.
> What I don't understand is: why is __consumer_offsets created with a replication 
> factor of 1 (when 1 broker is running) whereas in server.properties it is set to 3?
> To reproduce : 
> - Prepare 3 kafka nodes with the 3 lines above added to servers.properties.
> - Run one kafka,
> - Run one consumer (the __consumer_offsets is created with replicas =1)
> - Run 2 more kafka nodes





[jira] [Updated] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4811:
---
Labels: newbie  (was: )

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Updated] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4811:
---
Component/s: replication

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>  Labels: newbie
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)





[jira] [Commented] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread Jun Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15887119#comment-15887119
 ] 

Jun Rao commented on KAFKA-4811:


In AbstractFetcherManager, we only create a new ReplicaFetcherThread when it's 
not in the map. However, the map is keyed off BrokerAndFetcherId, which 
includes host. So, when the broker host changes, it's possible that we need to 
create a new ReplicaFetcherThread based on the new host, even though another 
ReplicaFetcherThread based on the old host is still present. If that happens, 
the creation of the new ReplicaFetcherThread will hit the 
IllegalArgumentException in the description since the metric tags are based on 
just broker-id.

One way to fix this is that in AbstractFetcherManager, we key fetcherThreadMap 
on just brokerIdAndFetcherId. When adding a partition, if the fetcherThread 
exists, we check if the host name in the fetcherThread matches the host from 
the input. If so, we will reuse the fetcherThread. Otherwise, we shut down the 
old fetcherThread and create a new one based on the new host name.
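
A rough sketch of that idea (Java-style pseudocode for illustration; the real 
AbstractFetcherManager is Scala and the names below are not the actual fields):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of keying the fetcher map on (brokerId, fetcherId) only and
// replacing a thread whose host is stale, so metric tags are never registered twice.
final class FetcherManagerSketch {

    static final class BrokerIdAndFetcherId {
        final int brokerId;
        final int fetcherId;
        BrokerIdAndFetcherId(int brokerId, int fetcherId) { this.brokerId = brokerId; this.fetcherId = fetcherId; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof BrokerIdAndFetcherId)) return false;
            BrokerIdAndFetcherId other = (BrokerIdAndFetcherId) o;
            return brokerId == other.brokerId && fetcherId == other.fetcherId;
        }
        @Override public int hashCode() { return 31 * brokerId + fetcherId; }
    }

    interface FetcherThread {
        String sourceHost();                     // host the thread is currently fetching from
        void shutdown();
    }

    interface FetcherThreadFactory {
        FetcherThread create(int brokerId, String host, int fetcherId);
    }

    private final Map<BrokerIdAndFetcherId, FetcherThread> fetcherThreadMap = new HashMap<>();

    FetcherThread fetcherFor(int brokerId, String host, int fetcherId, FetcherThreadFactory factory) {
        BrokerIdAndFetcherId key = new BrokerIdAndFetcherId(brokerId, fetcherId);
        FetcherThread existing = fetcherThreadMap.get(key);
        if (existing != null && existing.sourceHost().equals(host))
            return existing;                     // same broker id and same host: reuse the thread
        if (existing != null)
            existing.shutdown();                 // host changed: shut down the old thread first
        FetcherThread created = factory.create(brokerId, host, fetcherId);
        fetcherThreadMap.put(key, created);
        return created;
    }
}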

> ReplicaFetchThread may fail to create due to existing metric
> 
>
> Key: KAFKA-4811
> URL: https://issues.apache.org/jira/browse/KAFKA-4811
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0
>Reporter: Jun Rao
>
> Someone reported the following error.
> java.lang.IllegalArgumentException: A metric named 'MetricName 
> [name=connection-close-rate, group=replica-fetcher-metrics, 
> description=Connections closed per second in the window., tags={broker-id=1, 
> fetcher-id=0}]' already exists, can't register another one.
> at 
> org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
> at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
> at 
> org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
> at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
> at 
> kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
> at 
> kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
> at 
> kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
> at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
> at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at 
> kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
> at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
> at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
> at 
> kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
> at java.lang.Thread.run(Thread.java:745)





Re: [VOTE] KIP-118: Drop Support for Java 7 in Kafka 0.11

2017-02-27 Thread Ismael Juma
Thanks to everyone who voted and provided feedback. +1 (binding) from me
too.

The vote has passed with 4 binding votes (Grant, Jason, Guozhang, Ismael)
and 11 non-binding votes (Bill, Damian, Eno, Edoardo, Mickael, Bharat,
Onur, Vahid, Colin, Apurva, Tom). There were no 0 or -1 votes.

I have updated the relevant wiki pages.

Ismael

On Thu, Feb 9, 2017 at 3:31 PM, Ismael Juma  wrote:

> Hi everyone,
>
> Since everyone in the discuss thread was in favour (10 people responded),
> I would like to initiate the voting process for KIP-118: Drop Support for
> Java 7 in Kafka 0.11:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 118%3A+Drop+Support+for+Java+7+in+Kafka+0.11
>
> The vote will run for a minimum of 72 hours.
>
> Thanks,
> Ismael
>
>


[jira] [Created] (KAFKA-4811) ReplicaFetchThread may fail to create due to existing metric

2017-02-27 Thread Jun Rao (JIRA)
Jun Rao created KAFKA-4811:
--

 Summary: ReplicaFetchThread may fail to create due to existing 
metric
 Key: KAFKA-4811
 URL: https://issues.apache.org/jira/browse/KAFKA-4811
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.2.0
Reporter: Jun Rao


Someone reported the following error.

java.lang.IllegalArgumentException: A metric named 'MetricName 
[name=connection-close-rate, group=replica-fetcher-metrics, 
description=Connections closed per second in the window., tags={broker-id=1, 
fetcher-id=0}]' already exists, can't register another one.
at 
org.apache.kafka.common.metrics.Metrics.registerMetric(Metrics.java:433)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:249)
at org.apache.kafka.common.metrics.Sensor.add(Sensor.java:234)
at 
org.apache.kafka.common.network.Selector$SelectorMetrics.<init>(Selector.java:680)
at org.apache.kafka.common.network.Selector.<init>(Selector.java:140)
at 
kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:86)
at 
kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
at 
kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
at 
kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at 
kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:882)
at 
kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:700)
at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:148)
at kafka.server.KafkaApis.handle(KafkaApis.scala:84)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:62)
at java.lang.Thread.run(Thread.java:745)






Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Dong Lin
Hi Jun,

In addition to Eno's reference on why rebuild time with RAID-5 is more
expensive, another concern is that RAID-5 will fail if more than one disk
fails. JBOD still works with one or more disk failures and has better
performance with one disk failure. These seem like good arguments for using
JBOD instead of RAID-5.

If a leader replica goes offline, the broker should first take all the actions
(e.g. remove the partition from the fetcher thread) that it would take if it had
received a StopReplicaRequest for this partition, because the replica can no
longer work anyway. It will also respond with an error to any ProduceRequest and
FetchRequest for the partition. The broker notifies the controller by writing a
notification znode in ZK. The controller learns of the disk failure event from
ZK, sends a LeaderAndIsrRequest and learns from the LeaderAndIsrResponse that
the replica is offline. The controller will then elect a new leader for this
partition and send LeaderAndIsrRequest/MetadataUpdateRequest to the relevant
brokers. The broker should also stop adjusting the ISR for this partition, as if
the broker were already offline. I am not sure there is any inconsistency in the
broker's behavior between the leader and follower cases. Is there any concern
with this approach?
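
To make the sequence above concrete, here is a purely illustrative sketch (every 
name is hypothetical; this is not the actual broker code):

    import java.util.List;

    // Illustrative only: the broker-side reaction when the disk holding some replicas fails.
    public class OfflineReplicaHandlingSketch {

        interface Broker {
            void removeFromFetcherThreads(String topicPartition);       // stop replicating (follower case)
            void failPendingAndFutureRequests(String topicPartition);   // error out ProduceRequest/FetchRequest
            void stopIsrUpdates(String topicPartition);                 // behave as if already offline
            void writeZkNotification(String logDir);                    // let the controller know via ZK
        }

        static void onLogDirFailure(Broker broker, String logDir, List<String> affectedPartitions) {
            for (String tp : affectedPartitions) {
                // Act as if a StopReplicaRequest had been received for the partition.
                broker.removeFromFetcherThreads(tp);
                broker.failPendingAndFutureRequests(tp);
                broker.stopIsrUpdates(tp);
            }
            // The controller picks this up from ZK, confirms via LeaderAndIsrRequest/Response
            // which replicas are offline, elects new leaders and propagates the metadata.
            broker.writeZkNotification(logDir);
        }
    }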

Thanks for catching this. I have removed that reference from the KIP.

Hi Eno,

Thank you for providing the reference on RAID-5. At LinkedIn we have 10
disks per Kafka machine. It will not be a show-stopper operationally for
LinkedIn if we have to deploy one-broker-per-disk. On the other hand, we
previously discussed the advantages of JBOD vs. one-broker-per-disk or
one-broker-per-machine. One-broker-per-disk suffers from the problems
described in the KIP, and one-broker-per-machine increases the rate of broker
failures caused by disk failure by 10X. Since JBOD is strictly better than
either of the two, it is also better than one-broker-per-multiple-disks, which
sits somewhere between one-broker-per-disk and one-broker-per-machine.

I personally think the benefits of the JBOD design are worth the implementation
complexity it introduces. I would also argue that it is reasonable for
Kafka to manage this low-level detail because Kafka already exposes and
manages the replication factor of its data. But whether the complexity is
worthwhile can be subjective and I cannot prove my opinion. I am
contributing a significant amount of time to this KIP because the Kafka
developers at LinkedIn believe it is useful and worth the effort. Yeah, it
will be useful to see what everyone else thinks about it.


Thanks,
Dong


On Mon, Feb 27, 2017 at 1:16 PM, Jun Rao  wrote:

> Hi, Dong,
>
> For RAID5, I am not sure the rebuild cost is a big concern. If a disk
> fails, typically an admin has to bring down the broker, replace the failed
> disk with a new one, trigger the RAID rebuild, and bring up the broker.
> This way, there is no performance impact at runtime due to rebuild. The
> benefit is that a broker doesn't fail in a hard way when there is a disk
> failure and can be brought down in a controlled way for maintenance. While
> the broker is running with a failed disk, reads may be more expensive since
> they have to be computed from the parity. However, if most reads are from
> page cache, this may not be a big issue either. So, it would be useful to
> do some tests on RAID5 before we completely rule it out.
>
> Regarding whether to remove an offline replica from the fetcher thread
> immediately. What do we do when a failed replica is a leader? Do we do
> nothing or mark the replica as not the leader immediately? Intuitively, it
> seems it's better if the broker acts consistently on a failed replica
> whether it's a leader or a follower. For ISR churns, I was just pointing
> out that if we don't send StopReplicaRequest to a broker to be shut down in
> a controlled way, then the leader will shrink ISR, expand it and shrink it
> again after the timeout.
>
> The KIP seems to still reference "
> /broker/topics/[topic]/partitions/[partitionId]/controller_managed_state".
>
> Thanks,
>
> Jun
>
> On Sat, Feb 25, 2017 at 7:49 PM, Dong Lin  wrote:
>
> > Hey Jun,
> >
> > Thanks for the suggestion. I think it is a good idea to know put created
> > flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest if
> > repilcas was in NewReplica state. It will only fail the replica creation
> in
> > the scenario that the controller fails after
> > topic-creation/partition-reassignment/partition-number-change but before
> > actually sends out the LeaderAndIsrRequest while there is ongoing disk
> > failure, which should be pretty rare and acceptable. This should simplify
> > the design of this KIP.
> >
> > Regarding RAID-5, I think the concern with RAID-5/6 is not just about
> > performance when there is no failure. For example, RAID-5 can support up
> to
> > one disk failure and it takes time to rebuild disk after one disk
> > failure. RAID 5 implementations are susceptible to system failures
> because
> > of trends regarding array rebuild time and the chan

Re: Consumption on a explicitly (dynamically) created topic has a 5 minute delay

2017-02-27 Thread J Pai
Yes, that was going to be my last resort attempt :) I'm going to give that
a try today and see how it goes. Although it isn't that great of a
solution, I don't mind doing it since the logic resides at one single place
in our application.

-Jaikiran

On Tuesday, February 28, 2017, James Cheng  wrote:
>
>
>> On Feb 26, 2017, at 10:36 PM, Jaikiran Pai 
wrote:
>>
>> An update on one of the workaround I tried - I added some logic in our
consumption part to wait for the KafkaConsumer.partitionsFor() to return
the topic name in the list, before actually considering the KafkaConsumer
ready to consume/poll the messages. Something like this[1]. This does
successfully make the consumer wait till the presence of the topic is
confirmed. However, we still see the assignment of partitions being skipped:
>>
>> 2017-02-26 21:02:10,823 [Thread-46] DEBUG
org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor -
Skipping assignment for topic foo-bar since no metadata is available
>>
>> I looked at the code of KafkaConsumer.partitionsFor(...). It does indeed
fetch (remotely) the partitions info but it _doesn't_ (force) update the
"metadata" that the KafkaConsumer instance holds on to. This "metadata" is
what is used by the assignment logic and effectively it ends up considering
this topic absent. I am not sure why the "metadata" cannot be force updated
(internally) when a remote fetch is triggered via partitionsFor() call, I
guess there's a valid reason. In short, this approach of using
partitionsFor, although looked promising, won't work out. So, I'm going to
go back to the metadata.max.age.ms and fiddle with it a bit to keep it at a
low value (was trying to avoid this, since this then is applicable
throughout the lifetime of that consumer).
>>
>
> Jaikiran,
>
> What about
> 1) create topic
> 2) create consumer1 and do consumer1.partitionsFor() until it succeeds
> 3) close consumer1
> 4) create consumer2 and do consumer2.subscribe()
>
> -James
>
>
>>
>> [1] https://gist.github.com/jaikiran/902c1eadbfdd66466c8d8ecbd81416bf
>>
>> -Jaikiran
>> On Friday 24 February 2017 12:29 PM, Jaikiran Pai wrote:
>>> James, thank you very much for this explanation and I now understand
the situation much more clearly. I wasn't aware that the consumer's
metadata.max.age.ms could play a role in this. I was under the impression
that the 5 minute timeout is some broker level config which was triggering
this consumer group reevaluation.
>>>
>>> Perhaps changing the metadata.max.age.ms might be what we will end up
doing, but before doing that, I will check how the consumer.partitionsFor
API (the one which you noted in your reply) behaves in practice. The
javadoc on that method states that it will trigger a refresh of the
metadata if none is found for the topic. There's also a note that it throws
a TimeoutException if the metadata isn't available for the topic after a
timeout period. I'll experiment a bit with this API to see if I can be
assured of a empty list if the topic metadata (after a fetch) isn't
available. That way, I can add some logic in our application which calls
this API for a certain number of fixed times, before the consumer is
considered "ready". If it does throw a TimeoutException, I think I'll just
catch it and repeat the call for the fixed number of times.
>>>
>>> Thanks again, this was really helpful - I was running out of ideas to
come up with a clean enough workaround to this problem. Your explanation
has given me a couple of ideas which I consider clean enough to try a few
things. Once I have consistent working solution for this, I'll send a note
on what approach I settled on.
>>>
>>>
>>> -Jaikiran
>>>
>>> On Friday 24 February 2017 12:08 PM, James Cheng wrote:
> On Feb 23, 2017, at 10:03 PM, Jaikiran Pai 
wrote:
>
> (Re)posting this from the user mailing list to dev mailing list,
hoping for some inputs from the Kafka dev team:
>
> We are on Kafka 0.10.0.1 (server and client) and use Java
consumer/producer APIs. We have an application where we create Kafka topics
dynamically (using the AdminUtils Java API) and then start
producing/consuming on those topics. The issue we frequently run into is
this:
>
> 1. Application process creates a topic "foo-bar" via
AdminUtils.createTopic. This is sucessfully completed.
> 2. Same application process then creates a consumer (using new Java
consumer API) on that foo-bar topic as a next step.
> 3. The consumer that gets created in step#2 however, doesn't seem to
be enrolled in consumer group for this topic because of this (notice the
last line in the log):
>
> 2017-02-21 00:58:43,359 [Thread-6] DEBUG
org.apache.kafka.clients.consumer.KafkaConsumer - Kafka consumer created
> 2017-02-21 00:58:43,360 [Thread-6] DEBUG
org.apache.kafka.clients.consumer.KafkaConsumer - Subscribed to topic(s):
foo-bar
> 2017-02-21 00:58:43,543 [Thread-6] DEBUG
org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Received
group coordinato

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-27 Thread Jason Gustafson
If I understand correctly, the suggestion is to let headers be mutable on
the producer side basically until after they've passed through the
interceptors. That sounds like a reasonable compromise to me.

@Becket

3. It might be useful to have headers at MessageSet level as well so we can
> avoid decompression in some cases. But given this KIP is already
> complicated, I would rather leave this out of the scope and address that
> later when needed, e.g. after having batch level interceptors.


Yeah, I had the same thought. I was considering factoring the map of header
names to generated integer ids into the message set and only using the
integer ids in the individual messages. It's a bit complex though, so I
agree it's probably best left out. I guess for now if users have a lot of
headers, they should just enable compression.

-Jason


On Mon, Feb 27, 2017 at 1:16 PM, radai  wrote:

> a few comments on the KIP as it is now:
>
> 1. instead of add(Header) + add (Iterable) i suggest we use add +
> addAll. this is more in line with how java collections work and may
> therefor be more intuitive
>
> 2. common user code dealing with headers will want get("someKey") /
> set("someKey"), or equivalent. code using multiple headers under the same
> key will be rare, and code iterating over all headers would be even rarer
> (probably only for kafka-connect equivalent use cases, really). as the API
> looks right now, the most common and trivial cases will be gnarly:
>get("someKey") -->
> record.headers().headers("someKey").iterator().next().value(). and this is
> before i start talking about how nulls/emptys are handled.
>replace("someKey") -->
> record.headers().remove(record.headers().headers("someKey"));
> record.headers().append(new Header("someKey", value));
>
> this is why i think we should start with get()/set() which are single-value
> map semantics (so set overwrites), then add getAll() (multi-map), append()
> etc on top. make the common case pretty.
>
> On Sun, Feb 26, 2017 at 2:01 AM, Michael Pearce 
> wrote:
>
> > Hi Becket,
> >
> > On 1)
> >
> > Yes truly we wanted mutable headers also. Alas we couldn't think of a
> > solution would address Jason's point around, once a record is sent it
> > shouldn't be possible to mutate it, for cases where you send twice the
> same
> > record.
> >
> > Thank you so much for your solution i think this will work very nicely :)
> >
> > Agreed we only need to do mutable to immutable conversion
> >
> > I think you solution with a ".close()" taken from else where in the kafka
> > protocol where mutability is existent is a great solution, and happy
> middle
> > ground.
> >
> > @Jason you agree, this resolves your concerns if we had mutable headers?
> >
> >
> > On 2)
> > Agreed, this was only added as i couldn't think of a solution to that
> > would address Jason's concern, but really didn't want to force end users
> to
> > constantly write ugly boiler plate code. If we agree on you solution for
> 1,
> > very happy to remove these.
> >
> > On 3)
> > I also would like to keep the scope of this KIP limited to Message
> Headers
> > for now, else we run the risk of not getting even these delivered for
> next
> > release and we're almost now there on getting this KIP to the state
> > everyone is happy. As you note address that later if theres the need.
> >
> >
> > Ill leave it 24hrs and update the kip if no strong objections based on
> > your solution for  1 & 2.
> >
> > Cheers
> > Mike
> >
> > __ __
> > From: Becket Qin 
> > Sent: Saturday, February 25, 2017 10:33 PM
> > To: dev@kafka.apache.org
> > Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
> >
> > Hey Michael,
> >
> > Thanks for the KIP. It looks good overall and it looks we only have few
> > things to agree on.
> >
> > 1. Regarding the mutability.
> >
> > I think it would be a big convenience to have headers mutable during
> > certain stage in the message life cycle for the use cases you mentioned.
> I
> > agree there is a material benefit especially given that we may have to
> > modify the headers for each message.
> >
> > That said, I also think it is fair to say that in the producer, in order
> to
> > guarantee the correctness of the entire logic, it is necessary that at
> some
> > point we need to make producer record immutable. For example we probably
> > don't want to see that users accidentally updated the headers when the
> > producer is doing the serialization or compression.
> >
> > Given that, would it be possible to make Headers to be able to switch
> from
> > mutable to immutable? We have done this for the Batch in the producer.
> For
> > example, initially the headers are mutable, but after it has gone through
> > all the interceptors, we can call Headers.close() to make it immutable
> > afterwards.
> >
> > On the consumer side, we can probably always leave the the ConsumerRecord
> > mutable because after we give the messages to the users, Kafka consumer
> > itself do

Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-27 Thread Ismael Juma
Breaking clients without a deprecation period is something we only do as a
last resort. Is there strong justification for doing it here?

Ismael

On Mon, Feb 27, 2017 at 11:28 PM, Mayuresh Gharat <
gharatmayures...@gmail.com> wrote:

> Hi Ismael,
>
> Yeah. I agree that it might break the clients if the user is using the
> kafkaPrincipal directly. But since KafkaPrincipal is also a Java Principal
> and I think, it would be a right thing to do replace the kafkaPrincipal
> with Java Principal at this stage than later.
>
> We can mention in the KIP, that it would break the clients that are using
> the KafkaPrincipal directly and they will have to use the PrincipalType
> directly, if they are using it as its only one value and use the name from
> the Principal directly or create a KafkaPrincipal from Java Principal as we
> are doing in SimpleAclAuthorizer with this KIP.
>
> Thanks,
>
> Mayuresh
>
>
>
> On Mon, Feb 27, 2017 at 10:56 AM, Ismael Juma  wrote:
>
> > Hi Mayuresh,
> >
> > Sorry for the delay. The updated KIP states that there is no
> compatibility
> > impact, but that doesn't seem right. The fact that we changed the type of
> > Session.principal to `Principal` means that any code that expects it to
> be
> > `KafkaPrincipal` will break. Either because of declared types (likely) or
> > if it accesses `getPrincipalType` (unlikely since the value is always the
> > same). It's a bit annoying, but we should add a new field to `Session`
> with
> > the original principal. We can potentially deprecate the existing one, if
> > we're sure we don't need it (or we can leave it for now).
> >
> > Ismael
> >
> > On Mon, Feb 27, 2017 at 6:40 PM, Mayuresh Gharat <
> > gharatmayures...@gmail.com
> > > wrote:
> >
> > > Hi Ismael, Joel, Becket
> > >
> > > Would you mind taking a look at this. We require 2 more binding votes
> for
> > > the KIP to pass.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Thu, Feb 23, 2017 at 10:57 AM, Dong Lin 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > On Wed, Feb 22, 2017 at 10:52 PM, Manikumar <
> manikumar.re...@gmail.com
> > >
> > > > wrote:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > On Thu, Feb 23, 2017 at 3:27 AM, Mayuresh Gharat <
> > > > > gharatmayures...@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks a lot for the comments and reviews.
> > > > > > I agree we should log the username.
> > > > > > What I meant by creating KafkaPrincipal was, after this KIP we
> > would
> > > > not
> > > > > be
> > > > > > required to create KafkaPrincipal and if we want to maintain the
> > old
> > > > > > logging, we will have to create it as we do today.
> > > > > > I will take care that we specify the Principal name in the log.
> > > > > >
> > > > > > Thanks again for all the reviews.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > > On Wed, Feb 22, 2017 at 1:45 PM, Jun Rao 
> wrote:
> > > > > >
> > > > > > > Hi, Mayuresh,
> > > > > > >
> > > > > > > For logging the user name, we could do either way. We just need
> > to
> > > > make
> > > > > > > sure the expected user name is logged. Also, currently, we are
> > > > already
> > > > > > > creating a KafkaPrincipal on every request. +1 on the latest
> KIP.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Feb 21, 2017 at 8:05 PM, Mayuresh Gharat <
> > > > > > > gharatmayures...@gmail.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Thanks for the comments.
> > > > > > > >
> > > > > > > > I will mention in the KIP : how this change doesn't affect
> the
> > > > > default
> > > > > > > > authorizer implementation.
> > > > > > > >
> > > > > > > > Regarding, Currently, we log the principal name in the
> request
> > > log
> > > > in
> > > > > > > > RequestChannel, which has the format of "principalType +
> > > SEPARATOR
> > > > +
> > > > > > > > name;".
> > > > > > > > It would be good if we can keep the same convention after
> this
> > > KIP.
> > > > > One
> > > > > > > way
> > > > > > > > to do that is to convert java.security.Principal to
> > > KafkaPrincipal
> > > > > for
> > > > > > > > logging the requests.
> > > > > > > > --- > This would mean we have to create a new KafkaPrincipal
> on
> > > > each
> > > > > > > > request. Would it be OK to just specify the name of the
> > > principal.
> > > > > > > > Is there any major reason, we don't want to change the
> logging
> > > > > format?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Mayuresh
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 20, 2017 at 10:18 PM, Jun Rao 
> > > > wrote:
> > > > > > > >
> > > > > > > > > Hi, Mayuresh,
> > > > > > > > >
> > > > > > > > > Thanks for the updated KIP. A couple of more comments.
> > > > > > > > >
> > > > > > > > > 1. Do we convert java.security.Principal to KafkaPrincipal
> > for
> > 

Re: Consumption on a explicitly (dynamically) created topic has a 5 minute delay

2017-02-27 Thread James Cheng


> On Feb 26, 2017, at 10:36 PM, Jaikiran Pai  wrote:
> 
> An update on one of the workaround I tried - I added some logic in our 
> consumption part to wait for the KafkaConsumer.partitionsFor() to return the 
> topic name in the list, before actually considering the KafkaConsumer ready 
> to consume/poll the messages. Something like this[1]. This does successfully 
> make the consumer wait till the presence of the topic is confirmed. However, 
> we still see the assignment of partitions being skipped:
> 
> 2017-02-26 21:02:10,823 [Thread-46] DEBUG 
> org.apache.kafka.clients.consumer.internals.AbstractPartitionAssignor - 
> Skipping assignment for topic foo-bar since no metadata is available
> 
> I looked at the code of KafkaConsumer.partitionsFor(...). It does indeed 
> fetch (remotely) the partitions info but it _doesn't_ (force) update the 
> "metadata" that the KafkaConsumer instance holds on to. This "metadata" is 
> what is used by the assignment logic and effectively it ends up considering 
> this topic absent. I am not sure why the "metadata" cannot be force updated 
> (internally) when a remote fetch is triggered via partitionsFor() call, I 
> guess there's a valid reason. In short, this approach of using partitionsFor, 
> although looked promising, won't work out. So, I'm going to go back to the 
> metadata.max.age.ms and fiddle with it a bit to keep it at a low value (was 
> trying to avoid this, since this then is applicable throughout the lifetime 
> of that consumer).
> 

Jaikiran,

What about 
1) create topic
2) create consumer1 and do consumer1.partitionsFor() until it succeeds
3) close consumer1
4) create consumer2 and do consumer2.subscribe()
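
A minimal sketch of steps 2-4 (the retry limit, sleep and configs below are just 
placeholders, and step 1 is assumed to have happened already):

    import java.util.Collections;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class TopicReadyWorkaround {

        static Properties consumerProps() {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                  // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            return props;
        }

        public static KafkaConsumer<String, String> consumerForNewTopic(String topic) throws InterruptedException {
            // Step 2: probe with a throwaway consumer until the topic's partitions show up.
            KafkaConsumer<String, String> probe = new KafkaConsumer<>(consumerProps());
            try {
                for (int attempt = 0; attempt < 30; attempt++) {                    // arbitrary retry limit
                    try {
                        List<PartitionInfo> partitions = probe.partitionsFor(topic);
                        if (partitions != null && !partitions.isEmpty())
                            break;
                    } catch (org.apache.kafka.common.errors.TimeoutException e) {
                        // metadata not available yet; fall through and retry
                    }
                    Thread.sleep(1000);
                }
            } finally {
                probe.close();                                                      // step 3
            }
            // Step 4: the consumer that will actually poll subscribes only now, so its own
            // metadata fetch should already see the topic.
            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps());
            consumer.subscribe(Collections.singletonList(topic));
            return consumer;
        }
    }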

-James


> 
> [1] https://gist.github.com/jaikiran/902c1eadbfdd66466c8d8ecbd81416bf
> 
> -Jaikiran
> On Friday 24 February 2017 12:29 PM, Jaikiran Pai wrote:
>> James, thank you very much for this explanation and I now understand the 
>> situation much more clearly. I wasn't aware that the consumer's 
>> metadata.max.age.ms could play a role in this. I was under the impression 
>> that the 5 minute timeout is some broker level config which was triggering 
>> this consumer group reevaluation.
>> 
>> Perhaps changing the metadata.max.age.ms might be what we will end up doing, 
>> but before doing that, I will check how the consumer.partitionsFor API (the 
>> one which you noted in your reply) behaves in practice. The javadoc on that 
>> method states that it will trigger a refresh of the metadata if none is 
>> found for the topic. There's also a note that it throws a TimeoutException 
>> if the metadata isn't available for the topic after a timeout period. I'll 
>> experiment a bit with this API to see if I can be assured of a empty list if 
>> the topic metadata (after a fetch) isn't available. That way, I can add some 
>> logic in our application which calls this API for a certain number of fixed 
>> times, before the consumer is considered "ready". If it does throw a 
>> TimeoutException, I think I'll just catch it and repeat the call for the 
>> fixed number of times.
>> 
>> Thanks again, this was really helpful - I was running out of ideas to come 
>> up with a clean enough workaround to this problem. Your explanation has 
>> given me a couple of ideas which I consider clean enough to try a few 
>> things. Once I have consistent working solution for this, I'll send a note 
>> on what approach I settled on.
>> 
>> 
>> -Jaikiran
>> 
>> On Friday 24 February 2017 12:08 PM, James Cheng wrote:
 On Feb 23, 2017, at 10:03 PM, Jaikiran Pai  
 wrote:
 
 (Re)posting this from the user mailing list to dev mailing list, hoping 
 for some inputs from the Kafka dev team:
 
 We are on Kafka 0.10.0.1 (server and client) and use Java 
 consumer/producer APIs. We have an application where we create Kafka 
 topics dynamically (using the AdminUtils Java API) and then start 
 producing/consuming on those topics. The issue we frequently run into is 
 this:
 
 1. Application process creates a topic "foo-bar" via 
 AdminUtils.createTopic. This is sucessfully completed.
 2. Same application process then creates a consumer (using new Java 
 consumer API) on that foo-bar topic as a next step.
 3. The consumer that gets created in step#2 however, doesn't seem to be 
 enrolled in consumer group for this topic because of this (notice the last 
 line in the log):
 
 2017-02-21 00:58:43,359 [Thread-6] DEBUG 
 org.apache.kafka.clients.consumer.KafkaConsumer - Kafka consumer created
 2017-02-21 00:58:43,360 [Thread-6] DEBUG 
 org.apache.kafka.clients.consumer.KafkaConsumer - Subscribed to topic(s): 
 foo-bar
 2017-02-21 00:58:43,543 [Thread-6] DEBUG 
 org.apache.kafka.clients.consumer.internals.AbstractCoordinator - Received 
 group coordinator response ClientResponse(receivedTimeMs=1487667523542, 
 disconnected=false, request=ClientRequest(expectResponse=true, 

[GitHub] kafka pull request #2605: Kafka 4738:Remove generic type of class ClientStat...

2017-02-27 Thread sharad-develop
GitHub user sharad-develop opened a pull request:

https://github.com/apache/kafka/pull/2605

Kafka 4738:Remove generic type of class ClientState

Remove generic type of class ClientState. Also removed Generic type from 
TaskAssignor as it referred to same generic type of ClientState

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sharad-develop/kafka KAFKA-4738

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2605.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2605


commit 7946b2f35c0e3811ba7d706fc9c761d73ece47bb
Author: Damian Guy 
Date:   2017-01-16T19:40:47Z

MINOR: Remove unused constructor param from ProcessorStateManager

Remove applicationId parameter as it is no longer used.

Author: Damian Guy 

Reviewers: Guozhang Wang 

Closes #2385 from dguy/minor-remove-unused-param

commit 621dff22e79dc64b9a8748186dd985774044f91a
Author: Rajini Sivaram 
Date:   2017-01-17T11:16:29Z

KAFKA-4363; Documentation for sasl.jaas.config property

Author: Rajini Sivaram 

Reviewers: Ismael Juma 

Closes #2316 from rajinisivaram/KAFKA-4363

commit e3f4cdd0e249f78a7f4e8f064533bcd15eb11cbf
Author: Rajini Sivaram 
Date:   2017-01-17T12:55:07Z

KAFKA-4590; SASL/SCRAM system tests

Runs sanity test and one replication test using SASL/SCRAM.

Author: Rajini Sivaram 

Reviewers: Ewen Cheslack-Postava , Ismael Juma 


Closes #2355 from rajinisivaram/KAFKA-4590

commit 2b19ad9d8c47fb0f78a6e90d2f5711df6110bf1f
Author: Rajini Sivaram 
Date:   2017-01-17T18:42:55Z

KAFKA-4580; Use sasl.jaas.config for some system tests

Switched console_consumer, verifiable_consumer and verifiable_producer to 
use new sasl.jaas_config property instead of static JAAS configuration file 
when used with SASL_PLAINTEXT.

Author: Rajini Sivaram 

Reviewers: Ewen Cheslack-Postava , Ismael Juma 


Closes #2323 from rajinisivaram/KAFKA-4580

(cherry picked from commit 3f6c4f63c9c17424cf717ca76c74554bcf3b2e9a)
Signed-off-by: Ismael Juma 

commit 60d759a227087079e6fd270c68fd9e38441cb34a
Author: Jason Gustafson 
Date:   2017-01-17T18:42:05Z

MINOR: Some cleanups and additional testing for KIP-88

Author: Jason Gustafson 

Reviewers: Vahid Hashemian , Ismael Juma 


Closes #2383 from hachikuji/minor-cleanup-kip-88

commit c9b9acf6a8b542c2d0d825c17a4a20cf3fa5
Author: Damian Guy 
Date:   2017-01-17T20:33:11Z

KAFKA-4588: Wait for topics to be created in 
QueryableStateIntegrationTest.shouldNotMakeStoreAvailableUntilAllStoresAvailable

After debugging this i can see the times that it fails there is a race 
between when the topic is actually created/ready on the broker and when the 
assignment happens. When it fails `StreamPartitionAssignor.assign(..)` gets 
called with a `Cluster` with no topics. Hence the test hangs as no tasks get 
assigned. To fix this I added a `waitForTopics` method to 
`EmbeddedKafkaCluster`. This will wait until the topics have been created.

Author: Damian Guy 

Reviewers: Matthias J. Sax, Guozhang Wang

Closes #2371 from dguy/integration-test-fix

(cherry picked from commit 825f225bc5706b16af8ec44ca47ee1452c11e6f3)
Signed-off-by: Guozhang Wang 

commit 6f72a5a53c444278187fa6be58031168bcaffb26
Author: Damian Guy 
Date:   2017-01-17T22:13:46Z

KAFKA-3452 Follow-up: Refactoring StateStore hierarchies

This is a follow up of https://github.com/apache/kafka/pull/2166 - 
refactoring the store hierarchies as requested

Author: Damian Guy 

Reviewers: Guozhang Wang 

Closes #2360 from dguy/state-store-refactor

(cherry picked from commit 73b7ae0019d387407375f3865e263225c986a6ce)
Signed-off-by: Guozhang Wang 

commit eb62e5695506ae13bd37102c3c08e8a067eca0c8
Author: Ismael Juma 
Date:   2017-01-18T02:43:10Z

KAFKA-4591; Create Topic Policy follow-up

1. Added javadoc to public classes
2. Removed `s` from config name for consistency with interface name
3. The policy interface now implements Configurable and AutoCloseable as 
per the KIP
4. Use `null` instead of `-1` in `RequestMetadata`
5. Perform all broker validation before invoking the policy
6. Add tests

Author: Ismael Juma 

Reviewers: Jason Gustafson 

Closes #2388 from ijuma/create-topic-policy-docs-and-config-name-change

(cherry picked from commit fd6d7bcf335166a524dc9a29a50c96af8f1c1c02)
Signed-off-by: Ismael Juma 

commit e38794e020951adec5a5d0bbfe42c57294bf67bd
Author: Guozhang Wang 
Date:   2017-01-18T04:29:55Z

KAFKA-3502; move RocksDB options construction to init()

In RocksDBStore, options / wOp

[jira] [Created] (KAFKA-4810) SchemaBuilder should be more lax about checking that fields are unset if they are being set to the same value

2017-02-27 Thread Ewen Cheslack-Postava (JIRA)
Ewen Cheslack-Postava created KAFKA-4810:


 Summary: SchemaBuilder should be more lax about checking that 
fields are unset if they are being set to the same value
 Key: KAFKA-4810
 URL: https://issues.apache.org/jira/browse/KAFKA-4810
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 0.10.2.0
Reporter: Ewen Cheslack-Postava


Currently SchemaBuilder is strict when checking that certain fields have not 
been set yet (e.g. version, name, doc). It just checks that the field is null. 
This is intended to protect the user from buggy code that overwrites a field 
with different values, but it's a bit too strict currently. In generic code for 
converting schemas (e.g. Converters) you will sometimes initialize a builder 
with these values (e.g. because you get a SchemaBuilder for a logical type, 
which sets name & version), but then have generic code for setting name & 
version from the source schema.

We saw this bug in practice with Confluent's AvroConverter, so it's likely it 
could trip up others as well. You can work around the issue, but it would be 
nice if exceptions were only thrown if you try to overwrite an existing value 
with a different value.
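
To illustrate the proposed relaxation (a standalone sketch, not Connect's actual 
SchemaBuilder internals):

    // Illustrative only: "complain only on a conflicting overwrite" instead of "complain
    // whenever the field was already set". Names are made up, not Connect's real code.
    public class LaxFieldCheckSketch {

        private String name;   // stands in for SchemaBuilder's name/version/doc fields

        public LaxFieldCheckSketch name(String newName) {
            // Current behavior: throw if the field is non-null at all.
            // Proposed behavior: throw only if it was already set to a *different* value.
            if (this.name != null && !this.name.equals(newName)) {
                throw new IllegalStateException("name already set to a different value: " + this.name);
            }
            this.name = newName;
            return this;
        }
    }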





Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Joel Koshy
>
> Let's say we sent the batch over the wire and received a
> RecordTooLargeException; how do we split it, since once we add the message to
> the batch we lose the message-level granularity? We will have to
> decompress, do deep iteration, split and compress again, right? This
> looks like a performance bottleneck in the case of multi-topic producers like
> mirror maker.
>

Yes, but these should be outliers if we do estimation on a per-topic basis
and if we target a conservative-enough compression ratio. The producer
should also avoid sending over the wire if it can be made aware of the
max-message size limit on the broker, and split if it determines that a
record exceeds the broker's config. Ideally this should be part of topic
metadata but is not - so it could be off a periodic describe-configs
(which isn't available yet). This doesn't remove the need to split and
recompress though.
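
For reference, the adaptive per-topic estimate Becket describes below could be 
sketched roughly like this (the constants come from his example; this is 
illustrative, not the actual producer code):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class CompressionRatioEstimatorSketch {

        private static final float START = 1.0f;       // conservative starting estimate
        private static final float STEP_DOWN = 0.001f; // drift down slowly when we over-estimate
        private static final float STEP_UP = 0.05f;    // back off quickly when we under-estimate

        private final Map<String, Float> estimatePerTopic = new ConcurrentHashMap<>();

        public float estimate(String topic) {
            return estimatePerTopic.getOrDefault(topic, START);
        }

        // Called after a batch has been compressed, with the observed (actual) ratio.
        public void observe(String topic, float actualRatio) {
            float current = estimate(topic);
            float updated = actualRatio < current
                    ? Math.max(0.0f, current - STEP_DOWN)   // ACR < ECR: lower the estimate very slowly
                    : Math.min(START, current + STEP_UP);   // ACR > ECR: raise it quickly, capped at 1.0
            estimatePerTopic.put(topic, updated);
        }

        // Called when the broker rejects a batch with RecordTooLargeException: reset to the
        // conservative estimate and let the caller split the batch.
        public void onRecordTooLarge(String topic) {
            estimatePerTopic.put(topic, START);
        }
    }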


> On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin  wrote:
>
> > Hey Mayuresh,
> >
> > 1) The batch would be split when an RecordTooLargeException is received.
> > 2) Not lower the actual compression ratio, but lower the estimated
> > compression ratio "according to" the Actual Compression Ratio(ACR).
> >
> > An example, let's start with Estimated Compression Ratio (ECR) = 1.0. Say
> > the compression ratio of ACR is ~0.8, instead of letting the ECR dropped
> to
> > 0.8 very quickly, we only drop 0.001 every time when ACR < ECR. However,
> > once we see an ACR > ECR, we increment ECR by 0.05. If a
> > RecordTooLargeException is received, we reset the ECR back to 1.0 and
> split
> > the batch.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com> wrote:
> >
> > > Hi Becket,
> > >
> > > Seems like an interesting idea.
> > > I had couple of questions :
> > > 1) How do we decide when the batch should be split?
> > > 2) What do you mean by slowly lowering the "actual" compression ratio?
> > > An example would really help here.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin 
> > wrote:
> > >
> > > > Hi Jay,
> > > >
> > > > Yeah, I got your point.
> > > >
> > > > I think there might be a solution which do not require adding a new
> > > > configuration. We can start from a very conservative compression
> ratio
> > > say
> > > > 1.0 and lower it very slowly according to the actual compression
> ratio
> > > > until we hit a point that we have to split a batch. At that point, we
> > > > exponentially back off on the compression ratio. The idea is somewhat
> > > like
> > > > TCP. This should help avoid frequent split.
> > > >
> > > > The upper bound of the batch size is also a little awkward today
> > because
> > > we
> > > > say the batch size is based on compressed size, but users cannot set
> it
> > > to
> > > > the max message size because that will result in oversized messages.
> > With
> > > > this change we will be able to allow the users to set the message
> size
> > to
> > > > close to max message size.
> > > >
> > > > However the downside is that there could be latency spikes in the
> > system
> > > in
> > > > this case due to the splitting, especially when there are many
> messages
> > > > need to be split at the same time. That could potentially be an issue
> > for
> > > > some users.
> > > >
> > > > What do you think about this approach?
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > >
> > > >
> > > > On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps  wrote:
> > > >
> > > > > Hey Becket,
> > > > >
> > > > > Yeah that makes sense.
> > > > >
> > > > > I agree that you'd really have to both fix the estimation (i.e.
> make
> > it
> > > > per
> > > > > topic or make it better estimate the high percentiles) AND have the
> > > > > recovery mechanism. If you are underestimating often and then
> paying
> > a
> > > > high
> > > > > recovery price that won't fly.
> > > > >
> > > > > I think you take my main point though, which is just that I hate to
> > > > exposes
> > > > > these super low level options to users because it is so hard to
> > explain
> > > > to
> > > > > people what it means and how they should set it. So if it is
> possible
> > > to
> > > > > make either some combination of better estimation and splitting or
> > > better
> > > > > tolerance of overage that would be preferrable.
> > > > >
> > > > > -Jay
> > > > >
> > > > > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin  >
> > > > wrote:
> > > > >
> > > > > > @Dong,
> > > > > >
> > > > > > Thanks for the comments. The default behavior of the producer
> won't
> > > > > change.
> > > > > > If the users want to use the uncompressed message size, they
> > probably
> > > > > will
> > > > > > also bump up the batch size to somewhere

[jira] [Commented] (KAFKA-4778) OOM on kafka-streams instances with high numbers of unreaped Record classes

2017-02-27 Thread Guozhang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886778#comment-15886778
 ] 

Guozhang Wang commented on KAFKA-4778:
--

[~peoplemerge] What I was looking for is actually the StreamsConfig you used in 
your app, not the broker-side configs.

Also, I'd like to know whether you are using any state stores (it seems so, since 
you mentioned RocksDB on local SSD), and if yes, how they are being used, for 
example for joins / aggregations / other operations?

From the jhat histogram, which shows

class java.io.EOFException  501606  20064240
class java.util.ArrayDeque  501941  8031056

and a high number of consumer records, I'd suspect that the apps did encounter 
some exceptions and hence were restarting without cleanly cleaning up the 
previous instances. To validate whether this is the case, I'd ask a few 
questions:

1. Did you set an UncaughtExceptionHandler on your KafkaStreams object, and if 
yes, did you close the instance and restart it?
2. Did you observe multiple DEBUG log entries with {{Starting Kafka Stream 
process.}} in the app's log files?
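
For (1), what I mean is something along these lines (a sketch; replace the restart 
logic with whatever fits your app):

    import org.apache.kafka.streams.KafkaStreams;

    public class StreamsRestartSketch {

        // Sketch: attach a handler so that an uncaught exception in a stream thread triggers
        // a clean close of the whole instance before anything restarts it. Closing from a
        // separate thread avoids blocking the dying stream thread itself.
        public static void installHandler(final KafkaStreams streams, final Runnable restart) {
            streams.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
                @Override
                public void uncaughtException(Thread thread, Throwable throwable) {
                    System.err.println("Stream thread " + thread.getName() + " died: " + throwable);
                    new Thread(new Runnable() {
                        @Override
                        public void run() {
                            streams.close();   // release the old instance's resources first
                            restart.run();     // then build and start a brand new KafkaStreams
                        }
                    }).start();
                }
            });
        }
    }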

> OOM on kafka-streams instances with high numbers of unreaped Record classes
> ---
>
> Key: KAFKA-4778
> URL: https://issues.apache.org/jira/browse/KAFKA-4778
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer, streams
>Affects Versions: 0.10.1.1
> Environment: AWS m3.large Ubuntu 16.04.1 LTS.  rocksDB on local SSD.  
> Kafka has 3 zk, 5 brokers.  
> stream processors are run with:
> -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails
> Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
> Stream processors written in scala 2.11.8
>Reporter: Dave Thomas
> Attachments: oom-killer.txt
>
>
> We have a stream processing app with ~8 source/sink stages operating roughly 
> at the rate of 500k messages ingested/day (~4M across the 8 stages).
> We get OOM eruptions once every ~18 hours. Note it is Linux triggering the 
> OOM-killer, not the JVM terminating itself. 
> It may be worth noting that stream processing uses ~50 mbytes while 
> processing normally for hours on end, until the problem surfaces; then in 
> ~20-30 sec memory grows suddenly from under 100 mbytes to >1 gig and does not 
> shrink until the process is killed.
> We are using supervisor to restart the instances.  Sometimes, the process 
> dies immediately once stream processing resumes for the same reason, a 
> process which could continue for minutes or hours.  This extended window has 
> enabled us to capture a heap dump using jmap.
> jhat's histogram feature reveals the following top objects in memory:
> Class Instance Count  Total Size
> class [B  4070487 867857833
> class [Ljava.lang.Object; 2066986 268036184
> class [C  539519  92010932
> class [S  1003290 80263028
> class [I  508208  77821516
> class java.nio.HeapByteBuffer 1506943 58770777
> class org.apache.kafka.common.record.Record   1506783 36162792
> class org.apache.kafka.clients.consumer.ConsumerRecord528652  35948336
> class org.apache.kafka.common.record.MemoryRecords$RecordsIterator501742  
> 32613230
> class org.apache.kafka.common.record.LogEntry 2009373 32149968
> class org.xerial.snappy.SnappyInputStream 501600  20565600
> class java.io.DataInputStream 501742  20069680
> class java.io.EOFException501606  20064240
> class java.util.ArrayDeque501941  8031056
> class java.lang.Long  516463  4131704
> Note high on the list include org.apache.kafka.common.record.Record, 
> org.apache.kafka.clients.consumer.ConsumerRecord,
> org.apache.kafka.common.record.MemoryRecords$RecordsIterator,
> org.apache.kafka.common.record.LogEntry
> All of these contain 500k-1.5M instances.
> There is nothing in stream processing logs that is distinctive (log levels 
> are still at default).
> Could it be references (weak, phantom, etc) causing these instances to not be 
> garbage-collected?
> Edit: to request a full heap dump (created using `jmap 
> -dump:format=b,file=`), contact me directly at opensou...@peoplemerge.com.  
> It is 2G.





[GitHub] kafka-site pull request #48: Filled in og meta tags

2017-02-27 Thread derrickdoo
GitHub user derrickdoo opened a pull request:

https://github.com/apache/kafka-site/pull/48

Filled in og meta tags

Set og meta tags so sharing links to places like LinkedIn, Facebook, 
Twitter gives people good defaults.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/derrickdoo/kafka-site og-meta-tags

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka-site/pull/48.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #48








Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-27 Thread Mayuresh Gharat
Hi Ismael,

Yeah. I agree that it might break clients if the user is using the
KafkaPrincipal directly. But since KafkaPrincipal is also a Java Principal,
I think it would be the right thing to replace the KafkaPrincipal with the
Java Principal at this stage rather than later.

We can mention in the KIP that it would break clients that are using the
KafkaPrincipal directly: they will have to use the principal type directly
(if they are using it, since it has only one value), use the name from the
Principal directly, or create a KafkaPrincipal from the Java Principal as we
are doing in SimpleAclAuthorizer with this KIP.
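
For example, callers that still want the (principalType, name) pair could wrap the 
Java Principal back into a KafkaPrincipal along these lines (illustrative sketch, 
assuming the usual KafkaPrincipal(type, name) constructor):

    import java.security.Principal;
    import org.apache.kafka.common.security.auth.KafkaPrincipal;

    public class PrincipalAdapter {

        // Wrap the java.security.Principal exposed by the Session back into a KafkaPrincipal,
        // similar to what SimpleAclAuthorizer does with this KIP.
        public static KafkaPrincipal toKafkaPrincipal(Principal principal) {
            if (principal instanceof KafkaPrincipal)
                return (KafkaPrincipal) principal;
            return new KafkaPrincipal(KafkaPrincipal.USER_TYPE, principal.getName());
        }
    }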

Thanks,

Mayuresh



On Mon, Feb 27, 2017 at 10:56 AM, Ismael Juma  wrote:

> Hi Mayuresh,
>
> Sorry for the delay. The updated KIP states that there is no compatibility
> impact, but that doesn't seem right. The fact that we changed the type of
> Session.principal to `Principal` means that any code that expects it to be
> `KafkaPrincipal` will break. Either because of declared types (likely) or
> if it accesses `getPrincipalType` (unlikely since the value is always the
> same). It's a bit annoying, but we should add a new field to `Session` with
> the original principal. We can potentially deprecate the existing one, if
> we're sure we don't need it (or we can leave it for now).
>
> Ismael
>
> On Mon, Feb 27, 2017 at 6:40 PM, Mayuresh Gharat <
> gharatmayures...@gmail.com
> > wrote:
>
> > Hi Ismael, Joel, Becket
> >
> > Would you mind taking a look at this. We require 2 more binding votes for
> > the KIP to pass.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Thu, Feb 23, 2017 at 10:57 AM, Dong Lin  wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Wed, Feb 22, 2017 at 10:52 PM, Manikumar  >
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > On Thu, Feb 23, 2017 at 3:27 AM, Mayuresh Gharat <
> > > > gharatmayures...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks a lot for the comments and reviews.
> > > > > I agree we should log the username.
> > > > > What I meant by creating KafkaPrincipal was, after this KIP we
> would
> > > not
> > > > be
> > > > > required to create KafkaPrincipal and if we want to maintain the
> old
> > > > > logging, we will have to create it as we do today.
> > > > > I will take care that we specify the Principal name in the log.
> > > > >
> > > > > Thanks again for all the reviews.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > > On Wed, Feb 22, 2017 at 1:45 PM, Jun Rao  wrote:
> > > > >
> > > > > > Hi, Mayuresh,
> > > > > >
> > > > > > For logging the user name, we could do either way. We just need
> to
> > > make
> > > > > > sure the expected user name is logged. Also, currently, we are
> > > already
> > > > > > creating a KafkaPrincipal on every request. +1 on the latest KIP.
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Tue, Feb 21, 2017 at 8:05 PM, Mayuresh Gharat <
> > > > > > gharatmayures...@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Thanks for the comments.
> > > > > > >
> > > > > > > I will mention in the KIP : how this change doesn't affect the
> > > > default
> > > > > > > authorizer implementation.
> > > > > > >
> > > > > > > Regarding, Currently, we log the principal name in the request
> > log
> > > in
> > > > > > > RequestChannel, which has the format of "principalType +
> > SEPARATOR
> > > +
> > > > > > > name;".
> > > > > > > It would be good if we can keep the same convention after this
> > KIP.
> > > > One
> > > > > > way
> > > > > > > to do that is to convert java.security.Principal to
> > KafkaPrincipal
> > > > for
> > > > > > > logging the requests.
> > > > > > > --- > This would mean we have to create a new KafkaPrincipal on
> > > each
> > > > > > > request. Would it be OK to just specify the name of the
> > principal.
> > > > > > > Is there any major reason, we don't want to change the logging
> > > > format?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Feb 20, 2017 at 10:18 PM, Jun Rao 
> > > wrote:
> > > > > > >
> > > > > > > > Hi, Mayuresh,
> > > > > > > >
> > > > > > > > Thanks for the updated KIP. A couple of more comments.
> > > > > > > >
> > > > > > > > 1. Do we convert java.security.Principal to KafkaPrincipal
> for
> > > > > > > > authorization check in SimpleAclAuthorizer? If so, it would
> be
> > > > useful
> > > > > > to
> > > > > > > > mention that in the wiki so that people can understand how
> this
> > > > > change
> > > > > > > > doesn't affect the default authorizer implementation.
> > > > > > > >
> > > > > > > > 2. Currently, we log the principal name in the request log in
> > > > > > > > RequestChannel, which has the format of "principalType +
> > > SEPARATOR
> > > > +
> > > > > > > > name;".
> > > > > > > > It would be good if we can keep the same convention after
> th

[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should set up /opt/kafka-dev to be the source directory

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Description: {{docker/run_tests.sh }} should be able to run 
{{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
Currently, it can't do that, because the Docker image configures the contents 
of {{/opt/kafka-dev}} differently than the Vagrant images are configured.  
Currently, {{run_tests.sh}} decompresses the {{releaseTarGz}} in that 
directory.  But it should simply be the source directory.  This would also make 
it unnecessary to copy the {{releaseTarGz}} around.  (was: 
{{docker/run_tests.sh }}should be able to run 
{{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
Currently, it can't do that, because the Docker image configures the contents 
of {{/opt/kafka-dev}} differently than the Vagrant images are configured.)

> docker/run_tests.sh should set up /opt/kafka-dev to be the source directory
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{docker/run_tests.sh }} should be able to run 
> {{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of {{/opt/kafka-dev}} differently than the Vagrant images are configured.  
> Currently, {{run_tests.sh}} decompresses the {{releaseTarGz}} in that 
> directory.  But it should simply be the source directory.  This would also 
> make it unnecessary to copy the {{releaseTarGz}} around.





[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should set up /opt/kafka-dev the same way Vagrant does

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Summary: docker/run_tests.sh should set up /opt/kafka-dev the same way 
Vagrant does  (was: docker/run_tests.sh should be able to run 
sanity_checks/test_verifiable_producer.py against version 0.8.2.2)

> docker/run_tests.sh should set up /opt/kafka-dev the same way Vagrant does
> --
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{docker/run_tests.sh }}should be able to run 
> {{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of {{/opt/kafka-dev}} differently than the Vagrant images are configured.





[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should set up /opt/kafka-dev to be the source directory

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Summary: docker/run_tests.sh should set up /opt/kafka-dev to be the source 
directory  (was: docker/run_tests.sh should set up /opt/kafka-dev the same way 
Vagrant does)

> docker/run_tests.sh should set up /opt/kafka-dev to be the source directory
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{docker/run_tests.sh }}should be able to run 
> {{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of {{/opt/kafka-dev}} differently than the Vagrant images are configured.





[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should be able to run sanity_checks/test_verifiable_producer.py against version 0.8.2.2

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Summary: docker/run_tests.sh should be able to run 
sanity_checks/test_verifiable_producer.py against version 0.8.2.2  (was: 
sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8)

> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.0, 
> 0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
> appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.





[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should be able to run sanity_checks/test_verifiable_producer.py against version 0.8.2.2

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Description: {{docker/run_tests.sh }}should be able to run 
{{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
Currently, it can't do that, because the Docker image configures the contents 
of {{/opt/kafka-dev}} differently than the Vagrant images are configured.  
(was: docker/run_tests.sh should be able to run 
sanity_checks/test_verifiable_producer.py against version 0.8.2.2.  Currently, 
it can't do that, because the Docker image configures the contents of 
/opt/kafka-dev differently than the Vagrant images are configured.)

> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{docker/run_tests.sh }}should be able to run 
> {{sanity_checks/test_verifiable_producer.py}} against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of {{/opt/kafka-dev}} differently than the Vagrant images are configured.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4809) docker/run_tests.sh should be able to run sanity_checks/test_verifiable_producer.py against version 0.8.2.2

2017-02-27 Thread Colin P. McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886715#comment-15886715
 ] 

Colin P. McCabe commented on KAFKA-4809:


Thanks for the background, [~ewencp].

I guess the question is why this succeeds under Vagrant but fails under Docker.  
Looks like it's because the Vagrant images simply put the contents of the dev 
directory into {{/opt/kafka-dev}}, whereas the Docker image currently puts the 
decompressed {{releaseTarGz}} file there.  To fix this, the Docker image should 
do the same thing that Vagrant does.  I'll update the pull request.

> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of /opt/kafka-dev differently than the Vagrant images are configured.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4809) docker/run_tests.sh should be able to run sanity_checks/test_verifiable_producer.py against version 0.8.2.2

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Description: docker/run_tests.sh should be able to run 
sanity_checks/test_verifiable_producer.py against version 0.8.2.2.  Currently, 
it can't do that, because the Docker image configures the contents of 
/opt/kafka-dev differently than the Vagrant images are configured.  (was: 
{{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.0, 
0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.)

> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2
> ---
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> docker/run_tests.sh should be able to run 
> sanity_checks/test_verifiable_producer.py against version 0.8.2.2.  
> Currently, it can't do that, because the Docker image configures the contents 
> of /opt/kafka-dev differently than the Vagrant images are configured.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Jenkins build is back to normal : kafka-trunk-jdk8 #1306

2017-02-27 Thread Apache Jenkins Server
See 




[jira] [Commented] (KAFKA-4794) Add access to OffsetStorageReader from SourceConnector

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886680#comment-15886680
 ] 

ASF GitHub Bot commented on KAFKA-4794:
---

GitHub user fhussonnois opened a pull request:

https://github.com/apache/kafka/pull/2604

KAFKA-4794: Add access to OffsetStorageReader from SourceConnector

This is a first attempt to implement "Add access to OffsetStorageReader from 
SourceConnector".
I am not sure if I did it right. I would prefer to get your feedback before 
writing some tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhussonnois/kafka KAFKA-4794

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2604


commit e2bb682afd052cf1c83c68933c1165b584a0acc9
Author: Florian Hussonnois 
Date:   2017-02-27T22:35:33Z

KAFKA-4794: Add access to OffsetStorageReader from SourceConnector




> Add access to OffsetStorageReader from SourceConnector
> --
>
> Key: KAFKA-4794
> URL: https://issues.apache.org/jira/browse/KAFKA-4794
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 0.10.2.0
>Reporter: Florian Hussonnois
>Priority: Minor
>
> Currently the offsets storage is only accessible from SourceTask, to be able to 
> properly initialize tasks after a restart, a crash or a reconfiguration 
> request.
> To implement more complex connectors that need to track the progression of 
> each task, it would be helpful to have access to an OffsetStorageReader instance 
> from the SourceConnector.
> In that way, we could have a background thread that could request a tasks 
> reconfiguration based on source offsets.
> This improvement proposal comes from a customer project that needs to 
> periodically scan directories on a shared storage for detecting and for 
> streaming new files into Kafka.
> The connector implementation is pretty straightforward.
> The connector uses a background thread to periodically scan directories. When 
> new input files are detected, a task reconfiguration is requested. Then the 
> connector assigns a file subset to each task. 
> Each task stores sources offsets for the last sent record. The source offsets 
> data are:
>  - the size of file
>  - the bytes offset
>  - the bytes size 
> Tasks become idle when the assigned files are completed (i.e.: 
> recordBytesOffsets + recordBytesSize = fileBytesSize).
> Then, the connector should be able to track offsets for each assigned file. 
> When all tasks have finished, the connector can stop them or assign new files 
> by requesting a task reconfiguration. 
> Moreover, another advantage of monitoring source offsets from the connector 
> is to detect slow or failed tasks and, if necessary, to be able to restart all 
> tasks.
> If you think this improvement is OK, I can work on a pull request.
> Thanks,
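> To make the offset layout concrete, here is a rough, hypothetical sketch of the 
> connector-side check (the connector-level accessor for the reader is exactly the 
> part being proposed and does not exist yet; the offset key names simply follow 
> the description above):
> {code}
> import java.util.Collections;
> import java.util.Map;
> import org.apache.kafka.connect.storage.OffsetStorageReader;
> 
> public class FileProgressCheck {
>     // Hypothetical helper: returns true once the task that owned this file has
>     // streamed it completely, i.e. recordBytesOffsets + recordBytesSize >= fileBytesSize.
>     public static boolean isFileComplete(OffsetStorageReader reader, String path) {
>         Map<String, String> sourcePartition = Collections.singletonMap("path", path);
>         Map<String, Object> offset = reader.offset(sourcePartition);
>         if (offset == null) {
>             return false; // nothing has been streamed for this file yet
>         }
>         long fileBytesSize = (Long) offset.get("fileBytesSize");
>         long recordBytesOffsets = (Long) offset.get("recordBytesOffsets");
>         long recordBytesSize = (Long) offset.get("recordBytesSize");
>         return recordBytesOffsets + recordBytesSize >= fileBytesSize;
>     }
> }
> {code}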



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2604: KAFKA-4794: Add access to OffsetStorageReader from...

2017-02-27 Thread fhussonnois
GitHub user fhussonnois opened a pull request:

https://github.com/apache/kafka/pull/2604

KAFKA-4794: Add access to OffsetStorageReader from SourceConnector

This is a first attempt to implement "Add access to OffsetStorageReader from 
SourceConnector".
I am not sure if I did it right. I would prefer to get your feedback before 
writing some tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fhussonnois/kafka KAFKA-4794

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2604.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2604


commit e2bb682afd052cf1c83c68933c1165b584a0acc9
Author: Florian Hussonnois 
Date:   2017-02-27T22:35:33Z

KAFKA-4794: Add access to OffsetStorageReader from SourceConnector




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Eno Thereska
RAID-10's code is much simpler (just stripe plus mirror) and under failure the 
recovery is much faster since it just has to read from a mirror, not several 
disks to reconstruct the data. Of course, the price paid is that mirroring is 
more expensive in terms of storage space. 

E.g., see the discussion at 
https://community.spiceworks.com/topic/1155094-raid-10-and-raid-6-is-either-one-really-better-than-the-other .
So yes, if you can afford the space, go for RAID-10.

If utilising storage space well is what you care about, nothing beats utilising 
the JBOD disks one-by-one (while replicating at a higher level as Kafka does). 
However, there is now more complexity for Kafka.

Dong, how many disks do you typically expect in a JBOD? 12 or 24 or higher? 
Are we absolutely sure that running 2-3 brokers/JBOD is a show-stopper 
operationally? I guess that would increase the rolling restart time (more 
brokers), but it would be great if we could have a conclusive strong argument 
against it. I don't have operational experience with Kafka, so I don't have a 
strong opinion, but is everyone else convinced? 

Eno

> On 27 Feb 2017, at 22:10, Jun Rao  wrote:
> 
> Hi, Eno,
> 
> Thanks for the pointers. Doesn't RAID-10 have a similar issue during
> rebuild? In both cases, all data on existing disks have to be read during
> rebuild? RAID-10 seems to still be used widely.
> 
> Jun
> 
> On Mon, Feb 27, 2017 at 1:38 PM, Eno Thereska 
> wrote:
> 
>> Unfortunately RAID-5/6 is not typically advised anymore due to failure
>> issues, as Dong mentions, e.g.:
>> http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
>> 
>> Eno
>> 
>> 
>>> On 27 Feb 2017, at 21:16, Jun Rao  wrote:
>>> 
>>> Hi, Dong,
>>> 
>>> For RAID5, I am not sure the rebuild cost is a big concern. If a disk
>>> fails, typically an admin has to bring down the broker, replace the
>> failed
>>> disk with a new one, trigger the RAID rebuild, and bring up the broker.
>>> This way, there is no performance impact at runtime due to rebuild. The
>>> benefit is that a broker doesn't fail in a hard way when there is a disk
>>> failure and can be brought down in a controlled way for maintenance.
>> While
>>> the broker is running with a failed disk, reads may be more expensive
>> since
>>> they have to be computed from the parity. However, if most reads are from
>>> page cache, this may not be a big issue either. So, it would be useful to
>>> do some tests on RAID5 before we completely rule it out.
>>> 
>>> Regarding whether to remove an offline replica from the fetcher thread
>>> immediately. What do we do when a failed replica is a leader? Do we do
>>> nothing or mark the replica as not the leader immediately? Intuitively,
>> it
>>> seems it's better if the broker acts consistently on a failed replica
>>> whether it's a leader or a follower. For ISR churns, I was just pointing
>>> out that if we don't send StopReplicaRequest to a broker to be shut down
>> in
>>> a controlled way, then the leader will shrink ISR, expand it and shrink
>> it
>>> again after the timeout.
>>> 
>>> The KIP seems to still reference "
>>> /broker/topics/[topic]/partitions/[partitionId]/
>> controller_managed_state".
>>> 
>>> Thanks,
>>> 
>>> Jun
>>> 
>>> On Sat, Feb 25, 2017 at 7:49 PM, Dong Lin  wrote:
>>> 
 Hey Jun,
 
 Thanks for the suggestion. I think it is a good idea to not put the created
 flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest
>> if
 the replica was in NewReplica state. It will only fail the replica
>> creation in
 the scenario that the controller fails after
 topic-creation/partition-reassignment/partition-number-change but
>> before
 actually sends out the LeaderAndIsrRequest while there is ongoing disk
 failure, which should be pretty rare and acceptable. This should
>> simplify
 the design of this KIP.
 
 Regarding RAID-5, I think the concern with RAID-5/6 is not just about
 performance when there is no failure. For example, RAID-5 can support
>> up to
 one disk failure and it takes time to rebuild disk after one disk
 failure. RAID 5 implementations are susceptible to system failures
>> because
 of trends regarding array rebuild time and the chance of drive failure
 during rebuild. There is no such performance degradation for JBOD and
>> JBOD
 can support multiple log directory failure without reducing performance
>> of
 good log directories. Would this be a reasonable reason for using JBOD
 instead of RAID-5/6?
 
 Previously we discussed whether broker should remove offline replica from
 replica fetcher thread. I still think it should do it instead of
>> printing a
 lot of error in the log4j log. We can still let controller send
 StopReplicaRequest to the broker. I am not s

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Jun Rao
Hi, Eno,

Thanks for the pointers. Doesn't RAID-10 have a similar issue during
rebuild? In both cases, all data on existing disks have to be read during
rebuild? RAID-10 seems to still be used widely.

Jun

On Mon, Feb 27, 2017 at 1:38 PM, Eno Thereska 
wrote:

> Unfortunately RAID-5/6 is not typically advised anymore due to failure
> issues, as Dong mentions, e.g.:
> http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/
>
> Eno
>
>
> > On 27 Feb 2017, at 21:16, Jun Rao  wrote:
> >
> > Hi, Dong,
> >
> > For RAID5, I am not sure the rebuild cost is a big concern. If a disk
> > fails, typically an admin has to bring down the broker, replace the
> failed
> > disk with a new one, trigger the RAID rebuild, and bring up the broker.
> > This way, there is no performance impact at runtime due to rebuild. The
> > benefit is that a broker doesn't fail in a hard way when there is a disk
> > failure and can be brought down in a controlled way for maintenance.
> While
> > the broker is running with a failed disk, reads may be more expensive
> since
> > they have to be computed from the parity. However, if most reads are from
> > page cache, this may not be a big issue either. So, it would be useful to
> > do some tests on RAID5 before we completely rule it out.
> >
> > Regarding whether to remove an offline replica from the fetcher thread
> > immediately. What do we do when a failed replica is a leader? Do we do
> > nothing or mark the replica as not the leader immediately? Intuitively,
> it
> > seems it's better if the broker acts consistently on a failed replica
> > whether it's a leader or a follower. For ISR churns, I was just pointing
> > out that if we don't send StopReplicaRequest to a broker to be shut down
> in
> > a controlled way, then the leader will shrink ISR, expand it and shrink
> it
> > again after the timeout.
> >
> > The KIP seems to still reference "
> > /broker/topics/[topic]/partitions/[partitionId]/
> controller_managed_state".
> >
> > Thanks,
> >
> > Jun
> >
> > On Sat, Feb 25, 2017 at 7:49 PM, Dong Lin  wrote:
> >
> >> Hey Jun,
> >>
> >> Thanks for the suggestion. I think it is a good idea to not put the created
> >> flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest
> if
> >> the replica was in NewReplica state. It will only fail the replica
> creation in
> >> the scenario that the controller fails after
> >> topic-creation/partition-reassignment/partition-number-change but
> before
> >> actually sends out the LeaderAndIsrRequest while there is ongoing disk
> >> failure, which should be pretty rare and acceptable. This should
> simplify
> >> the design of this KIP.
> >>
> >> Regarding RAID-5, I think the concern with RAID-5/6 is not just about
> >> performance when there is no failure. For example, RAID-5 can support
> up to
> >> one disk failure and it takes time to rebuild disk after one disk
> >> failure. RAID 5 implementations are susceptible to system failures
> because
> >> of trends regarding array rebuild time and the chance of drive failure
> >> during rebuild. There is no such performance degradation for JBOD and
> JBOD
> >> can support multiple log directory failure without reducing performance
> of
> >> good log directories. Would this be a reasonable reason for using JBOD
> >> instead of RAID-5/6?
> >>
> >> Previously we discussed whether broker should remove offline replica from
> >> replica fetcher thread. I still think it should do it instead of
> printing a
> >> lot of error in the log4j log. We can still let controller send
> >> StopReplicaRequest to the broker. I am not sure I understand why allowing
> >> broker to remove offline replica from fetcher thread will increase
> churns
> >> in ISR. Do you think this is a concern with this approach?
> >>
> >> I have updated the KIP to remove created flag from ZK and change the
> field
> >> name to isNewReplica. Can you check if there is any issue with the
> latest
> >> KIP? Thanks for your time!
> >>
> >> Regards,
> >> Dong
> >>
> >>
> >> On Sat, Feb 25, 2017 at 9:11 AM, Jun Rao  wrote:
> >>
> >>> Hi, Dong,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> Personally, I'd prefer not to write the created flag per replica in ZK.
> >>> Your suggestion of disabling replica creation if there is a bad log
> >>> directory on the broker could work. The only thing is that it may delay
> >> the
> >>> creation of new replicas. I was thinking that an alternative is to
> extend
>>> LeaderAndIsrRequest by adding an isNewReplica field per replica. That
> >> field
> >>> will be set when a replica is transitioning from the NewReplica state
> to
> >>> Online state. Then, when a broker receives a LeaderAndIsrRequest, if a
> >>> replica is marked as the new replica, it will be created on a good log
> >>> directory, if not already present. Otherwise, it only creates the
> replica
> >>> if all log directories are good and the replica is not already present.
> >>> This w

[jira] [Commented] (KAFKA-4809) sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8

2017-02-27 Thread Ewen Cheslack-Postava (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886610#comment-15886610
 ] 

Ewen Cheslack-Postava commented on KAFKA-4809:
--

The reason we want to make sure it works properly against 0.8.2 builds is that 
the class didn't exist there, but we wanted to do upgrade testing. The upgrade 
testing didn't get implemented until 0.9, but to do the input/output validation 
we needed that class. So the scripts pull in the tools jar from a newer version 
and use it against the older version of Kafka. We took care in doing this 
such that the only thing being used from the newer version is that class.

We could certainly test more versions (though as this is just a basic sanity 
check I'm not sure we need more than one case for each major version), but 
since 0.8.2 requires special handling it's probably the most important of 
these. (Although I also think most of the things under sanity_checks should be 
disabled by default since they get exercised by the "real" tests anyway.)

> sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8
> 
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.0, 
> 0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
> appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Eno Thereska
Unfortunately RAID-5/6 is not typically advised anymore due to failure issues, 
as Dong mentions, e.g.: 
http://www.zdnet.com/article/why-raid-6-stops-working-in-2019/ 


Eno


> On 27 Feb 2017, at 21:16, Jun Rao  wrote:
> 
> Hi, Dong,
> 
> For RAID5, I am not sure the rebuild cost is a big concern. If a disk
> fails, typically an admin has to bring down the broker, replace the failed
> disk with a new one, trigger the RAID rebuild, and bring up the broker.
> This way, there is no performance impact at runtime due to rebuild. The
> benefit is that a broker doesn't fail in a hard way when there is a disk
> failure and can be brought down in a controlled way for maintenance. While
> the broker is running with a failed disk, reads may be more expensive since
> they have to be computed from the parity. However, if most reads are from
> page cache, this may not be a big issue either. So, it would be useful to
> do some tests on RAID5 before we completely rule it out.
> 
> Regarding whether to remove an offline replica from the fetcher thread
> immediately. What do we do when a failed replica is a leader? Do we do
> nothing or mark the replica as not the leader immediately? Intuitively, it
> seems it's better if the broker acts consistently on a failed replica
> whether it's a leader or a follower. For ISR churns, I was just pointing
> out that if we don't send StopReplicaRequest to a broker to be shut down in
> a controlled way, then the leader will shrink ISR, expand it and shrink it
> again after the timeout.
> 
> The KIP seems to still reference "
> /broker/topics/[topic]/partitions/[partitionId]/controller_managed_state".
> 
> Thanks,
> 
> Jun
> 
> On Sat, Feb 25, 2017 at 7:49 PM, Dong Lin  wrote:
> 
>> Hey Jun,
>> 
>> Thanks for the suggestion. I think it is a good idea to not put the created
>> flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest if
>> the replica was in NewReplica state. It will only fail the replica creation in
>> the scenario that the controller fails after
>> topic-creation/partition-reassignment/partition-number-change but before
>> actually sends out the LeaderAndIsrRequest while there is ongoing disk
>> failure, which should be pretty rare and acceptable. This should simplify
>> the design of this KIP.
>> 
>> Regarding RAID-5, I think the concern with RAID-5/6 is not just about
>> performance when there is no failure. For example, RAID-5 can support up to
>> one disk failure and it takes time to rebuild disk after one disk
>> failure. RAID 5 implementations are susceptible to system failures because
>> of trends regarding array rebuild time and the chance of drive failure
>> during rebuild. There is no such performance degradation for JBOD and JBOD
>> can support multiple log directory failure without reducing performance of
>> good log directories. Would this be a reasonable reason for using JBOD
>> instead of RAID-5/6?
>> 
>> Previously we discussed whether broker should remove offline replica from
>> replica fetcher thread. I still think it should do it instead of printing a
>> lot of error in the log4j log. We can still let controller send
>> StopReplicaRequest to the broker. I am not sure I understand why allowing
>> broker to remove offline replica from fetcher thread will increase churns
>> in ISR. Do you think this is a concern with this approach?
>> 
>> I have updated the KIP to remove created flag from ZK and change the field
>> name to isNewReplica. Can you check if there is any issue with the latest
>> KIP? Thanks for your time!
>> 
>> Regards,
>> Dong
>> 
>> 
>> On Sat, Feb 25, 2017 at 9:11 AM, Jun Rao  wrote:
>> 
>>> Hi, Dong,
>>> 
>>> Thanks for the reply.
>>> 
>>> Personally, I'd prefer not to write the created flag per replica in ZK.
>>> Your suggestion of disabling replica creation if there is a bad log
>>> directory on the broker could work. The only thing is that it may delay
>> the
>>> creation of new replicas. I was thinking that an alternative is to extend
>>> LeaderAndIsrRequest by adding an isNewReplica field per replica. That
>> field
>>> will be set when a replica is transitioning from the NewReplica state to
>>> Online state. Then, when a broker receives a LeaderAndIsrRequest, if a
>>> replica is marked as the new replica, it will be created on a good log
>>> directory, if not already present. Otherwise, it only creates the replica
>>> if all log directories are good and the replica is not already present.
>>> This way, we don't delay the processing of new replicas in the common
>> case.
>>> 
>>> I am ok with not persisting the offline replicas in ZK and just
>> discovering
>>> them through the LeaderAndIsrRequest. It handles the cases when a broker
>>> starts up with bad log directories better. So, the additional overhead of
>>> rediscovering the offline replicas is justified.
>>> 
>>> 
>>> Another high level question. The proposal rejected RAID5/6 since it adds
>>> additional I/Os. 

Re: [DISCUSS] KIP-82 - Add Record Headers

2017-02-27 Thread radai
a few comments on the KIP as it is now:

1. instead of add(Header) + add(Iterable) i suggest we use add +
addAll. this is more in line with how java collections work and may
therefore be more intuitive

2. common user code dealing with headers will want get("someKey") /
set("someKey"), or equivalent. code using multiple headers under the same
key will be rare, and code iterating over all headers would be even rarer
(probably only for kafka-connect equivalent use cases, really). as the API
looks right now, the most common and trivial cases will be gnarly:
   get("someKey") -->
record.headers().headers("someKey").iterator().next().value(). and this is
before i start talking about how nulls/emptys are handled.
   replace("someKey") -->
record.headers().remove(record.headers().headers("someKey"));
record.headers().append(new Header("someKey", value));

this is why i think we should start with get()/set() which are single-value
map semantics (so set overwrites), then add getAll() (multi-map), append()
etc on top. make the common case pretty.
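
to make that concrete, here's an illustrative sketch (made-up names, not the KIP
API) of single-value get()/set() layered on top of multimap-style storage:

import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// illustrative only -- not the KIP API. single-value convenience on top of
// multi-value storage; set() overwrites, get() returns the first value.
public class HeadersSketch {

    private final Map<String, List<byte[]>> headers = new LinkedHashMap<>();

    // multimap-style: keep every value added under the key
    public void append(String key, byte[] value) {
        headers.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    public List<byte[]> getAll(String key) {
        return headers.getOrDefault(key, Collections.emptyList());
    }

    // common case: one value per key
    public byte[] get(String key) {
        List<byte[]> values = headers.get(key);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }

    public void set(String key, byte[] value) {
        List<byte[]> replacement = new ArrayList<>();
        replacement.add(value);
        headers.put(key, replacement);
    }
}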

On Sun, Feb 26, 2017 at 2:01 AM, Michael Pearce 
wrote:

> Hi Becket,
>
> On 1)
>
> Yes, truly we wanted mutable headers also. Alas, we couldn't think of a
> solution that would address Jason's point that, once a record is sent, it
> shouldn't be possible to mutate it, for cases where you send the same
> record twice.
>
> Thank you so much for your solution i think this will work very nicely :)
>
> Agreed we only need to do mutable to immutable conversion
>
> I think your solution with a ".close()", taken from elsewhere in the Kafka
> protocol where mutability exists, is a great solution and a happy middle
> ground.
>
> @Jason you agree, this resolves your concerns if we had mutable headers?
>
>
> On 2)
> Agreed, this was only added as I couldn't think of a solution that
> would address Jason's concern, but really didn't want to force end users to
> constantly write ugly boilerplate code. If we agree on your solution for 1,
> I'm very happy to remove these.
>
> On 3)
> I also would like to keep the scope of this KIP limited to Message Headers
> for now, else we run the risk of not getting even these delivered for the next
> release, and we're almost there on getting this KIP to the state where
> everyone is happy. As you note, we can address that later if there's the need.
>
>
> I'll leave it 24hrs and update the KIP, if there are no strong objections, based on
> your solution for 1 & 2.
>
> Cheers
> Mike
>
> __ __
> From: Becket Qin 
> Sent: Saturday, February 25, 2017 10:33 PM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-82 - Add Record Headers
>
> Hey Michael,
>
> Thanks for the KIP. It looks good overall and it looks like we only have a few
> things to agree on.
>
> 1. Regarding the mutability.
>
> I think it would be a big convenience to have headers mutable during
> certain stage in the message life cycle for the use cases you mentioned. I
> agree there is a material benefit especially given that we may have to
> modify the headers for each message.
>
> That said, I also think it is fair to say that in the producer, in order to
> guarantee the correctness of the entire logic, it is necessary that at some
> point we need to make producer record immutable. For example we probably
> don't want to see that users accidentally updated the headers when the
> producer is doing the serialization or compression.
>
> Given that, would it be possible to make Headers to be able to switch from
> mutable to immutable? We have done this for the Batch in the producer. For
> example, initially the headers are mutable, but after it has gone through
> all the interceptors, we can call Headers.close() to make it immutable
> afterwards.
>
> On the consumer side, we can probably always leave the ConsumerRecord
> mutable because after we give the messages to the users, Kafka consumer
> itself does not care about whether the headers are modified or not anymore.
>
> So far I think we only need to do the mutable to immutable conversion. If
> there are use cases that require immutable to mutable conversion, we may need
> something more than a closable.
>
> 2. If we agree on what mentioned above, I think it probably makes sense to
> put the addHeaders()/removeHeaders() methods into Headers class and just
> leave the headers() method in ProducerRecord and ConsumerRecord.
>
> 3. It might be useful to have headers at MessageSet level as well so we can
> avoid decompression in some cases. But given this KIP is already
> complicated, I would rather leave this out of the scope and address that
> later when needed, e.g. after having batch level interceptors.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Feb 24, 2017 at 3:56 PM, Michael Pearce 
> wrote:
>
> > KIP updated in response to the below comments:
> >
> >> 1. Is the intent of `Headers.filter` to include or exclude the
> > headers
> > > matching the key? Can you add a javadoc to clarify?
> > > 2. The KIP mentions 

Re: [DISCUSS] KIP-112: Handle disk failure for JBOD

2017-02-27 Thread Jun Rao
Hi, Dong,

For RAID5, I am not sure the rebuild cost is a big concern. If a disk
fails, typically an admin has to bring down the broker, replace the failed
disk with a new one, trigger the RAID rebuild, and bring up the broker.
This way, there is no performance impact at runtime due to rebuild. The
benefit is that a broker doesn't fail in a hard way when there is a disk
failure and can be brought down in a controlled way for maintenance. While
the broker is running with a failed disk, reads may be more expensive since
they have to be computed from the parity. However, if most reads are from
page cache, this may not be a big issue either. So, it would be useful to
do some tests on RAID5 before we completely rule it out.

Regarding whether to remove an offline replica from the fetcher thread
immediately. What do we do when a failed replica is a leader? Do we do
nothing or mark the replica as not the leader immediately? Intuitively, it
seems it's better if the broker acts consistently on a failed replica
whether it's a leader or a follower. For ISR churns, I was just pointing
out that if we don't send StopReplicaRequest to a broker to be shut down in
a controlled way, then the leader will shrink ISR, expand it and shrink it
again after the timeout.

The KIP seems to still reference "
/broker/topics/[topic]/partitions/[partitionId]/controller_managed_state".

Thanks,

Jun

On Sat, Feb 25, 2017 at 7:49 PM, Dong Lin  wrote:

> Hey Jun,
>
> Thanks for the suggestion. I think it is a good idea to not put the created
> flag in ZK and simply specify isNewReplica=true in LeaderAndIsrRequest if
> the replica was in NewReplica state. It will only fail the replica creation in
> the scenario that the controller fails after
> topic-creation/partition-reassignment/partition-number-change but before
> actually sends out the LeaderAndIsrRequest while there is ongoing disk
> failure, which should be pretty rare and acceptable. This should simplify
> the design of this KIP.
>
> Regarding RAID-5, I think the concern with RAID-5/6 is not just about
> performance when there is no failure. For example, RAID-5 can support up to
> one disk failure and it takes time to rebuild disk after one disk
> failure. RAID 5 implementations are susceptible to system failures because
> of trends regarding array rebuild time and the chance of drive failure
> during rebuild. There is no such performance degradation for JBOD and JBOD
> can support multiple log directory failure without reducing performance of
> good log directories. Would this be a reasonable reason for using JBOD
> instead of RAID-5/6?
>
> Previously we discussed whether broker should remove offline replica from
> replica fetcher thread. I still think it should do it instead of printing a
> lot of error in the log4j log. We can still let controller send
> StopReplicaRequest to the broker. I am not sure I understand why allowing
> broker to remove offline replica from fetcher thread will increase churns
> in ISR. Do you think this is a concern with this approach?
>
> I have updated the KIP to remove created flag from ZK and change the field
> name to isNewReplica. Can you check if there is any issue with the latest
> KIP? Thanks for your time!
>
> Regards,
> Dong
>
>
> On Sat, Feb 25, 2017 at 9:11 AM, Jun Rao  wrote:
>
> > Hi, Dong,
> >
> > Thanks for the reply.
> >
> > Personally, I'd prefer not to write the created flag per replica in ZK.
> > Your suggestion of disabling replica creation if there is a bad log
> > directory on the broker could work. The only thing is that it may delay
> the
> > creation of new replicas. I was thinking that an alternative is to extend
> > LeaderAndIsrRequest by adding an isNewReplica field per replica. That
> field
> > will be set when a replica is transitioning from the NewReplica state to
> > Online state. Then, when a broker receives a LeaderAndIsrRequest, if a
> > replica is marked as the new replica, it will be created on a good log
> > directory, if not already present. Otherwise, it only creates the replica
> > if all log directories are good and the replica is not already present.
> > This way, we don't delay the processing of new replicas in the common
> case.
> >
> > I am ok with not persisting the offline replicas in ZK and just
> discovering
> > them through the LeaderAndIsrRequest. It handles the cases when a broker
> > starts up with bad log directories better. So, the additional overhead of
> > rediscovering the offline replicas is justified.
> >
> >
> > Another high level question. The proposal rejected RAID5/6 since it adds
> > additional I/Os. The main issue with RAID5 is that to write a block that
> > doesn't match the RAID stripe size, we have to first read the old parity
> to
> > compute the new one, which increases the number of I/Os (
> > http://rickardnobel.se/raid-5-write-penalty/). I am wondering if you
> have
> > tested RAID5's performance by creating a file system whose block size
> > matches the RAID stripe size (https://www.perco

[jira] [Commented] (KAFKA-3990) Kafka New Producer may raise an OutOfMemoryError

2017-02-27 Thread Adrian McCague (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-3990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886466#comment-15886466
 ] 

Adrian McCague commented on KAFKA-3990:
---

For what it's worth, we are witnessing this as well; the received size is always 
352,518,912 bytes. Still trying to track down exactly where that's coming from.

Using Streams and brokers: 0.10.1.1

{code}
2017-02-27 19:28:16 ERROR KafkaThread:30 - Uncaught exception in 
kafka-producer-network-thread | 
confluent.monitoring.interceptor.app-2-StreamThread-2-producer: 
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[?:1.8.0_112]
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[?:1.8.0_112]
at 
org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
 ~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71) 
~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:154) 
~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:135) 
~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:343) 
~[kafka-clients-0.10.1.1.jar:?]
at org.apache.kafka.common.network.Selector.poll(Selector.java:291) 
~[kafka-clients-0.10.1.1.jar:?]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:260) 
~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:236) 
~[kafka-clients-0.10.1.1.jar:?]
at 
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:135) 
~[kafka-clients-0.10.1.1.jar:?]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
{code}

Sometimes the exception message is shown as the consumer (from the producer 
thread):
{code}
Uncaught exception in kafka-producer-network-thread | 
confluent.monitoring.interceptor.app-2-StreamThread-2-consumer
{code}

The payload comes on the first processed message when the application is 
started, and then all is fine for some time. We have seen it trigger later in 
the logs but have not linked it to anything.
I will update if I can find anything out, and I will investigate whether we 
have any other services bound to the broker port, as other comments have found.
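
As a side note (my own decoding, not anything from the client code): the reported 
size, interpreted as the 4-byte big-endian size prefix that NetworkReceive reads 
off the wire, happens to decode to bytes that look like a TLS record header, which 
would be consistent with a plaintext client hitting an SSL port or another service 
answering on the broker port:
{code}
public class ReceiveSizeDecoder {
    public static void main(String[] args) {
        int receiveSize = 352518912; // the value reported above
        // Recover the 4 raw bytes that would have been read as the size prefix.
        byte[] prefix = java.nio.ByteBuffer.allocate(4).putInt(receiveSize).array();
        StringBuilder hex = new StringBuilder();
        for (byte b : prefix) {
            hex.append(String.format("%02x ", b));
        }
        // Prints "15 03 03 00" -- plausibly the start of a TLS alert record
        // (content type 0x15, version TLS 1.2) rather than a Kafka response size.
        System.out.println(hex.toString().trim());
    }
}
{code}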

> Kafka New Producer may raise an OutOfMemoryError
> 
>
> Key: KAFKA-3990
> URL: https://issues.apache.org/jira/browse/KAFKA-3990
> Project: Kafka
>  Issue Type: Bug
>  Components: clients
>Affects Versions: 0.9.0.1
> Environment: Docker, Base image : CentOS
> Java 8u77
> Marathon
>Reporter: Brice Dutheil
> Attachments: app-producer-config.log, kafka-broker-logs.zip
>
>
> We are regularly seeing OOME errors on a Kafka producer. We first saw:
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57) ~[na:1.8.0_77]
> at java.nio.ByteBuffer.allocate(ByteBuffer.java:335) ~[na:1.8.0_77]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:93)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
>  ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.common.network.Selector.poll(Selector.java:286) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:256) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:216) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at 
> org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:128) 
> ~[kafka-clients-0.9.0.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
> {code}
> This line refer to a buffer allocation {{ByteBuffer.allocate(receiveSize)}} 
> (see 
> https://github.com/apache/kafka/blob/0.9.0.1/clients/src/main/java/org/apache/kafka/common/network/NetworkReceive.java#L93)
> Usually the app runs fine within a 200/400 MB heap and a 64 MB Metaspace, and 
> we are producing small messages, 500 B at most.
> Also, the error doesn't appear in the development environment. In order to 
> identify the issue we tweaked the code to give us the actual 
> allocation size, and we got this stack:
> {code}
> 09:55:49.484 [auth] [kafka-producer-network-thread | producer-1] WARN  
> o.a.k.c.n.NetworkRece

[GitHub] kafka pull request #2603: MINOR: Minor reduce unnecessary calls to time.mill...

2017-02-27 Thread enothereska
GitHub user enothereska opened a pull request:

https://github.com/apache/kafka/pull/2603

MINOR: Minor reduce unnecessary calls to time.millisecond (part 2)

Avoid calling time.milliseconds more often than necessary. Cleaning and 
committing logic can use the timestamp at the start of the loop with minimal 
consequences.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/enothereska/kafka minor-reduce-milliseconds2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2603.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2603


commit 4a99cef1e8021287f0d0bf1b0bad2d72ff92d666
Author: Eno Thereska 
Date:   2017-02-27T19:39:24Z

Reduce number of time.millisecond calls further

commit d06d7a2328a9c39f9e21ffb6bc6881803078829a
Author: Eno Thereska 
Date:   2017-02-27T19:58:24Z

Added time to API




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (KAFKA-4809) sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886429#comment-15886429
 ] 

ASF GitHub Bot commented on KAFKA-4809:
---

GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2602

KAFKA-4809: sanity_checks/test_verifiable_producer.py should test 0.1…

…0.x and not 0.8

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2602


commit 00387d142aa139ad99f997f382c30561124df5eb
Author: Colin P. Mccabe 
Date:   2017-02-27T20:01:38Z

KAFKA-4809: sanity_checks/test_verifiable_producer.py should test 0.10.x 
and not 0.8




> sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8
> 
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.0, 
> 0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
> appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2602: KAFKA-4809: sanity_checks/test_verifiable_producer...

2017-02-27 Thread cmccabe
GitHub user cmccabe opened a pull request:

https://github.com/apache/kafka/pull/2602

KAFKA-4809: sanity_checks/test_verifiable_producer.py should test 0.1…

…0.x and not 0.8

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cmccabe/kafka KAFKA-4809

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/2602.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2602


commit 00387d142aa139ad99f997f382c30561124df5eb
Author: Colin P. Mccabe 
Date:   2017-02-27T20:01:38Z

KAFKA-4809: sanity_checks/test_verifiable_producer.py should test 0.10.x 
and not 0.8




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (KAFKA-4809) sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Summary: sanity_checks/test_verifiable_producer.py should test 0.10.x and 
not 0.8  (was: sanity_checks/test_verifiable_producer.py should test 0.10.1 and 
0.10.2 and not 0.8)

> sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8
> 
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.1 
> and 0.10.2 releases.  It should also not test 0.8.2.2, since there appears to 
> be no {{VerifiableProducer}} class in Kafka 0.8.2.2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (KAFKA-4809) sanity_checks/test_verifiable_producer.py should test 0.10.1 and 0.10.2 and not 0.8

2017-02-27 Thread Colin P. McCabe (JIRA)
Colin P. McCabe created KAFKA-4809:
--

 Summary: sanity_checks/test_verifiable_producer.py should test 
0.10.1 and 0.10.2 and not 0.8
 Key: KAFKA-4809
 URL: https://issues.apache.org/jira/browse/KAFKA-4809
 Project: Kafka
  Issue Type: Bug
Reporter: Colin P. McCabe
Assignee: Colin P. McCabe


{{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.1 and 
0.10.2 releases.  It should also not test 0.8.2.2, since there appears to be no 
{{VerifiableProducer}} class in Kafka 0.8.2.2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (KAFKA-4809) sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8

2017-02-27 Thread Colin P. McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin P. McCabe updated KAFKA-4809:
---
Description: {{sanity_checks/test_verifiable_producer.py}} should test the 
latest 0.10.0, 0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, 
since there appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.  
(was: {{sanity_checks/test_verifiable_producer.py}} should test the latest 
0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.)

> sanity_checks/test_verifiable_producer.py should test 0.10.x and not 0.8
> 
>
> Key: KAFKA-4809
> URL: https://issues.apache.org/jira/browse/KAFKA-4809
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
>
> {{sanity_checks/test_verifiable_producer.py}} should test the latest 0.10.0, 
> 0.10.1 and 0.10.2 releases.  It should also not test 0.8.2.2, since there 
> appears to be no {{VerifiableProducer}} class in Kafka 0.8.2.2.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


Kafka MirrorMaker issues

2017-02-27 Thread Le Cyberian
Hi Kafka Gurus :)

I am facing issues with Kafka MirrorMaker. I am using Kafka 0.10.1.1 and trying
to use mirroring to create a backup of Kafka logs (perhaps mirroring is a good
way to do this; please let me know if it's not).

my consumer.properties:

bootstrap.servers=localhost:9092
group.id=mirror

producer.properties:

bootstrap.servers=localhost:19092
compression.type=lz4

Using below command:

./kafka-mirror-maker.sh  --new.consumer --consumer.config
../config/consumer.properties --producer.config
../config/producer.properties --whitelist 'test-topic'
--abort.on.send.failure true

If I use consumer-groups on the source Kafka cluster (the side the mirror consumes from):

kafka-consumer-groups.sh --new-consumer --describe --group mirror
--bootstrap-server localhost:9092

Output is:

GROUP  TOPIC  PARTITION
 CURRENT-OFFSET  LOG-END-OFFSET  LAG OWNER
mirror test-topic 0  unknown
  2756unknown mirror-0_/192.168.191.4

I don't see the topic on the producer (destination) side, nor a topic folder in
the Kafka data directory (kafka.data.dir).

Can you please help me point out what I am doing wrong here?

Thanks!

Lee


Re: [DISCUSS] KIP-128: Add ByteArrayConverter for Kafka Connect

2017-02-27 Thread Guozhang Wang
Thanks Ewen,

"use the corresponding serializer internally and just add in the extra
conversion
steps for the data API" sounds good to me.

Guozhang


On Mon, Feb 27, 2017 at 8:24 AM, Ewen Cheslack-Postava 
wrote:

> It's a different interface that's being implemented. The functionality is
> the same (since it's just a simple pass through), but we intentionally
> named Converters differently than Serializers since they do more work than
> Serializers (besides the normal serialization they also need to convert
> between the serialized byte[] form and the Connect Data API).
>
> We could certainly reuse/extend that class instead, though I'm not sure
> there's much benefit in that and since they implement different interfaces
> and this is Connect-specific, it will probably be clearer to have it under
> a Connect package. Note that for other Converters the pattern we've used is
> to use the corresponding serializer internally and just add in the extra
> conversion steps for the data API.
>
> -Ewen
>
> On Sat, Feb 25, 2017 at 6:52 PM, Guozhang Wang  wrote:
>
> > I'm wondering why we can't just use ByteArraySerde in o.a.k.common?
> >
> > Guozhang
> >
> > On Sat, Feb 25, 2017 at 2:25 PM, Ewen Cheslack-Postava <
> e...@confluent.io>
> > wrote:
> >
> > > Hi all,
> > >
> > > I've added a pretty trivial KIP for adding a pass-through Converter for
> > > Kafka Connect, similar to ByteArraySerializer/Deserializer.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 128%3A+Add+ByteArrayConverter+for+Kafka+Connect
> > >
> > > This wasn't added with the framework originally because the idea was to
> > > deal with structured data for the most part. However, we've seen a
> couple
> > > of use cases arise as the framework got more traction and I think it
> > makes
> > > sense to provide this out of the box now so people stop reinventing the
> > > wheel (and using a different fully-qualified class name) for each
> > connector
> > > that needs this functionality.
> > >
> > > I imagine this will be a rather uncontroversial addition, so if I don't
> > see
> > > any comments in the next day or two I'll just start the vote thread.
> > >
> > > -Ewen
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
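
For reference, a minimal sketch of such a pass-through Converter along those
lines (it wraps the byte-array serializer/deserializer as described above; the
class name here is illustrative and the actual KIP-128 class may differ in
naming and null/schema handling):

import java.util.Map;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

// Illustrative sketch only, not the actual KIP-128 implementation.
public class PassThroughByteArrayConverter implements Converter {

    private final ByteArraySerializer serializer = new ByteArraySerializer();
    private final ByteArrayDeserializer deserializer = new ByteArrayDeserializer();

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        serializer.configure(configs, isKey);
        deserializer.configure(configs, isKey);
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        if (value != null && !(value instanceof byte[])) {
            throw new DataException("Only byte[] values are supported, got: " + value.getClass());
        }
        return serializer.serialize(topic, (byte[]) value);
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        // The extra "conversion step" is just attaching an optional bytes schema.
        return new SchemaAndValue(Schema.OPTIONAL_BYTES_SCHEMA, deserializer.deserialize(topic, value));
    }
}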


[jira] [Commented] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886365#comment-15886365
 ] 

Mayuresh Gharat commented on KAFKA-4808:


[~ijuma] Sure we can provide a better error message if we have a separate 
Error_Code. It looks like a subclass of InvalidRequestException, as it's indeed 
an invalid produce request for that topic.
I will work on the KIP.

Thanks,

Mayuresh

> send of null key to a compacted topic should throw error back to user
> -
>
> Key: KAFKA-4808
> URL: https://issues.apache.org/jira/browse/KAFKA-4808
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
>Reporter: Ismael Juma
>Assignee: Mayuresh Gharat
> Fix For: 0.10.3.0
>
>
> If a message with a null key is produced to a compacted topic, the broker 
> returns `CorruptRecordException`, which is a retriable exception. As such, 
> the producer keeps retrying until retries are exhausted or request.timeout.ms 
> expires and eventually throws a TimeoutException. This is confusing and not 
> user-friendly.
> We should throw a meaningful error back to the user. From an implementation 
> perspective, we would have to use a non retriable error code to avoid this 
> issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4744) Streams_bounce test failing occassionally

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886354#comment-15886354
 ] 

ASF GitHub Bot commented on KAFKA-4744:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2601


> Streams_bounce test failing occassionally
> -
>
> Key: KAFKA-4744
> URL: https://issues.apache.org/jira/browse/KAFKA-4744
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
> Fix For: 0.10.3.0
>
>
> The test occasionally fails, e.g., in 
> https://jenkins.confluent.io/job/system-test-kafka/499/console.
> The message is:
> kafkatest.tests.streams.streams_bounce_test.StreamsBounceTest.test_bounce: 
> FAIL: Streams Test process on ubuntu@worker5 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_bounce_test.py",
>  line 72, in test_bounce
> self.processor1.stop()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/service.py",
>  line 255, in stop
> self.stop_node(node)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 77, in stop_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=60, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker5 took too long to exit
> Looking at the logs it looks like the test succeeded, so it might be that we 
> need to slightly increase the time we wait for.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] kafka pull request #2601: KAFKA-4744: Increased timeout for bounce test

2017-02-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2601


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (KAFKA-4744) Streams_bounce test failing occassionally

2017-02-27 Thread Guozhang Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-4744.
--
Resolution: Fixed

Issue resolved by pull request 2601
[https://github.com/apache/kafka/pull/2601]

> Streams_bounce test failing occassionally
> -
>
> Key: KAFKA-4744
> URL: https://issues.apache.org/jira/browse/KAFKA-4744
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
>Reporter: Eno Thereska
> Fix For: 0.10.3.0
>
>
> The test occasionally fails, e.g., in 
> https://jenkins.confluent.io/job/system-test-kafka/499/console.
> The message is:
> kafkatest.tests.streams.streams_bounce_test.StreamsBounceTest.test_bounce: 
> FAIL: Streams Test process on ubuntu@worker5 took too long to exit
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/streams/streams_bounce_test.py",
>  line 72, in test_bounce
> self.processor1.stop()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/services/service.py",
>  line 255, in stop
> self.stop_node(node)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/services/streams.py",
>  line 77, in stop_node
> wait_until(lambda: not node.account.alive(pid), timeout_sec=60, 
> err_msg="Streams Test process on " + str(node.account) + " took too long to 
> exit")
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/utils/util.py",
>  line 36, in wait_until
> raise TimeoutError(err_msg)
> TimeoutError: Streams Test process on ubuntu@worker5 took too long to exit
> Looking at the logs it looks like the test succeeded, so it might be that we 
> need to slightly increase the time we wait for.





Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Becket Qin
Hey Mayuresh,

1) The batch would be split when a RecordTooLargeException is received.
2) We do not lower the actual compression ratio; we lower the estimated
compression ratio "according to" the actual compression ratio (ACR).

As an example, let's start with an Estimated Compression Ratio (ECR) of 1.0.
Say the ACR is ~0.8; instead of letting the ECR drop to 0.8 very quickly, we
only drop it by 0.001 every time ACR < ECR. However, once we see an ACR > ECR,
we increment the ECR by 0.05. If a RecordTooLargeException is received, we
reset the ECR back to 1.0 and split the batch.
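
For concreteness, here is a minimal sketch of the adaptive estimator described
above. The class and method names are hypothetical (not the producer's actual
internals), and the 0.001/0.05 step sizes are just the numbers from the example:

{code}
// Sketch only: adaptive estimate of the compression ratio, as described above.
public class CompressionRatioEstimator {
    private static final float DECREMENT = 0.001f; // slow decay while we over-estimate
    private static final float INCREMENT = 0.05f;  // fast back-off once we under-estimate
    private float estimatedRatio = 1.0f;           // start conservatively at 1.0

    // Called after a batch is sent, with the observed actual compression ratio (ACR).
    public synchronized void observe(float actualRatio) {
        if (actualRatio < estimatedRatio)
            estimatedRatio -= DECREMENT;                                  // ACR < ECR: drop slowly
        else
            estimatedRatio = Math.min(1.0f, estimatedRatio + INCREMENT);  // ACR >= ECR: back off
    }

    // Called when the broker rejects a batch with RecordTooLargeException.
    public synchronized void onRecordTooLarge() {
        estimatedRatio = 1.0f; // reset; the caller splits the batch and retries
    }

    public synchronized float estimate() {
        return estimatedRatio;
    }
}
{code}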

Thanks,

Jiangjie (Becket) Qin



On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <
gharatmayures...@gmail.com> wrote:

> Hi Becket,
>
> Seems like an interesting idea.
> I had couple of questions :
> 1) How do we decide when the batch should be split?
> 2) What do you mean by slowly lowering the "actual" compression ratio?
> An example would really help here.
>
> Thanks,
>
> Mayuresh
>
> On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin  wrote:
>
> > Hi Jay,
> >
> > Yeah, I got your point.
> >
> > I think there might be a solution which do not require adding a new
> > configuration. We can start from a very conservative compression ratio
> say
> > 1.0 and lower it very slowly according to the actual compression ratio
> > until we hit a point that we have to split a batch. At that point, we
> > exponentially back off on the compression ratio. The idea is somewhat
> like
> > TCP. This should help avoid frequent split.
> >
> > The upper bound of the batch size is also a little awkward today because
> we
> > say the batch size is based on compressed size, but users cannot set it
> to
> > the max message size because that will result in oversized messages. With
> > this change we will be able to allow the users to set the message size to
> > close to max message size.
> >
> > However the downside is that there could be latency spikes in the system
> in
> > this case due to the splitting, especially when there are many messages
> > need to be split at the same time. That could potentially be an issue for
> > some users.
> >
> > What do you think about this approach?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> > On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps  wrote:
> >
> > > Hey Becket,
> > >
> > > Yeah that makes sense.
> > >
> > > I agree that you'd really have to both fix the estimation (i.e. make it
> > per
> > > topic or make it better estimate the high percentiles) AND have the
> > > recovery mechanism. If you are underestimating often and then paying a
> > high
> > > recovery price that won't fly.
> > >
> > > I think you take my main point though, which is just that I hate to
> > exposes
> > > these super low level options to users because it is so hard to explain
> > to
> > > people what it means and how they should set it. So if it is possible
> to
> > > make either some combination of better estimation and splitting or
> better
> > > tolerance of overage that would be preferrable.
> > >
> > > -Jay
> > >
> > > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin 
> > wrote:
> > >
> > > > @Dong,
> > > >
> > > > Thanks for the comments. The default behavior of the producer won't
> > > change.
> > > > If the users want to use the uncompressed message size, they probably
> > > will
> > > > also bump up the batch size to somewhere close to the max message
> size.
> > > > This would be in the document. BTW the default batch size is 16K
> which
> > is
> > > > pretty small.
> > > >
> > > > @Jay,
> > > >
> > > > Yeah, we actually had debated quite a bit internally what is the best
> > > > solution to this.
> > > >
> > > > I completely agree it is a bug. In practice we usually leave some
> > > headroom
> > > > to allow the compressed size to grow a little if the the original
> > > messages
> > > > are not compressible, for example, 1000 KB instead of exactly 1 MB.
> It
> > is
> > > > likely safe enough.
> > > >
> > > > The major concern for the rejected alternative is performance. It
> > largely
> > > > depends on how frequent we need to split a batch, i.e. how likely the
> > > > estimation can go off. If we only need to the split work
> occasionally,
> > > the
> > > > cost would be amortized so we don't need to worry about it too much.
> > > > However, it looks that for a producer with shared topics, the
> > estimation
> > > is
> > > > always off. As an example, consider two topics, one with compression
> > > ratio
> > > > 0.6 the other 0.2, assuming exactly same traffic, the average
> > compression
> > > > ratio would be roughly 0.4, which is not right for either of the
> > topics.
> > > So
> > > > almost half of the batches (of the topics with 0.6 compression ratio)
> > > will
> > > > end up larger than the configured batch size. When it comes to more
> > > topics
> > > > such as mirror maker, this becomes more unpredictable. To avoid
> > frequent
> > > > rejection / split of the batches, we need to configured the batch
> size
>

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Mayuresh Gharat
Hi Becket,

Thanks for the explanation.
Regarding:
1) The batch would be split when a RecordTooLargeException is received.

Let's say we sent the batch over the wire and received a
RecordTooLargeException: how do we split it, given that once we add a message
to the batch we lose message-level granularity? We will have to decompress, do
a deep iteration, split and compress again, right? This looks like a
performance bottleneck in the case of multi-topic producers like mirror maker.
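
To make the concern concrete, the split would have to look roughly like the
sketch below: decompress, iterate over the individual records, and re-group
them into smaller batches that are then compressed again. This is purely
illustrative (a hypothetical helper over raw byte[] payloads, ignoring
compression, headers and per-record overhead), not the producer's actual code:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: re-grouping the decompressed records of an oversized
// batch into smaller batches, each below the configured size limit.
public final class BatchSplitter {
    public static List<List<byte[]>> split(List<byte[]> records, int maxBatchSizeBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int currentSize = 0;
        for (byte[] record : records) {
            // Start a new batch once adding this record would exceed the limit.
            if (!current.isEmpty() && currentSize + record.length > maxBatchSizeBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(record);
            currentSize += record.length;
        }
        if (!current.isEmpty())
            batches.add(current);
        return batches; // each sub-list would then be compressed and sent separately
    }
}
{code}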


Thanks,

Mayuresh

On Mon, Feb 27, 2017 at 10:51 AM, Becket Qin  wrote:

> Hey Mayuresh,
>
> 1) The batch would be split when an RecordTooLargeException is received.
> 2) Not lower the actual compression ratio, but lower the estimated
> compression ratio "according to" the Actual Compression Ratio(ACR).
>
> An example, let's start with Estimated Compression Ratio (ECR) = 1.0. Say
> the compression ratio of ACR is ~0.8, instead of letting the ECR dropped to
> 0.8 very quickly, we only drop 0.001 every time when ACR < ECR. However,
> once we see an ACR > ECR, we increment ECR by 0.05. If a
> RecordTooLargeException is received, we reset the ECR back to 1.0 and split
> the batch.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Mon, Feb 27, 2017 at 10:30 AM, Mayuresh Gharat <
> gharatmayures...@gmail.com> wrote:
>
> > Hi Becket,
> >
> > Seems like an interesting idea.
> > I had couple of questions :
> > 1) How do we decide when the batch should be split?
> > 2) What do you mean by slowly lowering the "actual" compression ratio?
> > An example would really help here.
> >
> > Thanks,
> >
> > Mayuresh
> >
> > On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin 
> wrote:
> >
> > > Hi Jay,
> > >
> > > Yeah, I got your point.
> > >
> > > I think there might be a solution which do not require adding a new
> > > configuration. We can start from a very conservative compression ratio
> > say
> > > 1.0 and lower it very slowly according to the actual compression ratio
> > > until we hit a point that we have to split a batch. At that point, we
> > > exponentially back off on the compression ratio. The idea is somewhat
> > like
> > > TCP. This should help avoid frequent split.
> > >
> > > The upper bound of the batch size is also a little awkward today
> because
> > we
> > > say the batch size is based on compressed size, but users cannot set it
> > to
> > > the max message size because that will result in oversized messages.
> With
> > > this change we will be able to allow the users to set the message size
> to
> > > close to max message size.
> > >
> > > However the downside is that there could be latency spikes in the
> system
> > in
> > > this case due to the splitting, especially when there are many messages
> > > need to be split at the same time. That could potentially be an issue
> for
> > > some users.
> > >
> > > What do you think about this approach?
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > >
> > >
> > > On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps  wrote:
> > >
> > > > Hey Becket,
> > > >
> > > > Yeah that makes sense.
> > > >
> > > > I agree that you'd really have to both fix the estimation (i.e. make
> it
> > > per
> > > > topic or make it better estimate the high percentiles) AND have the
> > > > recovery mechanism. If you are underestimating often and then paying
> a
> > > high
> > > > recovery price that won't fly.
> > > >
> > > > I think you take my main point though, which is just that I hate to
> > > exposes
> > > > these super low level options to users because it is so hard to
> explain
> > > to
> > > > people what it means and how they should set it. So if it is possible
> > to
> > > > make either some combination of better estimation and splitting or
> > better
> > > > tolerance of overage that would be preferrable.
> > > >
> > > > -Jay
> > > >
> > > > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin 
> > > wrote:
> > > >
> > > > > @Dong,
> > > > >
> > > > > Thanks for the comments. The default behavior of the producer won't
> > > > change.
> > > > > If the users want to use the uncompressed message size, they
> probably
> > > > will
> > > > > also bump up the batch size to somewhere close to the max message
> > size.
> > > > > This would be in the document. BTW the default batch size is 16K
> > which
> > > is
> > > > > pretty small.
> > > > >
> > > > > @Jay,
> > > > >
> > > > > Yeah, we actually had debated quite a bit internally what is the
> best
> > > > > solution to this.
> > > > >
> > > > > I completely agree it is a bug. In practice we usually leave some
> > > > headroom
> > > > > to allow the compressed size to grow a little if the the original
> > > > messages
> > > > > are not compressible, for example, 1000 KB instead of exactly 1 MB.
> > It
> > > is
> > > > > likely safe enough.
> > > > >
> > > > > The major concern for the rejected alternative is performance. It
> > > largely
> > > > > depends on how frequent we need to split a batch, i.e. how likely
> the
> > > > > estimation can go off. If we only 

Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-27 Thread Ismael Juma
Hi Mayuresh,

Sorry for the delay. The updated KIP states that there is no compatibility
impact, but that doesn't seem right. The fact that we changed the type of
Session.principal to `Principal` means that any code that expects it to be
`KafkaPrincipal` will break, either because of declared types (likely) or
because it accesses `getPrincipalType` (unlikely, since the value is always the
same). It's a bit annoying, but we should add a new field to `Session` with
the original principal. We can potentially deprecate the existing one, if
we're sure we don't need it (or we can leave it for now).
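
To illustrate the kind of breakage (purely hypothetical caller code; only the
declared type of the principal matters here):

{code}
import java.security.Principal;
import org.apache.kafka.common.security.auth.KafkaPrincipal;

// Hypothetical fragment of a custom authorizer written against the old signature.
public class PrincipalCompatibilityExample {
    // Code like this compiled while Session.principal was declared as KafkaPrincipal:
    public boolean isAdmin(KafkaPrincipal principal) {
        return KafkaPrincipal.USER_TYPE.equals(principal.getPrincipalType())
                && "admin".equals(principal.getName());
    }

    // Once the declared type becomes java.security.Principal, callers need an
    // explicit (and potentially unsafe) downcast to keep the old behaviour:
    public boolean isAdmin(Principal principal) {
        return principal instanceof KafkaPrincipal && isAdmin((KafkaPrincipal) principal);
    }
}
{code}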

Ismael

On Mon, Feb 27, 2017 at 6:40 PM, Mayuresh Gharat  wrote:

> Hi Ismael, Joel, Becket
>
> Would you mind taking a look at this. We require 2 more binding votes for
> the KIP to pass.
>
> Thanks,
>
> Mayuresh
>
> On Thu, Feb 23, 2017 at 10:57 AM, Dong Lin  wrote:
>
> > +1 (non-binding)
> >
> > On Wed, Feb 22, 2017 at 10:52 PM, Manikumar 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > On Thu, Feb 23, 2017 at 3:27 AM, Mayuresh Gharat <
> > > gharatmayures...@gmail.com
> > > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks a lot for the comments and reviews.
> > > > I agree we should log the username.
> > > > What I meant by creating KafkaPrincipal was, after this KIP we would
> > not
> > > be
> > > > required to create KafkaPrincipal and if we want to maintain the old
> > > > logging, we will have to create it as we do today.
> > > > I will take care that we specify the Principal name in the log.
> > > >
> > > > Thanks again for all the reviews.
> > > >
> > > > Thanks,
> > > >
> > > > Mayuresh
> > > >
> > > > On Wed, Feb 22, 2017 at 1:45 PM, Jun Rao  wrote:
> > > >
> > > > > Hi, Mayuresh,
> > > > >
> > > > > For logging the user name, we could do either way. We just need to
> > make
> > > > > sure the expected user name is logged. Also, currently, we are
> > already
> > > > > creating a KafkaPrincipal on every request. +1 on the latest KIP.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > >
> > > > > On Tue, Feb 21, 2017 at 8:05 PM, Mayuresh Gharat <
> > > > > gharatmayures...@gmail.com
> > > > > > wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > > > > > Thanks for the comments.
> > > > > >
> > > > > > I will mention in the KIP : how this change doesn't affect the
> > > default
> > > > > > authorizer implementation.
> > > > > >
> > > > > > Regarding, Currently, we log the principal name in the request
> log
> > in
> > > > > > RequestChannel, which has the format of "principalType +
> SEPARATOR
> > +
> > > > > > name;".
> > > > > > It would be good if we can keep the same convention after this
> KIP.
> > > One
> > > > > way
> > > > > > to do that is to convert java.security.Principal to
> KafkaPrincipal
> > > for
> > > > > > logging the requests.
> > > > > > --- > This would mean we have to create a new KafkaPrincipal on
> > each
> > > > > > request. Would it be OK to just specify the name of the
> principal.
> > > > > > Is there any major reason, we don't want to change the logging
> > > format?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Mayuresh
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 20, 2017 at 10:18 PM, Jun Rao 
> > wrote:
> > > > > >
> > > > > > > Hi, Mayuresh,
> > > > > > >
> > > > > > > Thanks for the updated KIP. A couple of more comments.
> > > > > > >
> > > > > > > 1. Do we convert java.security.Principal to KafkaPrincipal for
> > > > > > > authorization check in SimpleAclAuthorizer? If so, it would be
> > > useful
> > > > > to
> > > > > > > mention that in the wiki so that people can understand how this
> > > > change
> > > > > > > doesn't affect the default authorizer implementation.
> > > > > > >
> > > > > > > 2. Currently, we log the principal name in the request log in
> > > > > > > RequestChannel, which has the format of "principalType +
> > SEPARATOR
> > > +
> > > > > > > name;".
> > > > > > > It would be good if we can keep the same convention after this
> > KIP.
> > > > One
> > > > > > way
> > > > > > > to do that is to convert java.security.Principal to
> > KafkaPrincipal
> > > > for
> > > > > > > logging the requests.
> > > > > > >
> > > > > > > Jun
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 17, 2017 at 5:35 PM, Mayuresh Gharat <
> > > > > > > gharatmayures...@gmail.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > I have updated the KIP. Would you mind taking another look?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Mayuresh
> > > > > > > >
> > > > > > > > On Fri, Feb 17, 2017 at 4:42 PM, Mayuresh Gharat <
> > > > > > > > gharatmayures...@gmail.com
> > > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Jun,
> > > > > > > > >
> > > > > > > > > Sure sounds good to me.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Mayuresh
> > > > > > > > >
> > > > > > > > > On Fri, Feb 17, 2017 at 1:54 PM, Jun Rao  >
> > > > wr

[jira] [Commented] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886297#comment-15886297
 ] 

Ismael Juma commented on KAFKA-4808:


[~mgharat], I think a separate error code would be a better long-term solution 
because we can then provide a specific error message. And yes, it would require 
a simple KIP. What do you think?
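
For anyone following along, the behaviour described below is easy to reproduce
with a plain producer (broker address and topic name are illustrative; the
topic is assumed to have cleanup.policy=compact):

{code}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class NullKeyToCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // illustrative
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The null key is rejected by the broker with the retriable
            // CorruptRecordException, so the producer keeps retrying and the
            // caller only sees a TimeoutException once retries are exhausted.
            producer.send(new ProducerRecord<>("compacted-topic", null, "some value")).get();
        }
    }
}
{code}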

> send of null key to a compacted topic should throw error back to user
> -
>
> Key: KAFKA-4808
> URL: https://issues.apache.org/jira/browse/KAFKA-4808
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
>Reporter: Ismael Juma
>Assignee: Mayuresh Gharat
> Fix For: 0.10.3.0
>
>
> If a message with a null key is produced to a compacted topic, the broker 
> returns `CorruptRecordException`, which is a retriable exception. As such, 
> the producer keeps retrying until retries are exhausted or request.timeout.ms 
> expires and eventually throws a TimeoutException. This is confusing and not 
> user-friendly.
> We should throw a meaningful error back to the user. From an implementation 
> perspective, we would have to use a non retriable error code to avoid this 
> issue.





[jira] [Assigned] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Mayuresh Gharat (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mayuresh Gharat reassigned KAFKA-4808:
--

Assignee: Mayuresh Gharat

> send of null key to a compacted topic should throw error back to user
> -
>
> Key: KAFKA-4808
> URL: https://issues.apache.org/jira/browse/KAFKA-4808
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
>Reporter: Ismael Juma
>Assignee: Mayuresh Gharat
> Fix For: 0.10.3.0
>
>
> If a message with a null key is produced to a compacted topic, the broker 
> returns `CorruptRecordException`, which is a retriable exception. As such, 
> the producer keeps retrying until retries are exhausted or request.timeout.ms 
> expires and eventually throws a TimeoutException. This is confusing and not 
> user-friendly.
> We should throw a meaningful error back to the user. From an implementation 
> perspective, we would have to use a non retriable error code to avoid this 
> issue.





[jira] [Commented] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Mayuresh Gharat (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886288#comment-15886288
 ] 

Mayuresh Gharat commented on KAFKA-4808:


[~ijuma] I was wondering whether throwing an "INVALID_REQUEST" error would work 
here, or whether we need to add a new exception type (which would require a KIP)?

> send of null key to a compacted topic should throw error back to user
> -
>
> Key: KAFKA-4808
> URL: https://issues.apache.org/jira/browse/KAFKA-4808
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
>Reporter: Ismael Juma
> Fix For: 0.10.3.0
>
>
> If a message with a null key is produced to a compacted topic, the broker 
> returns `CorruptRecordException`, which is a retriable exception. As such, 
> the producer keeps retrying until retries are exhausted or request.timeout.ms 
> expires and eventually throws a TimeoutException. This is confusing and not 
> user-friendly.
> We should throw a meaningful error back to the user. From an implementation 
> perspective, we would have to use a non retriable error code to avoid this 
> issue.





Re: [VOTE] KIP-111 Kafka should preserve the Principal generated by the PrincipalBuilder while processing the request received on socket channel, on the broker.

2017-02-27 Thread Mayuresh Gharat
Hi Ismael, Joel, Becket

Would you mind taking a look at this. We require 2 more binding votes for
the KIP to pass.

Thanks,

Mayuresh

On Thu, Feb 23, 2017 at 10:57 AM, Dong Lin  wrote:

> +1 (non-binding)
>
> On Wed, Feb 22, 2017 at 10:52 PM, Manikumar 
> wrote:
>
> > +1 (non-binding)
> >
> > On Thu, Feb 23, 2017 at 3:27 AM, Mayuresh Gharat <
> > gharatmayures...@gmail.com
> > > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks a lot for the comments and reviews.
> > > I agree we should log the username.
> > > What I meant by creating KafkaPrincipal was, after this KIP we would
> not
> > be
> > > required to create KafkaPrincipal and if we want to maintain the old
> > > logging, we will have to create it as we do today.
> > > I will take care that we specify the Principal name in the log.
> > >
> > > Thanks again for all the reviews.
> > >
> > > Thanks,
> > >
> > > Mayuresh
> > >
> > > On Wed, Feb 22, 2017 at 1:45 PM, Jun Rao  wrote:
> > >
> > > > Hi, Mayuresh,
> > > >
> > > > For logging the user name, we could do either way. We just need to
> make
> > > > sure the expected user name is logged. Also, currently, we are
> already
> > > > creating a KafkaPrincipal on every request. +1 on the latest KIP.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Tue, Feb 21, 2017 at 8:05 PM, Mayuresh Gharat <
> > > > gharatmayures...@gmail.com
> > > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for the comments.
> > > > >
> > > > > I will mention in the KIP : how this change doesn't affect the
> > default
> > > > > authorizer implementation.
> > > > >
> > > > > Regarding, Currently, we log the principal name in the request log
> in
> > > > > RequestChannel, which has the format of "principalType + SEPARATOR
> +
> > > > > name;".
> > > > > It would be good if we can keep the same convention after this KIP.
> > One
> > > > way
> > > > > to do that is to convert java.security.Principal to KafkaPrincipal
> > for
> > > > > logging the requests.
> > > > > --- > This would mean we have to create a new KafkaPrincipal on
> each
> > > > > request. Would it be OK to just specify the name of the principal.
> > > > > Is there any major reason, we don't want to change the logging
> > format?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mayuresh
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Feb 20, 2017 at 10:18 PM, Jun Rao 
> wrote:
> > > > >
> > > > > > Hi, Mayuresh,
> > > > > >
> > > > > > Thanks for the updated KIP. A couple of more comments.
> > > > > >
> > > > > > 1. Do we convert java.security.Principal to KafkaPrincipal for
> > > > > > authorization check in SimpleAclAuthorizer? If so, it would be
> > useful
> > > > to
> > > > > > mention that in the wiki so that people can understand how this
> > > change
> > > > > > doesn't affect the default authorizer implementation.
> > > > > >
> > > > > > 2. Currently, we log the principal name in the request log in
> > > > > > RequestChannel, which has the format of "principalType +
> SEPARATOR
> > +
> > > > > > name;".
> > > > > > It would be good if we can keep the same convention after this
> KIP.
> > > One
> > > > > way
> > > > > > to do that is to convert java.security.Principal to
> KafkaPrincipal
> > > for
> > > > > > logging the requests.
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 17, 2017 at 5:35 PM, Mayuresh Gharat <
> > > > > > gharatmayures...@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > I have updated the KIP. Would you mind taking another look?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Mayuresh
> > > > > > >
> > > > > > > On Fri, Feb 17, 2017 at 4:42 PM, Mayuresh Gharat <
> > > > > > > gharatmayures...@gmail.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Jun,
> > > > > > > >
> > > > > > > > Sure sounds good to me.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Mayuresh
> > > > > > > >
> > > > > > > > On Fri, Feb 17, 2017 at 1:54 PM, Jun Rao 
> > > wrote:
> > > > > > > >
> > > > > > > >> Hi, Mani,
> > > > > > > >>
> > > > > > > >> Good point on using PrincipalBuilder for SASL. It seems that
> > > > > > > >> PrincipalBuilder already has access to Authenticator. So, we
> > > could
> > > > > > just
> > > > > > > >> enable that in SaslChannelBuilder. We probably could do that
> > in
> > > a
> > > > > > > separate
> > > > > > > >> KIP?
> > > > > > > >>
> > > > > > > >> Hi, Mayuresh,
> > > > > > > >>
> > > > > > > >> If you don't think there is a concrete use case for using
> > > > > > > >> PrincipalBuilder in
> > > > > > > >> kafka-acls.sh, perhaps we could do the simpler approach for
> > now?
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Jun
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> On Fri, Feb 17, 2017 at 12:23 PM, Mayuresh Gharat <
> > > > > > > >> gharatmayures...@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > @Manikumar,
> > > > > >

Re: [DISCUSS] KIP-126 - Allow KafkaProducer to batch based on uncompressed size

2017-02-27 Thread Mayuresh Gharat
Hi Becket,

Seems like an interesting idea.
I had couple of questions :
1) How do we decide when the batch should be split?
2) What do you mean by slowly lowering the "actual" compression ratio?
An example would really help here.

Thanks,

Mayuresh

On Fri, Feb 24, 2017 at 3:17 PM, Becket Qin  wrote:

> Hi Jay,
>
> Yeah, I got your point.
>
> I think there might be a solution which do not require adding a new
> configuration. We can start from a very conservative compression ratio say
> 1.0 and lower it very slowly according to the actual compression ratio
> until we hit a point that we have to split a batch. At that point, we
> exponentially back off on the compression ratio. The idea is somewhat like
> TCP. This should help avoid frequent split.
>
> The upper bound of the batch size is also a little awkward today because we
> say the batch size is based on compressed size, but users cannot set it to
> the max message size because that will result in oversized messages. With
> this change we will be able to allow the users to set the message size to
> close to max message size.
>
> However the downside is that there could be latency spikes in the system in
> this case due to the splitting, especially when there are many messages
> need to be split at the same time. That could potentially be an issue for
> some users.
>
> What do you think about this approach?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Thu, Feb 23, 2017 at 1:31 PM, Jay Kreps  wrote:
>
> > Hey Becket,
> >
> > Yeah that makes sense.
> >
> > I agree that you'd really have to both fix the estimation (i.e. make it
> per
> > topic or make it better estimate the high percentiles) AND have the
> > recovery mechanism. If you are underestimating often and then paying a
> high
> > recovery price that won't fly.
> >
> > I think you take my main point though, which is just that I hate to
> exposes
> > these super low level options to users because it is so hard to explain
> to
> > people what it means and how they should set it. So if it is possible to
> > make either some combination of better estimation and splitting or better
> > tolerance of overage that would be preferrable.
> >
> > -Jay
> >
> > On Thu, Feb 23, 2017 at 11:51 AM, Becket Qin 
> wrote:
> >
> > > @Dong,
> > >
> > > Thanks for the comments. The default behavior of the producer won't
> > change.
> > > If the users want to use the uncompressed message size, they probably
> > will
> > > also bump up the batch size to somewhere close to the max message size.
> > > This would be in the document. BTW the default batch size is 16K which
> is
> > > pretty small.
> > >
> > > @Jay,
> > >
> > > Yeah, we actually had debated quite a bit internally what is the best
> > > solution to this.
> > >
> > > I completely agree it is a bug. In practice we usually leave some
> > headroom
> > > to allow the compressed size to grow a little if the the original
> > messages
> > > are not compressible, for example, 1000 KB instead of exactly 1 MB. It
> is
> > > likely safe enough.
> > >
> > > The major concern for the rejected alternative is performance. It
> largely
> > > depends on how frequent we need to split a batch, i.e. how likely the
> > > estimation can go off. If we only need to the split work occasionally,
> > the
> > > cost would be amortized so we don't need to worry about it too much.
> > > However, it looks that for a producer with shared topics, the
> estimation
> > is
> > > always off. As an example, consider two topics, one with compression
> > ratio
> > > 0.6 the other 0.2, assuming exactly same traffic, the average
> compression
> > > ratio would be roughly 0.4, which is not right for either of the
> topics.
> > So
> > > almost half of the batches (of the topics with 0.6 compression ratio)
> > will
> > > end up larger than the configured batch size. When it comes to more
> > topics
> > > such as mirror maker, this becomes more unpredictable. To avoid
> frequent
> > > rejection / split of the batches, we need to configured the batch size
> > > pretty conservatively. This could actually hurt the performance because
> > we
> > > are shoehorn the messages that are highly compressible to a small batch
> > so
> > > that the other topics that are not that compressible will not become
> too
> > > large with the same batch size. At LinkedIn, our batch size is
> configured
> > > to 64 KB because of this. I think we may actually have better batching
> if
> > > we just use the uncompressed message size and 800 KB batch size.
> > >
> > > We did not think about loosening the message size restriction, but that
> > > sounds a viable solution given that the consumer now can fetch
> oversized
> > > messages. One concern would be that on the broker side oversized
> messages
> > > will bring more memory pressure. With KIP-92, we may mitigate that, but
> > the
> > > memory allocation for large messages may not be very GC friendly. I
> need
> > to
> > > think about this a little more.
> > >
> > > Thanks,
> >

Re: [VOTE] KIP-81: Bound Fetch memory usage in the consumer

2017-02-27 Thread Mickael Maison
Apologies for the late response.

Thanks Jason for the suggestion. Yes you are right, the Coordinator
connection is "tagged" with a different id, so we could retrieve it in
NetworkReceive to make the distinction.
However, the Coordinator connections are currently made distinct by using:
Integer.MAX_VALUE - groupCoordinatorResponse.node().id()
for the Node id.

So to identify Coordinator connections, we'd have to check that the
NetworkReceive source is a value near Integer.MAX_VALUE, which is a bit
hacky ...

Maybe we could add a constructor to Node that allows passing in a
sourceId String. That way we could make all the coordinator
connections explicit (by setting it to "Coordinator-[ID]" for
example).
What do you think?
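
As a rough sketch of the two options above (all names and the "near
Integer.MAX_VALUE" threshold are made up for illustration, not actual client
code):

{code}
// Illustrative only.
public final class CoordinatorConnectionIds {
    // Today's convention: the coordinator connection's Node id is derived as
    // Integer.MAX_VALUE - <coordinator broker id>.
    public static int coordinatorNodeId(int coordinatorBrokerId) {
        return Integer.MAX_VALUE - coordinatorBrokerId;
    }

    // The hacky check this implies when all we have is the NetworkReceive source
    // string: treat ids "near" Integer.MAX_VALUE as coordinator connections.
    public static boolean looksLikeCoordinatorConnection(String sourceId) {
        try {
            return Integer.parseInt(sourceId) > Integer.MAX_VALUE - 1000; // arbitrary threshold
        } catch (NumberFormatException e) {
            // An explicit tag such as "Coordinator-<id>" would not parse as an int,
            // which is exactly why a dedicated sourceId would be cleaner.
            return false;
        }
    }
}
{code}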

On Tue, Jan 24, 2017 at 12:58 AM, Jason Gustafson  wrote:
> Good point. The consumer does use a separate connection to the coordinator,
> so perhaps the connection itself could be tagged for normal heap allocation?
>
> -Jason
>
> On Mon, Jan 23, 2017 at 10:26 AM, Onur Karaman > wrote:
>
>> I only did a quick scan but I wanted to point out what I think is an
>> incorrect assumption in the KIP's caveats:
>> "
>> There is a risk using the MemoryPool that, after we fill up the memory with
>> fetch data, we can starve the coordinator's connection
>> ...
>> To alleviate this issue, only messages larger than 1Kb will be allocated in
>> the MemoryPool. Smaller messages will be allocated directly on the Heap
>> like before. This allows group/heartbeat messages to avoid being delayed if
>> the MemoryPool fills up.
>> "
>>
>> So it sounds like there's an incorrect assumption that responses from the
>> coordinator will always be small (< 1Kb as mentioned in the caveat). There
>> are now a handful of request types between clients and the coordinator:
>> {JoinGroup, SyncGroup, LeaveGroup, Heartbeat, OffsetCommit, OffsetFetch,
>> ListGroups, DescribeGroups}. While true (at least today) for
>> HeartbeatResponse and a few others, I don't think we can assume
>> JoinGroupResponse, SyncGroupResponse, DescribeGroupsResponse, and
>> OffsetFetchResponse will be small, as they are effectively bounded by the
>> max message size allowed by the broker for the __consumer_offsets topic
>> which by default is 1MB.
>>
>> On Mon, Jan 23, 2017 at 9:46 AM, Mickael Maison 
>> wrote:
>>
>> > I've updated the KIP to address all the comments raised here and from
>> > the "DISCUSS" thread.
>> > See: https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> > 81%3A+Bound+Fetch+memory+usage+in+the+consumer
>> >
>> > Now, I'd like to restart the vote.
>> >
>> > On Tue, Dec 6, 2016 at 9:02 AM, Rajini Sivaram
>> >  wrote:
>> > > Hi Mickael,
>> > >
>> > > I am +1 on the overall approach of this KIP, but have a couple of
>> > comments
>> > > (sorry, should have brought them up on the discuss thread earlier):
>> > >
>> > > 1. Perhaps it would be better to do this after KAFKA-4137
>> > >  is implemented? At
>> > the
>> > > moment, coordinator shares the same NetworkClient (and hence the same
>> > > Selector) with consumer connections used for fetching records. Since
>> > > freeing of memory relies on consuming applications invoking poll()
>> after
>> > > processing previous records and potentially after committing offsets,
>> it
>> > > will be good to ensure that coordinator is not blocked for read by
>> fetch
>> > > responses. This may be simpler once coordinator has its own Selector.
>> > >
>> > > 2. The KIP says: *Once messages are returned to the user, messages are
>> > > deleted from the MemoryPool so new messages can be stored.*
>> > > Can you expand that a bit? I am assuming that partial buffers never get
>> > > freed when some messages are returned to the user since the consumer is
>> > > still holding a reference to the buffer. Would buffers be freed when
>> > > fetches for all the partitions in a response are parsed, but perhaps
>> not
>> > > yet returned to the user (i.e., is the memory freed when a reference to
>> > the
>> > > response buffer is no longer required)? It will be good to document the
>> > > (approximate) maximum memory requirement for the non-compressed case.
>> > There
>> > > is data read from the socket, cached in the Fetcher and (as Radai has
>> > > pointed out), the records still with the user application.
>> > >
>> > >
>> > > On Tue, Dec 6, 2016 at 2:04 AM, radai 
>> > wrote:
>> > >
>> > >> +1 (non-binding).
>> > >>
>> > >> small nit pick - just because you returned a response to user doesnt
>> > mean
>> > >> the memory id no longer used. for some cases the actual "point of
>> > >> termination" may be the deserializer (really impl-dependant), but
>> > >> generally, wouldnt it be "nice" to have an explicit dispose() call on
>> > >> responses (with the addition that getting the next batch of data from
>> a
>> > >> consumer automatically disposes the previous results)
>> > >>
>> > >> On Mon, Dec 5, 2016 at 6:53 AM, Edoardo Comar 
>> > wrote:
>> > >>
>> > >> >

Re: [DISCUSS] KIP-128: Add ByteArrayConverter for Kafka Connect

2017-02-27 Thread Ewen Cheslack-Postava
It's a different interface that's being implemented. The functionality is
the same (since it's just a simple pass-through), but we intentionally named
Converters differently than Serializers since they do more work than
Serializers: besides the normal serialization, they also need to convert
between the serialized byte[] form and the Connect Data API.

We could certainly reuse/extend that class instead, though I'm not sure
there's much benefit in that; since they implement different interfaces
and this is Connect-specific, it will probably be clearer to have it under
a Connect package. Note that for other Converters the pattern we've used is
to use the corresponding serializer internally and just add in the extra
conversion steps for the data API.
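
For reference, a pass-through Converter along these lines is tiny; a minimal
sketch (class name illustrative, not necessarily what the KIP ends up shipping)
could look like:

{code}
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaAndValue;
import org.apache.kafka.connect.errors.DataException;
import org.apache.kafka.connect.storage.Converter;

// Sketch of a pass-through converter that only accepts (optional) bytes.
public class PassThroughByteArrayConverter implements Converter {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure for a pass-through converter.
    }

    @Override
    public byte[] fromConnectData(String topic, Schema schema, Object value) {
        if (schema != null && schema.type() != Schema.Type.BYTES)
            throw new DataException("Invalid schema type for ByteArrayConverter: " + schema.type());
        if (value != null && !(value instanceof byte[]))
            throw new DataException("ByteArrayConverter is not compatible with " + value.getClass());
        return (byte[]) value;
    }

    @Override
    public SchemaAndValue toConnectData(String topic, byte[] value) {
        return new SchemaAndValue(Schema.OPTIONAL_BYTES_SCHEMA, value);
    }
}
{code}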

-Ewen

On Sat, Feb 25, 2017 at 6:52 PM, Guozhang Wang  wrote:

> I'm wondering why we can't just use ByteArarySerde in o.a.k.common?
>
> Guozhang
>
> On Sat, Feb 25, 2017 at 2:25 PM, Ewen Cheslack-Postava 
> wrote:
>
> > Hi all,
> >
> > I've added a pretty trivial KIP for adding a pass-through Converter for
> > Kafka Connect, similar to ByteArraySerializer/Deserializer.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 128%3A+Add+ByteArrayConverter+for+Kafka+Connect
> >
> > This wasn't added with the framework originally because the idea was to
> > deal with structured data for the most part. However, we've seen a couple
> > of use cases arise as the framework got more traction and I think it
> makes
> > sense to provide this out of the box now so people stop reinventing the
> > wheel (and using a different fully-qualified class name) for each
> connector
> > that needs this functionality.
> >
> > I imagine this will be a rather uncontroversial addition, so if I don't
> see
> > any comments in the next day or two I'll just start the vote thread.
> >
> > -Ewen
> >
>
>
>
> --
> -- Guozhang
>


[jira] [Commented] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885759#comment-15885759
 ] 

Ismael Juma commented on KAFKA-4808:


cc [~norwood]

> send of null key to a compacted topic should throw error back to user
> -
>
> Key: KAFKA-4808
> URL: https://issues.apache.org/jira/browse/KAFKA-4808
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.10.2.0
>Reporter: Ismael Juma
> Fix For: 0.10.3.0
>
>
> If a message with a null key is produced to a compacted topic, the broker 
> returns `CorruptRecordException`, which is a retriable exception. As such, 
> the producer keeps retrying until retries are exhausted or request.timeout.ms 
> expires and eventually throws a TimeoutException. This is confusing and not 
> user-friendly.
> We should throw a meaningful error back to the user. From an implementation 
> perspective, we would have to use a non retriable error code to avoid this 
> issue.





[jira] [Created] (KAFKA-4808) send of null key to a compacted topic should throw error back to user

2017-02-27 Thread Ismael Juma (JIRA)
Ismael Juma created KAFKA-4808:
--

 Summary: send of null key to a compacted topic should throw error 
back to user
 Key: KAFKA-4808
 URL: https://issues.apache.org/jira/browse/KAFKA-4808
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 0.10.2.0
Reporter: Ismael Juma
 Fix For: 0.10.3.0


If a message with a null key is produced to a compacted topic, the broker 
returns `CorruptRecordException`, which is a retriable exception. As such, the 
producer keeps retrying until retries are exhausted or request.timeout.ms 
expires and eventually throws a TimeoutException. This is confusing and not 
user-friendly.

We should throw a meaningful error back to the user. From an implementation 
perspective, we would have to use a non retriable error code to avoid this 
issue.





[jira] [Commented] (KAFKA-4779) Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885716#comment-15885716
 ] 

Ismael Juma commented on KAFKA-4779:


The test failed again, this time with a different message:

{code}

test_id:
kafkatest.tests.core.security_rolling_upgrade_test.TestSecurityRollingUpgrade.test_rolling_upgrade_phase_two.broker_protocol=SASL_PLAINTEXT.client_protocol=SSL
status: FAIL
run time:   4 minutes 32.586 seconds


1152 acked message did not make it to the Consumer. They are: 12288, 12289, 
12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298, 12299, 12300, 
12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more. Total Acked: 
12184, Total Consumed: 11032. We validated that the first 1000 of these missing 
messages correctly made it into Kafka's data files. This suggests they were 
lost on their way to the consumer.
Traceback (most recent call last):
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 123, in run
data = self.run_test()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
 line 176, in run_test
return self.test_context.function(self.test)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
 line 321, in wrapper
return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
 line 148, in test_rolling_upgrade_phase_two
self.run_produce_consume_validate(self.roll_in_secured_settings, 
client_protocol, broker_protocol)
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 117, in run_produce_consume_validate
self.validate()
  File 
"/var/lib/jenkins/workspace/system-test-kafka/kafka/tests/kafkatest/tests/produce_consume_validate.py",
 line 179, in validate
assert success, msg
AssertionError: 1152 acked message did not make it to the Consumer. They are: 
12288, 12289, 12290, 12291, 12292, 12293, 12294, 12295, 12296, 12297, 12298, 
12299, 12300, 12301, 12302, 12303, 12304, 12305, 12306, 12307...plus 1132 more. 
Total Acked: 12184, Total Consumed: 11032. We validated that the first 1000 of 
these missing messages correctly made it into Kafka's data files. This suggests 
they were lost on their way to the consumer.

{code}

http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/report.html
http://confluent-kafka-system-test-results.s3-us-west-2.amazonaws.com/2017-02-26--001.1488103947--apache--trunk--5b682ba/TestSecurityRollingUpgrade/test_rolling_upgrade_phase_two/broker_protocol=SASL_PLAINTEXT.client_protocol=SSL/62.tgz


> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> 
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkat

[jira] [Reopened] (KAFKA-4779) Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma reopened KAFKA-4779:


> Failure in kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py
> 
>
> Key: KAFKA-4779
> URL: https://issues.apache.org/jira/browse/KAFKA-4779
> Project: Kafka
>  Issue Type: Bug
>Reporter: Apurva Mehta
>Assignee: Rajini Sivaram
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> This test failed on 01/29, on both trunk and 0.10.2, error message:
> {noformat}
> The consumer has terminated, or timed out, on node ubuntu@worker3.
> Traceback (most recent call last):
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 123, in run
> data = self.run_test()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/tests/runner_client.py",
>  line 176, in run_test
> return self.test_context.function(self.test)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.6.0-py2.7.egg/ducktape/mark/_mark.py",
>  line 321, in wrapper
> return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/core/security_rolling_upgrade_test.py",
>  line 148, in test_rolling_upgrade_phase_two
> self.run_produce_consume_validate(self.roll_in_secured_settings, 
> client_protocol, broker_protocol)
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 100, in run_produce_consume_validate
> self.stop_producer_and_consumer()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 87, in stop_producer_and_consumer
> self.check_alive()
>   File 
> "/var/lib/jenkins/workspace/system-test-kafka-0.10.2/kafka/tests/kafkatest/tests/produce_consume_validate.py",
>  line 79, in check_alive
> raise Exception(msg)
> Exception: The consumer has terminated, or timed out, on node ubuntu@worker3.
> {noformat}
> Looks like the console consumer times out: 
> {noformat}
> [2017-01-30 04:56:00,972] ERROR Error processing message, terminating 
> consumer process:  (kafka.tools.ConsoleConsumer$)
> kafka.consumer.ConsumerTimeoutException
> at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:90)
> at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120)
> at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75)
> at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50)
> at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala)
> {noformat}
> A bunch of these security_rolling_upgrade tests failed, and in all cases, the 
> producer produced ~15k messages, of which ~7k were acked, and the consumer 
> only got around ~2600 before timing out. 
> There are a lot of messages like the following for different request types on 
> the producer and consumer:
> {noformat}
> [2017-01-30 05:13:35,954] WARN Received unknown topic or partition error in 
> produce request on partition test_topic-0. The topic/partition may not exist 
> or the user may not have Describe access to it 
> (org.apache.kafka.clients.producer.internals.Sender)
> {noformat}





[GitHub] kafka pull request #2600: KAFKA-4806: Prevent double logging of ConsumerConf...

2017-02-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2600




[jira] [Commented] (KAFKA-4806) KafkaConsumer: ConsumerConfig gets logged twice.

2017-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885638#comment-15885638
 ] 

ASF GitHub Bot commented on KAFKA-4806:
---

Github user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/2600


> KafkaConsumer: ConsumerConfig gets logged twice.
> 
>
> Key: KAFKA-4806
> URL: https://issues.apache.org/jira/browse/KAFKA-4806
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, log
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0
>Reporter: Marco Ebert
>Priority: Minor
>  Labels: easyfix, github-import, newbie, patch
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> The ConsumerConfig created from given Properties during KafkaConsumer 
> construction gets logged twice since the ConsumerConfig constructor does so.
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L587
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L643





[jira] [Resolved] (KAFKA-4806) KafkaConsumer: ConsumerConfig gets logged twice.

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma resolved KAFKA-4806.

   Resolution: Fixed
Fix Version/s: 0.10.3.0

Issue resolved by pull request 2600
[https://github.com/apache/kafka/pull/2600]

> KafkaConsumer: ConsumerConfig gets logged twice.
> 
>
> Key: KAFKA-4806
> URL: https://issues.apache.org/jira/browse/KAFKA-4806
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, log
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0
>Reporter: Marco Ebert
>Priority: Minor
>  Labels: easyfix, github-import, newbie, patch
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> The ConsumerConfig created from given Properties during KafkaConsumer 
> construction gets logged twice since the ConsumerConfig constructor does so.
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L587
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L643





[jira] [Comment Edited] (KAFKA-4569) Transient failure in org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885611#comment-15885611
 ] 

Ismael Juma edited comment on KAFKA-4569 at 2/27/17 11:27 AM:
--

A few comments:

1. We are using a MockTime without auto-tick, so time never advances.
2. As such, we never send a heartbeat request and the comment about adjusting 
the auto commit interval seems to be misleading.
3. The `client.pollNoWakeup()` call in `HeartbeatThread` is what causes the 
fetch to be completed by the heartbeat thread. This happens on each iteration 
of `run` before the various checks and independently of whether a heartbeat 
request needs to be sent.
4. Based on the documentation for `KafkaConsumer.wakeup()`, it seems like there 
is an actual bug in the code and not the test (introduced when the Heartbeat 
thread was introduced in 0.10.1.0).
5. There are a few potential ways to fix this, but I'll leave it to 
[~hachikuji] to suggest the cleanest one as he probably has an opinion on that.


was (Author: ijuma):
A few comments:

1. We are using a MockTime without auto-tick, so time never advances.
2. As such, we never send a heartbeat request and the comment about adjusting 
the auto commit internal seems to be misleading.
3. The `client.pollNoWakeup()` call in `HeartbeatThread` is what causes the 
fetch to be completed by the heartbeat thread. This happens on each iteration 
of `run` before the various checks and independently of whether a heartbeat 
request needs to be sent.
4. Based on the documentation for `KafkaConsumer.wakeup()`, it seems like there 
is an actual bug in the code and not the test (introduced when the Heartbeat 
thread was introduced in 0.10.1.0).
5. There are a few potential ways to fix this, but I'll leave it to 
[~hachikuji] to suggest the cleanest one as he probably has an opinion on that.

> Transient failure in 
> org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable
> -
>
> Key: KAFKA-4569
> URL: https://issues.apache.org/jira/browse/KAFKA-4569
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Guozhang Wang
>Assignee: Umesh Chaudhary
>  Labels: newbie
> Fix For: 0.10.3.0, 0.10.2.1
>
>
> One example is:
> https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/370/testReport/junit/org.apache.kafka.clients.consumer/KafkaConsumerTest/testWakeupWithFetchDataAvailable/
> {code}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.fail(Assert.java:95)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable(KafkaConsumerTest.java:679)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.rules.ExpectedException$ExpectedExceptionStatement.evaluate(ExpectedException.java:239)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:114)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:57)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:66)
>   at 
> org.gradle.api.inte

[jira] [Comment Edited] (KAFKA-4569) Transient failure in org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885611#comment-15885611
 ] 

Ismael Juma edited comment on KAFKA-4569 at 2/27/17 11:28 AM:
--

A few comments:

1. We are using a MockTime without auto-tick, so time never advances.
2. As such, we never send a heartbeat request and the comment about adjusting 
the auto commit interval seems to be misleading.
3. The `client.pollNoWakeup()` call in `HeartbeatThread` is what causes the 
fetch to be completed by the heartbeat thread. This happens on each iteration 
of `run` before the various checks and independently of whether a heartbeat 
request needs to be sent.
4. Based on the documentation for `KafkaConsumer.wakeup()`, it seems like there 
is an actual bug in the code and not the test (introduced when the Heartbeat 
thread was added in 0.10.1.0).
5. There are a few potential ways to fix this, but I'll leave it to 
[~hachikuji] to suggest the cleanest one as he probably has an opinion on that.


was (Author: ijuma):
A few comments:

1. We are using a MockTime without auto-tick, so time never advances.
2. As such, we never send a heartbeat request and the comment about adjusting 
the auto commit interval seems to be misleading.
3. The `client.pollNoWakeup()` call in `HeartbeatThread` is what causes the 
fetch to be completed by the heartbeat thread. This happens on each iteration 
of `run` before the various checks and independently of whether a heartbeat 
request needs to be sent.
4. Based on the documentation for `KafkaConsumer.wakeup()`, it seems like there 
is an actual bug in the code and not the test (introduced when the Heartbeat 
thread was introduced in 0.10.1.0).
5. There are a few potential ways to fix this, but I'll leave it to 
[~hachikuji] to suggest the cleanest one as he probably has an opinion on that.


[jira] [Commented] (KAFKA-4569) Transient failure in org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable

2017-02-27 Thread Ismael Juma (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885611#comment-15885611
 ] 

Ismael Juma commented on KAFKA-4569:


A few comments:

1. We are using a MockTime without auto-tick, so time never advances.
2. As such, we never send a heartbeat request and the comment about adjusting 
the auto commit interval seems to be misleading.
3. The `client.pollNoWakeup()` call in `HeartbeatThread` is what causes the 
fetch to be completed by the heartbeat thread. This happens on each iteration 
of `run` before the various checks and independently of whether a heartbeat 
request needs to be sent.
4. Based on the documentation for `KafkaConsumer.wakeup()`, it seems like there 
is an actual bug in the code and not the test (introduced when the Heartbeat 
thread was introduced in 0.10.1.0).
5. There are a few potential ways to fix this, but I'll leave it to 
[~hachikuji] to suggest the cleanest one as he probably has an opinion on that.
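As an aside on point 1, a small illustrative sketch (assuming the test MockTime exposes an auto-tick constructor taking milliseconds, as the comment implies) of why interval-based logic never fires without auto-tick:

{code}
import org.apache.kafka.common.utils.MockTime;
import org.apache.kafka.common.utils.Time;

public final class MockTimeAutoTickSketch {
    public static void main(String[] args) {
        // Without auto-tick every query returns the same instant, so any logic
        // gated on elapsed time (heartbeat interval, auto-commit interval)
        // never triggers.
        Time frozen = new MockTime();
        long a = frozen.milliseconds();
        long b = frozen.milliseconds();
        System.out.println(b - a);   // 0 -- time never advances

        // With auto-tick (constructor argument assumed here), each query
        // advances the clock by the given number of milliseconds.
        Time ticking = new MockTime(100L);
        long c = ticking.milliseconds();
        long d = ticking.milliseconds();
        System.out.println(d - c);   // 100
    }
}
{code}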


[jira] [Updated] (KAFKA-4569) Transient failure in org.apache.kafka.clients.consumer.KafkaConsumerTest.testWakeupWithFetchDataAvailable

2017-02-27 Thread Ismael Juma (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-4569:
---
Fix Version/s: 0.10.2.1


[jira] [Updated] (KAFKA-4806) KafkaConsumer: ConsumerConfig gets logged twice.

2017-02-27 Thread Marco Ebert (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Ebert updated KAFKA-4806:
---
Fix Version/s: 0.10.2.1

> KafkaConsumer: ConsumerConfig gets logged twice.
> 
>
> Key: KAFKA-4806
> URL: https://issues.apache.org/jira/browse/KAFKA-4806
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, log
>Affects Versions: 0.10.0.0, 0.10.0.1, 0.10.1.0, 0.10.1.1, 0.10.2.0
>Reporter: Marco Ebert
>Priority: Minor
>  Labels: easyfix, github-import, newbie, patch
> Fix For: 0.10.2.1
>
>
> The ConsumerConfig created from the given Properties during KafkaConsumer 
> construction gets logged twice, since the ConsumerConfig constructor does so.
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L587
> https://github.com/apache/kafka/blob/0.10.0.1/clients/src/main/java/org/apache/kafka/clients/consumer/KafkaConsumer.java#L643





Re: KIP-122: Add a tool to Reset Consumer Group Offsets

2017-02-27 Thread Jorge Esteban Quilcate Otoya
@Vahid: makes sense to add the "new lag" info IMO; I will update the KIP.

@Becket:

1. About deleting: I think ConsumerGroupCommand already has an option to
delete group information by topic. From the delete docs: "Pass in groups to
delete topic partition offsets and ownership information over the entire
consumer group.". Let me know if this is enough for your case, or we can
consider adding something to the Reset Offsets tool.

2. Yes. For instance, the tool will validate that the group has no active
consumers before committing anything, to avoid race conditions. I have added
some code snippets to the wiki; thanks for pointing that out.
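For illustration only (an assumed sketch, not the KIP's actual implementation): one way such a tool could commit new offsets on behalf of an otherwise-empty group is with a standalone consumer that uses assign() rather than subscribe(), which is also why the check for active members matters -- commits from outside the group's current generation would otherwise be rejected or conflict with the live members:

{code}
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public final class ResetOffsetsSketch {

    // Hypothetical helper (names are illustrative): commits the supplied
    // offsets for the given group using assign(), so no rebalance is
    // triggered. The caller is assumed to have verified beforehand that the
    // group has no active members.
    static void commitOffsets(String bootstrapServers, String groupId,
                              Map<TopicPartition, OffsetAndMetadata> newOffsets) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(newOffsets.keySet());
            consumer.commitSync(newOffsets);
        }
    }
}
{code}

An execute option would presumably compute newOffsets from the chosen scenario first (to-earliest, to-datetime, shift-by, and so on) and then commit along these lines.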

On Sat, Feb 25, 2017 at 0:29, Becket Qin () wrote:

> Thanks for the KIP Jorge. I think this is a useful KIP. I haven't read the
> KIP in detail yet; some comments from a quick review:
>
> 1. At a glance, it seems that there is no delete option. At LinkedIn we
> identified some cases that users want to delete the committed offset of a
> group. It would be good to include that as well.
>
> 2. It seems the KIP is missing some key implementation points,
> e.g. how would the tool commit offsets for a consumer group, and does the
> broker need to know this is a special tool instead of an active consumer in
> the group (the generation check will be made on offset commit)? They are
> probably in your proof of concept code. Could you add them to the wiki as
> well?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Feb 24, 2017 at 1:19 PM, Vahid S Hashemian <
> vahidhashem...@us.ibm.com> wrote:
>
> > Thanks Jorge for addressing my question/suggestion.
> >
> > One last thing: I noticed that in the example you have for the "plan"
> > option
> > (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 122%3A+Add+Reset+Consumer+Group+Offsets+tooling#KIP-122:
> > AddResetConsumerGroupOffsetstooling-ExecutionOptions
> > )
> > under "Description" column, you put 0 for lag. So I assume that is the
> > current lag being reported, and not the new lag. Might be helpful to
> > explicitly specify that (i.e. CURRENT-LAG) in the column header.
> > The other option is to report both current and new lags, but I understand
> > if we don't want to do that since it's rather redundant info.
> >
> > Thanks again.
> > --Vahid
> >
> >
> >
> > From:   Jorge Esteban Quilcate Otoya 
> > To: dev@kafka.apache.org
> > Date:   02/24/2017 12:47 PM
> > Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
> >
> >
> >
> > Hi Vahid,
> >
> > Thanks for your comments. Check my answers below:
> >
> > On Fri, Feb 24, 2017 at 19:41, Vahid S Hashemian (<
> > vahidhashem...@us.ibm.com>) wrote:
> >
> > > Hi Jorge,
> > >
> > > Thanks for the useful KIP.
> > >
> > > I have a question regarding the proposed "plan" option.
> > > The "current offset" and "lag" values of a topic partition are
> > meaningful
> > > within a consumer group. In other words, different consumer groups
> could
> > > have different values for these properties of each topic partition.
> > > I don't see that reflected in the discussion around the "plan" option.
> > > Unless we are assuming a "--group" option is also provided by the user
> > (which
> > > is not clear from the KIP if that is the case).
> > >
> >
> > I have added an additional comment to state that this option will
> require
> > a "group" argument.
> > It is considered to affect only one Consumer Group.
> >
> >
> > >
> > > Also, I was wondering if you can provide at least one full command
> > example
> > > for each of the "plan", "execute", and "export" options. They would
> > > definitely help in understanding some of the details.
> > >
> > >
> > Added to the KIP.
> >
> >
> > > Sorry for the delayed question/suggestion. I hope they make sense.
> > >
> > > Thanks.
> > > --Vahid
> > >
> > >
> > >
> > > From:   Jorge Esteban Quilcate Otoya 
> > > To: dev@kafka.apache.org
> > > Date:   02/24/2017 09:51 AM
> > > Subject:Re: KIP-122: Add a tool to Reset Consumer Group Offsets
> > >
> > >
> > >
> > > Great! KIP updated.
> > >
> > >
> > >
> > > On Fri, Feb 24, 2017 at 18:22, Matthias J. Sax
> > > ()
> > > wrote:
> > >
> > > > I like this!
> > > >
> > > > --by-duration and --shift-by
> > > >
> > > >
> > > > -Matthias
> > > >
> > > > On 2/24/17 12:57 AM, Jorge Esteban Quilcate Otoya wrote:
> > > > > Renaming to --by-duration LGTM
> > > > >
> > > > > Not sure about changing it to --shift-by-duration because we could
> > end
> > > up
> > > > > with the same redundancy as before with reset: --reset-offsets
> > > > > --reset-to-*.
> > > > >
> > > > > Maybe changing --shift-offset-by to --shift-by 'n' could make it
> > > > consistent
> > > > > enough?
> > > > >
> > > > >
> > > > > On Fri, Feb 24, 2017 at 6:39, Matthias J. Sax (<
> > > > matth...@confluent.io>)
> > > > > wrote:
> > > > >
> > > > >> I just read the updated KIP once more.
> > > > >>
> > > > >> I would suggest renaming --to-duration to --by-duration
> > > > >>
> >